vpaba opened this issue 3 years ago.
Here is a first list from OceanOPS: https://www.ocean-ops.org/api/preview/?param=endingcause. This list is a bit too long and needs to be simplified.
A first exchange with D. Roemmich identifies these 3 elements that determine the true survival rate of Argo floats:
i) buoyancy engine and other hardware. ii) battery lifetime. iii) CTD sensor stability.
UNKNOWN
HARDWARE BATTERY SOFTWARE TELEMETRY CTD
GROUNDED BEACHED PICKED_UP, DAMAGED, VANDALIZED DEPLOYMENT_FAILURE
RECOVERED MANUAL_MISSION_END
I have received very valuable feedback after some email exchanges within Euro-Argo. I have tried to compile it and establish a draft list of possible ending causes, organised by category (thanks @matdon17). There are still some comments to address, and I reckon some definitions/concepts need to be clarified.
We think the ending causes could be compound.
e.g. a float with a hardware problem could have been intentionally recovered, a float with hydraulic issues could have exhausted its battery abnormally, etc.
Perhaps we need to get a clear picture of the range of failure modes from across international Argo, to check whether they all fit into this list or if we need to add new entries or modify existing ones?
I attach the current proposed list: Floats ending cause AVTT 20210526.docx
@mbelbeoch thanks for the OceanOPS list. Would it be possible to extract from your database the available information on your existing "ending cause" metadata? It would be interesting to see the level of recorded information and the most populated fields.
@RomainCancouet: here's what we have in the DB.

| Count | Ending cause |
|---|---|
| 33 | Automatically closed by time rules |
| 3474 | Automatically closed due to a lack of recent location |
| 2 | Automatically closed due to conflicting dates |
| 1 | Battery failure |
| 73 | Beached shore |
| 2 | Caught by fishermen |
| 11 | Deployment Failure |
| 13 | Failed |
| 1 | Grounded |
| 1 | Hardware problem |
| 1 | Human interference/error |
| 3 | Lost |
| 5 | Manually closed |
| 12 | Picked up |
| 17 | Recovered |
| 14 | Software problem |
| 2 | Stuck on the surface |
| 9 | Unknown |
| 4974 | NULL |
Once the reference table is defined, we will need to review the values for each float from scratch. It will be challenging, so we will need input from each float group, hoping they maintain such information internally. The existing OceanOPS DB is a modest starting base, except for beached floats, for which our monitoring is pretty good.
@anthoninlize is that only for floats which have their status set to 'closed'? There are 8648 floats with a closed status, but there are 4252 floats listed as inactive, many of which probably should be 'closed'. Does this mean most 'closed' floats have no ending cause?
@matdon17 I updated my message to apply the filter on closed floats, including the ones without an ending cause set. So yes, around half of them have no ending cause set.
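As a quick cross-check of these figures, the counts from the table above can be summed in a few lines of Python. Grouping some entries as "administrative" closures (records closed automatically or manually, with no physical cause given) is our own assumption for illustration, not an OceanOPS category.

```python
# Counts of closed floats per OceanOPS ending cause, copied from the table above.
# The None key stands for the NULL row (no ending cause set).
counts = {
    "Automatically closed by time rules": 33,
    "Automatically closed due to a lack of recent location": 3474,
    "Automatically closed due to conflicting dates": 2,
    "Battery failure": 1,
    "Beached shore": 73,
    "Caught by fishermen": 2,
    "Deployment Failure": 11,
    "Failed": 13,
    "Grounded": 1,
    "Hardware problem": 1,
    "Human interference/error": 1,
    "Lost": 3,
    "Manually closed": 5,
    "Picked up": 12,
    "Recovered": 17,
    "Software problem": 14,
    "Stuck on the surface": 2,
    "Unknown": 9,
    None: 4974,
}

# Assumption: these entries describe how the record was closed,
# not why the float stopped working.
ADMINISTRATIVE = {
    "Automatically closed by time rules",
    "Automatically closed due to a lack of recent location",
    "Automatically closed due to conflicting dates",
    "Manually closed",
}

total = sum(counts.values())
no_cause = counts[None]
administrative = sum(n for cause, n in counts.items() if cause in ADMINISTRATIVE)

print(f"closed floats: {total}")
print(f"no ending cause set: {no_cause} ({no_cause / total:.1%})")
print(f"administrative closures: {administrative} ({administrative / total:.1%})")
```

The total (8648) matches the number of closed floats quoted above, and 57.5% of them have no ending cause. Under the grouping assumed here, another 3514 records only say how the float was closed, which leaves 160 entries (including 9 "Unknown") that carry something closer to an actual ending cause.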
@RomainCancouet your list complements the initial one well. However, I wonder if we need two levels of granularity for this. If I sum up, and add a few more I thought of:
UNKNOWN (if checked) UNCHECKED (default)
HARDWARE_BUOYANCY HARDWARE_WATER_INTAKE HARDWARE_ELECTRICAL HARDWARE_INTERNAL_VACCUM HARDWARE_BALLAST HARDWARE_OTHER
BATTERY_ENDOF BATTERY_UNEXPECTED
SOFTWARE_CONTROLLER SOFTWARE_OTHER
TELEMETRY LOCATION
SENSOR_CTD SENSOR_OTHER
GROUNDED BEACHED ICED PICKED_UP, DAMAGED, VANDALIZED
RECOVERED MANUAL_MISSION_END (by telemetry config file upload)
A couple of elements from your list are missing here, as I think they are not specific enough: early_failure; confidential (why this? Would a customer accept this reason?); stuck at surface (it is then a ballast or buoyancy issue).
I think one of the critical things to ensure is kept separate is whether a failure is caused by the float or sensor failure. So for "water intake", for instance, whether a sensor failed or the float failed should be kept separate as this is an important distinction.
In terms of definitions, I think these need to be written in terms of an issue leading to failure. For instance, "Communication issues" should be defined in a way that reflects the loss of the float, whereas the current tentative description seems applicable to any float that has ever developed any comms problem but may still be transmitting. Similarly, 'ICED' should be restricted to a float that is thought to have been lost due to ice damage or to being stuck under ice (e.g. drifting under multi-year ice in the deep Weddell Sea). Non-failure problems can still be captured in e.g. the ANOMALY field in the NetCDF.
One specific point on batteries - "Battery normally exhausted" is clearly an "Expected deterioration of the float", but is "Battery exhausted unexpectedly" actually an "Unexpected float technical issue"?
> When the reference table is defined we need to review from scratch the values for each float. It will be challenging. So we will need inputs by each float group hoping they maintain internally such information.
@mbelbeoch I do not think populating this field should be mandatory in OceanOPS and/or Argo files. I think it is important to have a standardised and well-defined vocabulary so that Argo groups that volunteer, or are willing, to keep track of this information can record it appropriately. Groups that already have some information may provide it, and that may encourage people to continue with more recent floats.
> Existing OceanOPS DB is a modest starting base, except for beached floats for which our monitoring is pretty good.
Yes, thanks, the OceanOPS DB is valuable. I reckon you identified these beached floats using the floats' last locations and a buffer around the coastline? Do you take harbours into account as well (some floats' last locations are in harbours because they have been picked up by a ship: fishermen, etc.)?
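The buffer-based check asked about here could look roughly like the sketch below. Everything in it is an assumption for illustration: the coastline and harbour points are made-up coordinates, the 5 km and 2 km buffers are arbitrary, and this is not a description of how OceanOPS actually detects beached floats. Harbours are tested first because a harbour is also on the coast.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_distance_km(point, references):
    """Distance from `point` to the nearest of the reference points."""
    return min(haversine_km(point[0], point[1], r[0], r[1]) for r in references)


def classify_last_location(last_fix, coastline, harbours,
                           coast_buffer_km=5.0, harbour_buffer_km=2.0):
    """Hypothetical rule: last fix near a harbour -> possibly picked up,
    near the coast -> possibly beached, otherwise no suggestion."""
    if min_distance_km(last_fix, harbours) <= harbour_buffer_km:
        return "picked_up?"
    if min_distance_km(last_fix, coastline) <= coast_buffer_km:
        return "beached?"
    return None


# Made-up reference points (lat, lon), for illustration only
coastline = [(47.0, -4.5), (47.1, -4.4)]
harbours = [(48.38, -4.49)]
print(classify_last_location((47.0, -4.52), coastline, harbours))
```

A real implementation would use a proper coastline polygon rather than a handful of points, and the result should only be a suggestion for the float operator to confirm.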
To my knowledge you also have an "iced" metadata field, which could be used to tag dead floats as "iced", e.g. if their last cycles are flagged as "iced"?
I suspect a lot of floats will have ending cause = battery exhausted, and that we might populate the DB after some automatic analyses. For example, we could use the time series of the battery voltage, or compare the number of cycles performed with the number of cycles expected for a float. Float experts and manufacturers would be of great help for such an exercise.
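As an illustration of the voltage-based analysis, the sketch below returns the first cycle of a run of consecutive cycles whose reported battery voltage stays below a threshold. The 10 V threshold and the run length of 3 cycles are purely hypothetical; real values depend on the float model and battery pack and would have to come from float experts and manufacturers.

```python
from typing import Optional, Sequence, Tuple


def exhaustion_onset(series: Sequence[Tuple[int, float]],
                     threshold_v: float = 10.0,
                     min_consecutive: int = 3) -> Optional[int]:
    """Return the cycle number starting the first run of at least
    `min_consecutive` cycles with voltage below `threshold_v`, or None.

    `series` is a list of (cycle_number, battery_voltage) pairs in cycle order.
    Requiring several consecutive low values avoids reacting to a single
    noisy or cold-water reading.
    """
    run_start = None
    run_length = 0
    for cycle, volts in series:
        if volts < threshold_v:
            if run_length == 0:
                run_start = cycle
            run_length += 1
            if run_length >= min_consecutive:
                return run_start
        else:
            run_length = 0  # a recovered voltage resets the run
    return None


# Hypothetical voltage series: an isolated dip at cycle 3, then a real decline
series = [(1, 12.1), (2, 11.9), (3, 9.8), (4, 11.0), (5, 9.7), (6, 9.5), (7, 9.1)]
print(exhaustion_onset(series))
```

Here the isolated low reading at cycle 3 is ignored and the decline is dated to cycle 5.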
> a couple elements from your list miss here as I think they are not specific enough: early_failure
@mbelbeoch I agree: early_failure duplicates all entries other than BATTERY_ENDOF.
> confidential (why this ?? a customer can accept this reason ?)
I think we can keep this entry to express the fact that the ending cause is known but the Argo group or manufacturer does not want to document it publicly.
> stuck at surface (it is a ballast or buoyancy issue then)
In my view, this documents the fact that the float ended its life drifting at the surface. This could indeed be due to a ballast or buoyancy issue, but also to a sensor problem (e.g. a pressure sensor defect that prevents the float from navigating in the water column) or to other software/hardware issues, i.e. compound causes.
@anthoninlize thanks for the DB extraction. I think more than 17 floats have been recovered. @lucarduini and I have compiled a file of possibly recovered floats, based on duplicate float serial numbers, CTD serial numbers, etc. If these entries are confirmed by the Argo groups, we may update the OceanOPS DB.
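A minimal sketch of this kind of duplicate search: group metadata records by a serial number field and report every serial number shared by more than one WMO number, which may point to a recovered float or sensor that was redeployed. The record layout, field names and values below are hypothetical.

```python
from collections import defaultdict


def shared_serials(records, key):
    """Map each serial number (field `key`) to the sorted list of WMO numbers
    sharing it, keeping only serial numbers used by more than one WMO."""
    groups = defaultdict(set)
    for rec in records:
        serial = rec.get(key)
        if serial:  # skip missing or empty serial numbers
            groups[serial].add(rec["wmo"])
    return {s: sorted(wmos) for s, wmos in groups.items() if len(wmos) > 1}


# Hypothetical metadata records
records = [
    {"wmo": "6901001", "float_sn": "A100", "ctd_sn": "C9001"},
    {"wmo": "6901002", "float_sn": "A101", "ctd_sn": "C9002"},
    {"wmo": "6902010", "float_sn": "A100", "ctd_sn": "C9001"},
    {"wmo": "6902011", "float_sn": "A102", "ctd_sn": ""},
]
print(shared_serials(records, "ctd_sn"))
print(shared_serials(records, "float_sn"))
```

Each hit is only a candidate: as noted above, it would still need confirmation by the Argo group concerned before the DB is updated.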
@RomainCancouet it's not necessarily correlated (in our DB). A float can be declared dead before, or independently of, its recovery, so its ending cause does not necessarily reflect the recovery. If you want recovery stats, that's another set of figures.
With @mbelbeoch we have slightly revised the draft list, with two levels of granularity (attachment). We will send an email to the Argo lists for their feedback. ENDING_CAUSE_0.1.docx
a comment from @tcarval : as the ENDING_CAUSE list already exists from OceanOPS (https://www.ocean-ops.org/api/preview/help/?param=endingcause), we might use lowercase for this new list (i.e. unchecked instead of UNCHECKED previously)
This is the list that CSIRO have started using for our fleet:

| Code | Ending cause |
|---|---|
| 0 | Battery exhausted/failed |
| 1 | Leak |
| 2 | Air bladder failure |
| 3 | Grounding |
| 4 | Came on shore |
| 5 | Ice-related failure |
| 6 | DoD |
| 7 | Anode corrosion (leak) |
| 8 | Other: ice, DML, Comms, Software, Stuck on surface etc. |
| 9 | Unknown |
We have started implementing this in our database and will assign a primary cause and possibly multiple secondary. This list does not separate battery exhaustion into expected or early failure because you need a robust criterion to separate this. I was hoping that the fleet analysis would show a statistically robust bi-modality in the cycle number of battery exhaustion, but not so far. It is important to understand early battery failures. We also note the cycle number of failure, to use in further analysis. Stuck on the surface is not in your circulated list.
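One possible way to hold such a record (a primary cause, any number of secondary causes, and the cycle number at failure) is sketched below. The class and field names are our own assumptions, not CSIRO's database schema, and the cause labels simply reuse entries from the CSIRO list above as an example of a compound case.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EndingCauseRecord:
    """Hypothetical record: one primary cause plus optional secondary causes."""
    wmo: str
    primary: str
    secondary: List[str] = field(default_factory=list)
    failure_cycle: Optional[int] = None  # cycle number at failure, if known

    def all_causes(self) -> List[str]:
        """Primary cause first, then secondary causes, without duplicates."""
        causes = [self.primary]
        for cause in self.secondary:
            if cause not in causes:
                causes.append(cause)
        return causes


# Hypothetical compound case: a bladder failure that also drained the battery early
rec = EndingCauseRecord(
    wmo="hypothetical-wmo",
    primary="Air bladder failure",
    secondary=["Battery exhausted/failed", "Air bladder failure"],
    failure_cycle=87,
)
print(rec.all_causes())
```

Keeping the failure cycle alongside the causes makes it possible to study, for example, the distribution of cycle numbers at battery exhaustion later on.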
thank you very much for the feedback @jlovellcsiro
I could see that the identified CSIRO ending causes would fit quite well in the proposed list.
Stuck on the surface was in the first draft (post on 27 May) and was in the OceanOPS list from @mbelbeoch; so I think we could add it back. Our point was that it describes more the ending status of a float (e.g. not able to dive anymore due to pressure sensor problem, loss of ballast, etc.) rather than the reason why the float failed.
On battery exhaustion: for most European floats we have defined a criterion (based on the battery voltage time series) to identify battery exhaustion. This seems to work quite well. If the cycle number reached is close to the theoretical lifetime of the float (hence the importance of such figures, or of energy budgets, being provided by manufacturers), we usually classify the float as "Battery normally exhausted".
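The classification step described above can be sketched as a ratio between the cycle at which exhaustion was detected and the theoretical lifetime in cycles. The 90% cut-off and the snake_case labels are assumptions for illustration; the actual Euro-Argo criterion and threshold are not given here.

```python
def classify_battery_exhaustion(exhaustion_cycle, theoretical_cycles,
                                min_fraction=0.9):
    """Hypothetical rule: exhaustion at or beyond `min_fraction` of the
    theoretical lifetime counts as normal, earlier exhaustion as unexpected."""
    if theoretical_cycles is None or theoretical_cycles <= 0:
        # Without a theoretical lifetime no judgement is possible
        return "battery_exhausted_unclassified"
    fraction = exhaustion_cycle / theoretical_cycles
    if fraction >= min_fraction:
        return "battery_normally_exhausted"
    return "battery_exhausted_unexpectedly"


print(classify_battery_exhaustion(230, 250))   # 92% of the expected lifetime
print(classify_battery_exhaustion(120, 250))   # 48% of the expected lifetime
print(classify_battery_exhaustion(100, None))  # lifetime unknown
```

The unclassified branch reflects the point made in this thread: without published theoretical lifetimes for each model, the split between normal and early exhaustion cannot be made reliably.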
what is the meaning of DoD and DML?
Thanks Romain. Do you have documentation of the theoretical lifetimes? DoD is died on deployment - reported no cycles. DML is Druck microleak - this is more of a data quality issue than a cause of death
thanks @jlovellcsiro I have some general figures for some models, from datasheets, manuals or manufacturer information. I definitely think the community would benefit from having these theoretical lifetimes publicly available.
I've created a GoogleDoc to maintain and document the current status of the list of suggested ending causes. The GDoc is open in suggestion mode for collaborative work. I have integrated the comments received so far. We will integrate new entries if we receive further feedback from Argo teams.
The FloatEndingCause terms will appear in the NetCDF metadata file “ANOMALY” variable. However, "ANOMALY" is not limited to float ending causes. We may want to add a specific “FloatEndingCause” variable in the NetCDF metadata file.
Just a comment: couldn't we use the END_MISSION_STATUS parameter and update it with more options?

```
char END_MISSION_STATUS ;
    END_MISSION_STATUS:long_name = "Status of the end of mission of the float" ;
    END_MISSION_STATUS:conventions = "T:No more transmission received, R:Retrieved" ;
    END_MISSION_STATUS:_FillValue = " " ;
```
I move back here the comments from the GoogleDoc which were related to the data management and storage issues, not the content of the vocab list:
29.11.2021 Claudia:
An example would be nice. Will it look like this: `ANOMALY = "ENDING_CAUSE_CATEGORY=text1; ENDING_CAUSE=text2; whatever other text that could belong here as well"`? My question basically doubles as a suggestion: I do think we need some kind of standard, and this is the first thing that popped into my head.
04.12.2021 Thierry:
We may use a syntax close to JSON (https://fr.wikipedia.org/wiki/JavaScript_Object_Notation) or YAML, i.e. key1:value,key2:value,... For example:
`ANOMALY = "ending_cause_category:hardware,ending_cause:ballast"`
`ANOMALY = "ending_cause_category:null,ending_cause:unchecked"`
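A minimal sketch of writing and reading the key:value syntax proposed above. It assumes that neither keys nor values may contain ',' or ':', which the proposal does not state explicitly; the function names are hypothetical.

```python
def format_anomaly(pairs):
    """Build an ANOMALY string such as 'key1:value1,key2:value2'.

    Raises ValueError if a key or value contains a reserved separator.
    """
    for key, value in pairs.items():
        if any(ch in f"{key}{value}" for ch in ",:"):
            raise ValueError(f"',' and ':' are reserved separators: {key!r}={value!r}")
    return ",".join(f"{key}:{value}" for key, value in pairs.items())


def parse_anomaly(text):
    """Parse 'key1:value1,key2:value2' back into a dict (insertion-ordered)."""
    result = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {item!r}")
        result[key.strip()] = value.strip()
    return result


print(parse_anomaly("ending_cause_category:hardware,ending_cause:ballast"))
print(format_anomaly({"ending_cause_category": "null", "ending_cause": "unchecked"}))
```

Rejecting free text without a ':' keeps the field machine-readable; if free text is also to live in ANOMALY, a separate variable (as discussed in this thread) may be cleaner than relaxing the parser.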
06.12.2021 Claudia:
that would work.
29.09.2021 Thierry:
Will ending_cause_category appear in the NetCDF file? Probably not; only the ending_cause will be reported in the "ANOMALY" variable.
Hi @RomainCancouet , I added a new reference table 31 in the argo-dm-user-manual-WorkInProgressVersion You can comment/improve it.
Hello,
I am coming back to the ending cause vocabulary. We have made progress and established a list (20230919_AVTT_EndingCauses_Euro-Argo_2021-2022.xlsx) of possible ending causes for Euro-Argo floats that became inactive in 2021 or 2022. For each float, the ending cause(s) from the vocabulary are listed in column N (separated by commas if applicable), and I added in column O an ending_cause_comment as free text, which could help or provide more details regarding the possible reason for the failure.
My suggestion to move forward is first to update the OceanOPS ending cause metadata for these floats. I see multiple advantages in recording the ending_cause in OceanOPS:
Metadata can be updated easily by the different Argo groups/programs, without the DACs having to regenerate the meta.nc file. For instance, I had to update my initial list for floats that were inactive in 2021 but became active again in 2023 after escaping from under the ice. Metadata could also be updated after further investigations by/with float/sensor manufacturers.
KPIs or graphs (e.g. survival rates) could be built on the OceanOPS website using such metadata: number of floats recovered, number that have exhausted their batteries, etc.
If some groups/DACs would then like to report the metadata in the NetCDF file, I think @vincentbernardclayton's suggestion to use and update the existing END_MISSION_STATUS parameter of the meta file is a good idea:

> just a comment: couldn't we use the END_MISSION_STATUS parameter and update it with more options?
I am happy to receive your feedback on the list and above suggestions.
Capturing ending_causes (e.g. the list for Euro-Argo floats that became inactive in 2021 and 2022) helps us share and document float (or sensor) failures across teams. In most cases (here 80%), it was possible to determine a possible ending_cause using the defined vocabulary entries. It also makes it possible to compute statistics, or to pursue further investigations should similar float failures occur in different teams.
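Statistics of this kind can be derived directly from a column holding comma-separated ending causes, such as column N of the spreadsheet. The sketch below uses made-up cell values (the labels are hypothetical, not confirmed vocabulary terms) and assumes that an empty cell means no cause could be determined. Each cause is counted once per float, so a compound cause contributes to several counts.

```python
from collections import Counter


def ending_cause_stats(cells):
    """Count ending causes over comma-separated cells and return
    (Counter of causes, share of floats with at least one cause)."""
    counter = Counter()
    determined = 0
    for cell in cells:
        # Treat None like an empty cell; split on commas and drop blanks
        causes = {c.strip() for c in (cell or "").split(",") if c.strip()}
        if causes:
            determined += 1
            counter.update(causes)
    share = determined / len(cells) if cells else 0.0
    return counter, share


# Hypothetical column N values, one per float
column_n = [
    "battery_normally_exhausted",
    "grounded, hardware_buoyancy",
    "",
    "battery_normally_exhausted",
    None,
]
counts, share = ending_cause_stats(column_n)
print(counts.most_common())
print(f"{share:.0%} of floats have at least one ending cause")
```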
In my opinion, we should keep "END_MISSION_STATUS" as it is, i.e. the status of the end of the float's deployment.
If we want to have the ending cause of a float in the NetCDF metadata file, let's add a new variable "FLOAT_ENDING_CAUSE" as string256.
I just wanted to add that there is an existing collection that could perhaps be used: SensorML History Event Types http://vocab.nerc.ac.uk/collection/W03/current/ Requests can be logged here: https://github.com/nvs-vocabs/W03 Thanks,
Thanks @emmerbodc. Would the Argo community consider using the W03 collection linked above @tcarval, by adapting and uploading to it the ENDING_CAUSE terms created by @RomainCancouet?
Could this collection be something shared with other observing systems @mbelbeoch, @MKrieger29, @anthoninlize ?
OceanOPS (@vturpin @MKrieger29) would like to suggest adding a new ending_cause "transfer_ownership" to the table to deal with a recent case. I have no objection and have therefore added this entry to the Word document.
To give a bit of context: "transfer_ownership" happens when a float is "given" to a new program during its lifetime at sea.
We recently had this case when Argo Brazil could not fund more than the first 6 months of communications for a float (beyond the duration of the project that funded and deployed the float). Argo UK offered to pay for the communications if the float became a UK float. Consequently, we had to set up a procedure to identify this change in the metadata management, which led to the creation of the ending cause "transfer_ownership".
In my opinion, this is not an ending cause: the floats are still alive, with all data processed by the BODC DAC.
@RomainCancouet, @vturpin Do we create the new "ending cause" vocabulary? Otherwise, let's close this ticket.
Considering that the float data are now fully transferred to the DAC, I agree "transfer_ownership" is not an ending cause. However, in OceanOPS we need to keep this term to reflect the situation and avoid losing track of Argo Brazil's contribution.
My suggestion is to keep this ending_cause in the AIC system but not adding it to the ending_cause reference table. This is a situation we can manage internally.
There currently isn't a reference table / vocabulary to capture all the possible reasons why a float may have died. It would be useful to include this information in the Argo NetCDF files, so a step in this direction was proposed at the 3rd AVTT meeting in April 2021. Such work would go alongside a review of the ANOMALY field.