OneArgo / ArgoVocabs

A repository for the management of issues related to vocabularies managed by the Argo Data Management Team

Floats ending cause vocabulary #26

Open vpaba opened 3 years ago

vpaba commented 3 years ago

There currently isn't a reference table / vocabulary to capture all the possible reasons why a float may have died. It would be useful to include this information in the Argo NetCDF files, so a step in this direction has been proposed at the 3rd AVTT meeting in April 2021. Such work would go alongside a review of the ANOMALY field.

mbelbeoch commented 3 years ago

Here is a first list from OceanOPS: https://www.ocean-ops.org/api/preview/?param=endingcause

this list is a bit too long and needs to be simplified.

A first exchange with D. Roemmich identifies these 3 elements that determine the true survival rate of Argo floats:

i) buoyancy engine and other hardware. ii) battery lifetime. iii) CTD sensor stability.


So I would start with this and we would need some descriptions.

ENDING_CAUSE:

UNKNOWN

HARDWARE BATTERY SOFTWARE TELEMETRY CTD

GROUNDED BEACHED PICKED_UP, DAMAGED, VANDALIZED DEPLOYMENT_FAILURE

RECOVERED MANUAL_MISSION_END

RomainCancouet commented 3 years ago

I have received very valuable feedback after some email exchanges within Euro-Argo. I have tried to compile them and establish a draft list of possible ending causes, organised per categories (thanks @matdon17). There are still some comments to tackle and definitions/concepts that would need to be given more clarity I reckon.

We think the ending causes could be compound.

e.g. a float with a hardware problem could have been intentionally recovered, a float with hydraulic issues could have exhausted its battery abnormally, etc.

Perhaps we need to get a clear picture of the range of failure modes from across international Argo, to check whether they all fit into this list or if we need to add new entries or modify existing ones?

I attach here the current proposed list Floats ending cause AVTT 20210526.docx

RomainCancouet commented 3 years ago

@mbelbeoch thanks for the OceanOPS list. Would it be possible to extract from your database the available information on your existing "ending cause" metadata? It would be interesting to see the level of recorded information and the most populated fields.

anthoninlize commented 3 years ago

@RomainCancouet: here's what we have in the DB.

Count | Ending cause
33 | Automatically closed by time rules
3474 | Automatically closed due to a lack of recent location
2 | Automatically closed due to conflicting dates
1 | Battery failure
73 | Beached shore
2 | Caught by fishermen
11 | Deployment Failure
13 | Failed
1 | Grounded
1 | Hardware problem
1 | Human interference/error
3 | Lost
5 | Manually closed
12 | Picked up
17 | Recovered
14 | Software problem
2 | Stuck on the surface
9 | Unknown
4974 | NULL

mbelbeoch commented 3 years ago

When the reference table is defined we need to review from scratch the values for each float. It will be challenging. So we will need inputs by each float group hoping they maintain internally such information. Existing OceanOPS DB is a modest starting base, except for beached floats for which our monitoring is pretty good.

matdon17 commented 3 years ago

@anthoninlize is that only for floats which have their status set to 'closed'? There are 8648 floats with a closed status, but there are 4252 floats listed as inactive, many of which probably should be 'closed'. Does this mean most 'closed' floats have no ending cause?

anthoninlize commented 3 years ago

@matdon17 I updated my message to enforce the filter on closed floats, and including the ones without an ending cause set. So yes, around half of them have no ending cause set.
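
As a quick check on these figures, the counts from the DB extract above can be tallied in a few lines of Python. This is only an illustrative sketch: the dictionary copies the posted numbers, and treating the three "Automatically closed" entries as one group is an assumption made here, not an OceanOPS category.

```python
# Counts copied from the DB extract posted in this thread (closed floats only).
ending_cause_counts = {
    "Automatically closed by time rules": 33,
    "Automatically closed due to a lack of recent location": 3474,
    "Automatically closed due to conflicting dates": 2,
    "Battery failure": 1,
    "Beached shore": 73,
    "Caught by fishermen": 2,
    "Deployment Failure": 11,
    "Failed": 13,
    "Grounded": 1,
    "Hardware problem": 1,
    "Human interference/error": 1,
    "Lost": 3,
    "Manually closed": 5,
    "Picked up": 12,
    "Recovered": 17,
    "Software problem": 14,
    "Stuck on the surface": 2,
    "Unknown": 9,
    "NULL": 4974,
}

total = sum(ending_cause_counts.values())
null_share = ending_cause_counts["NULL"] / total

# Assumption: entries starting with "Automatically closed" are grouped together.
auto_closed = sum(
    count
    for cause, count in ending_cause_counts.items()
    if cause.startswith("Automatically closed")
)

# Everything that is neither NULL, "Unknown", nor automatically closed.
other_recorded = (
    total
    - auto_closed
    - ending_cause_counts["NULL"]
    - ending_cause_counts["Unknown"]
)

print(f"closed floats: {total}")
print(f"no ending cause set: {null_share:.1%}")
print(f"automatically closed: {auto_closed}")
print(f"other recorded causes: {other_recorded}")
```

The total matches the 8648 closed floats mentioned above, and the NULL share is a little over half.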

mbelbeoch commented 3 years ago

@RomainCancouet

Your list complements the initial one well. However, I wonder if we need two levels of granularity for this. To sum up, and adding a few more I thought of:

UNKNOWN (if checked) UNCHECKED (default)

HARDWARE_BUOYANCY HARDWARE_WATER_INTAKE HARDWARE_ELECTRICAL HARDWARE_INTERNAL_VACCUM HARDWARE_BALLAST HARDWARE_OTHER

BATTERY_ENDOF BATTERY_UNEXPECTED

SOFTWARE_CONTROLLER SOFTWARE_OTHER

TELEMETRY LOCATION

SENSOR_CTD SENSOR_OTHER

GROUNDED BEACHED ICED PICKED_UP, DAMAGED, VANDALIZED

RECOVERED MANUAL_MISSION_END (by telemetry config file upload)

A couple of elements from your list are missing here, as I think they are not specific enough:

early_failure
confidential (why this ?? a customer can accept this reason ?)
stuck at surface (it is a ballast or buoyancy issue then)

matdon17 commented 3 years ago

I think one of the critical things to keep separate is whether a failure is caused by the float or by a sensor. So for "water intake", for instance, whether a sensor failed or the float failed should be kept separate, as this is an important distinction.

In terms of definitions, I think these need to be written in terms of an issue leading to failure. So for instance, "Communication issues" should be defined in a way that reflects the loss of the float, whereas the current tentative description seems applicable to any float that has ever developed any comms problems but may still be transmitting. Similarly, 'ICED' should be restricted to a float that is thought to have been lost due to ice damage or being stuck under ice (e.g. drifting under multi-year ice in the deep Weddell Sea). Non-failure problems can still be captured in e.g. the ANOMALY field in the NetCDF.

One specific point on batteries - "Battery normally exhausted" is clearly an "Expected deterioration of the float", but is "Battery exhausted unexpectedly" actually an "Unexpected float technical issue"?

RomainCancouet commented 3 years ago

When the reference table is defined we need to review from scratch the values for each float. It will be challenging. So we will need inputs by each float group hoping they maintain internally such information.

@mbelbeoch I do not think populating this field should be mandatory in OceanOPS and/or Argo files. I think it is important to have a standardised and well-defined vocab so that Argo groups that volunteer or are willing to keep track of this information are able to record it appropriately. Groups that already have some information may provide it, and that may encourage people to continue with more recent floats.

Existing OceanOPS DB is a modest starting base, except for beached floats for which our monitoring is pretty good.

Yes, thanks, the OceanOPS DB is valuable. I reckon you identified these beached floats from the floats' last locations and a buffer to the coastline? Do you take harbours into account as well (some floats' last locations are there because they have been picked up by a ship: fishermen, etc.)?

To my knowledge you do have a metadata "iced" as well, that may be used to tag the dead floats as "iced" e.g. if their last cycles are "iced"?

I suspect a lot of floats will have an ending cause = battery exhausted, and that we might populate the DB after some automatic analyses. For example, we could use the time series of the battery voltage, or compare the number of performed cycles with the number of expected cycles for a float. Float experts and manufacturers would be of great help for such an exercise.
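
One way such an automatic analysis could look, as a minimal sketch: flag a float as a battery-exhaustion candidate when its last few voltage readings all sit below a cut-off. The function name, the 3-reading window and the 10.0 V threshold are hypothetical placeholders for illustration only; real values would depend on the float model and battery pack and would need input from float experts and manufacturers.

```python
from typing import Sequence


def looks_battery_exhausted(
    voltages: Sequence[float],
    threshold_v: float = 10.0,  # hypothetical cut-off, not an Argo value
    window: int = 3,            # hypothetical number of final readings to check
) -> bool:
    """Return True if the last `window` voltage readings are all below `threshold_v`."""
    if len(voltages) < window:
        return False  # not enough readings to decide
    return all(v < threshold_v for v in voltages[-window:])


# Made-up voltage series (volts), one reading per cycle.
healthy = [15.1, 15.0, 14.9, 14.8, 14.8]
fading = [15.0, 14.2, 12.5, 9.8, 9.1, 8.4]

print(looks_battery_exhausted(healthy))  # False
print(looks_battery_exhausted(fading))   # True
```

A flag like this would only nominate candidates for review; it does not distinguish expected from unexpected exhaustion.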

RomainCancouet commented 3 years ago

a couple of elements from your list are missing here, as I think they are not specific enough: early_failure

@mbelbeoch I agree: early_failure duplicates every entry other than BATTERY_ENDOF

confidential (why this ?? a customer can accept this reason ?)

I think we can keep this field to express the fact that the ending cause is known but the Argo group or manufacturer does not want to document it publicly.

stuck at surface (it is a ballast or buoyancy issue then)

In my view this documents the fact that the float ended its life drifting at the surface. It could indeed be due to a ballast or buoyancy issue, but also to a sensor problem (e.g. a pressure sensor defect that prevents the float from navigating in the water column) or other software/hardware issues. => compound causes

RomainCancouet commented 3 years ago

@anthoninlize thanks for the DB extraction. I think there are more than 17 floats recovered. @lucarduini and I have compiled a file with possibly recovered floats, based on duplicates in float serial number, CTD serial number, etc. If these entries are confirmed by the Argo groups we may update the OceanOPS DB.

anthoninlize commented 3 years ago

@RomainCancouet it's not necessarily correlated (in our DB). A float can be declared dead before/independently, thus not reflecting the recovery. If you want recovery stats (?), that's another set of figures.

RomainCancouet commented 3 years ago

With @mbelbeoch we have reviewed a little the draft list, with two levels of granularity (attachment). We will send an email to Argo lists for their feedback. ENDING_CAUSE_0.1.docx

RomainCancouet commented 3 years ago

a comment from @tcarval : as the ENDING_CAUSE list already exists from OceanOPS (https://www.ocean-ops.org/api/preview/help/?param=endingcause), we might use lowercase for this new list (i.e. unchecked instead of UNCHECKED previously)

jlovellcsiro commented 3 years ago

This is the list that CSIRO have started using for our fleet:

0 | Battery exhausted/failed
1 | Leak
2 | Air bladder failure
3 | Grounding
4 | Came on shore
5 | Ice-related failure
6 | DoD
7 | Anode corrosion (leak)
8 | Other: ice, DML, Comms, Software, Stuck on surface etc
9 | Unknown

We have started implementing this in our database and will assign a primary cause and possibly multiple secondary. This list does not separate battery exhaustion into expected or early failure because you need a robust criterion to separate this. I was hoping that the fleet analysis would show a statistically robust bi-modality in the cycle number of battery exhaustion, but not so far. It is important to understand early battery failures. We also note the cycle number of failure, to use in further analysis. Stuck on the surface is not in your circulated list.
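
A record with one primary cause, optional secondary causes and the cycle number of failure could be sketched as follows. The class, field and enum member names are hypothetical and are not CSIRO's actual database schema; only the numeric codes come from the list above, and the float identifier is a placeholder.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class CsiroEndingCause(IntEnum):
    """Numeric codes from the list in this comment; member names are illustrative."""
    BATTERY_EXHAUSTED_OR_FAILED = 0
    LEAK = 1
    AIR_BLADDER_FAILURE = 2
    GROUNDING = 3
    CAME_ON_SHORE = 4
    ICE_RELATED_FAILURE = 5
    DOD = 6  # died on deployment
    ANODE_CORROSION_LEAK = 7
    OTHER = 8
    UNKNOWN = 9


@dataclass
class FloatEnding:
    """Hypothetical record: one primary cause, any number of secondary causes."""
    wmo_id: str
    primary: CsiroEndingCause
    secondary: List[CsiroEndingCause] = field(default_factory=list)
    failure_cycle: Optional[int] = None  # cycle number at which the float failed


# Placeholder identifier and made-up values, for illustration only.
record = FloatEnding(
    wmo_id="0000000",
    primary=CsiroEndingCause.LEAK,
    secondary=[CsiroEndingCause.ANODE_CORROSION_LEAK],
    failure_cycle=112,
)
print(record.primary.name, [c.name for c in record.secondary], record.failure_cycle)
```

Keeping the failure cycle next to the cause makes it straightforward to look later for patterns such as early battery failures.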

RomainCancouet commented 3 years ago

thank you very much for the feedback @jlovellcsiro

I could see that the identified CSIRO ending causes would fit quite well in the proposed list.

Stuck on the surface was in the first draft (post on 27 May) and was in the OceanOPS list from @mbelbeoch; so I think we could add it back. Our point was that it describes more the ending status of a float (e.g. not able to dive anymore due to pressure sensor problem, loss of ballast, etc.) rather than the reason why the float failed.

On battery exhaustion: for most European floats we have defined a criterion (based on battery voltage time series) to identify battery exhaustion. This seems to work reasonably well. If the cycle number reached is close to the theoretical lifetime of a float (hence the importance of such figures or energy budgets being provided by manufacturers), we usually classify the float as "Battery normally exhausted".
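
To illustrate the second step of that classification, here is a minimal sketch that compares the cycle number reached with a theoretical lifetime. It assumes battery exhaustion has already been identified from the voltage time series; the function name and the 0.9 tolerance for "close to" are hypothetical, and the two labels are the ones used earlier in this thread.

```python
def classify_battery_ending(
    cycles_reached: int,
    theoretical_cycles: int,
    tolerance: float = 0.9,  # hypothetical meaning of "close to the theoretical lifetime"
) -> str:
    """Label an already-identified battery exhaustion as normal or unexpected."""
    if theoretical_cycles <= 0:
        raise ValueError("theoretical_cycles must be positive")
    ratio = cycles_reached / theoretical_cycles
    if ratio >= tolerance:
        return "Battery normally exhausted"
    return "Battery exhausted unexpectedly"


# Made-up numbers for illustration.
print(classify_battery_ending(250, 260))  # Battery normally exhausted
print(classify_battery_ending(120, 260))  # Battery exhausted unexpectedly
```

The threshold is the sensitive part: without published theoretical lifetimes per float model, any value chosen here remains a judgment call.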

what is the meaning of DoD and DML?

jlovellcsiro commented 3 years ago

Thanks Romain. Do you have documentation of the theoretical lifetimes? DoD is died on deployment - reported no cycles. DML is Druck microleak - this is more of a data quality issue than a cause of death

RomainCancouet commented 3 years ago

thanks @jlovellcsiro I have some general figures for some models, from datasheets, manuals or manufacturers' information. I definitely think the community would benefit from having these theoretical lifetimes publicly available.

RomainCancouet commented 3 years ago

I've created a GoogleDoc to maintain and document the current list status of suggested ending causes. The Gdoc is opened in suggestion mode for collaborative work. I have integrated the comments received so far. We will integrate new entries if we receive further feedback from Argo teams.

tcarval commented 3 years ago

The FloatEndingCause terms will appear in the NetCDF metadata file “ANOMALY” variable. However, "ANOMALY" is not limited to float ending cause. We may want to add a specific “FloatEndingCause” variable in the NetCDF metadata file.

vincentbernardclayton commented 2 years ago

just a comment : couldn't we use the END_MISSION_STATUS parameter ? and update it with more options :

END_MISSION_STATUS ;
END_MISSION_STATUS:long_name = "Status of the end of mission of the float" ;
END_MISSION_STATUS:conventions = "T:No more transmission received, R:Retrieved" ;
END_MISSION_STATUS:_FillValue = " " ;

RomainCancouet commented 2 years ago

The FloatEndingCause terms will appear in the NetCDF metadata file “ANOMALY” variable. However, "ANOMALY" is not limited to float ending cause. We may want to add a specific “FloatEndingCause” variable in the NetCDF metadata file.

I move back here the comments from the GoogleDoc which were related to the data management and storage issues, not the content of the vocab list:

29.11.2021 Claudia:

An example would be nice. Will it look like this: ANOMALY = "ENDING_CAUSE_CATEGORY=text1; ENDING_CAUSE=text2; whatever other text that could belong here as well"? My question basically doubles as a suggestion. I do think we need some kind of standard, and the first thing that popped in my head was what I wrote as part of the question.

04.12.2021 Thierry:

We may use a syntax close to JSON (https://fr.wikipedia.org/wiki/JavaScript_Object_Notation) or YAML: key1:value,key2:value,...

ANOMALY = "ending_cause_category:hardware,ending_cause:ballast"
ANOMALY = "ending_cause_category:null,ending_cause:unchecked"

06.12.2021 Claudia:

that would work.
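
The key:value syntax suggested above could be read and written with something like the following sketch. The function names are hypothetical, mapping the literal `null` to Python's `None` is an assumption, and the simple splitting only works as long as keys and values contain neither commas nor colons.

```python
from typing import Dict, Optional


def parse_anomaly(text: str) -> Dict[str, Optional[str]]:
    """Parse 'key1:value,key2:value' into a dict; the literal 'null' becomes None."""
    result: Dict[str, Optional[str]] = {}
    for pair in text.split(","):
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {pair!r}")
        value = value.strip()
        result[key.strip()] = None if value == "null" else value
    return result


def format_anomaly(fields: Dict[str, Optional[str]]) -> str:
    """Inverse of parse_anomaly: write None back as the literal 'null'."""
    return ",".join(
        f"{key}:{'null' if value is None else value}" for key, value in fields.items()
    )


first = parse_anomaly("ending_cause_category:hardware,ending_cause:ballast")
second = parse_anomaly("ending_cause_category:null,ending_cause:unchecked")
print(first)
print(second)
print(format_anomaly(second))
```

If free text is ever allowed in the same variable, as in the first suggestion, a stricter delimiter or real JSON would be safer than this split-based approach.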

RomainCancouet commented 2 years ago

29.09.2021 Thierry:

Will ending_cause_category appear in the NetCDF file ? Probably not, only the ending_cause will be reported in the "ANOMALY" variable.

tcarval commented 2 years ago

Hi @RomainCancouet, I added a new reference table 31 in the argo-dm-user-manual-WorkInProgressVersion. You can comment/improve it.

RomainCancouet commented 1 year ago

Hello,

I am coming back regarding the ending cause vocabulary. We have progressed and established a list (20230919_AVTT_EndingCauses_Euro-Argo_2021-2022.xlsx) of possible ending causes for Euro-Argo floats that became inactive in 2021 or 2022. For each float, the ending cause(s) from the vocab is listed in column N (separated by commas if applicable), and I added in column O an ending_cause_comment as free text, that could help or provide more details regarding the possible reason of failure.

My suggestion to move forward is first to update the OceanOPS ending cause metadata for these floats. I see multiple advantages recording the ending_cause in OceanOPS:

If some groups/DACs would then like to report the metadata in the netCDF file, I think @vincentbernardclayton's suggestion

just a comment : couldn't we use the END_MISSION_STATUS parameter ? and update it with more options :

END_MISSION_STATUS ;
END_MISSION_STATUS:long_name = "Status of the end of mission of the float" ;
END_MISSION_STATUS:conventions = "T:No more transmission received, R:Retrieved" ;
END_MISSION_STATUS:_FillValue = " " ;

to use/update the existing parameter END_MISSION_STATUS of the meta file is a good idea.

I am happy to receive your feedback on the list and above suggestions.

RomainCancouet commented 1 year ago

Capturing ending_causes (e.g. list for Euro-Argo floats that became inactive in 2021 and 2022) helps us share and document float (or sensor) failures across teams. In most cases (here 80%) it was possible to determine a possible ending_cause using the defined vocabulary entries. It also makes it possible to obtain statistics. E.g.:

Or pursue further investigations should similar float failures happen for different teams.

tcarval commented 11 months ago

In my opinion, we should keep "END_MISSION_STATUS" as it is: the status of the end of the float’s deployment.

If we want to have the ending cause of a float in the NetCDF metadata file, let's add a new variable "FLOAT_ENDING_CAUSE" as string256
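
As a rough sketch of what filling such a fixed-width field might involve, the snippet below joins one or more ending causes with commas and pads the result to 256 characters. The function name, the lowercase example terms and the space padding are assumptions for illustration (space padding mirrors the `_FillValue = " "` shown for END_MISSION_STATUS earlier in this thread); nothing here is an agreed Argo convention.

```python
from typing import Iterable

FIELD_WIDTH = 256  # "string256", as proposed above


def to_string256(causes: Iterable[str]) -> str:
    """Join ending causes with commas and right-pad with spaces to 256 characters."""
    text = ",".join(cause.strip() for cause in causes)
    if len(text) > FIELD_WIDTH:
        raise ValueError(f"ending cause string is {len(text)} characters, limit is {FIELD_WIDTH}")
    return text.ljust(FIELD_WIDTH)


# Hypothetical vocabulary terms, used only to show compound causes.
value = to_string256(["battery_unexpected", "hardware_buoyancy"])
print(len(value))
print(repr(value.rstrip()))
```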

emmerbodc commented 11 months ago

I just wanted to add that there is an existing collection that could perhaps be used: SensorML History Event Types (http://vocab.nerc.ac.uk/collection/W03/current/). Requests can be logged here: https://github.com/nvs-vocabs/W03

Thanks,

vpaba commented 4 months ago

Thanks @emmerbodc. Would the Argo community consider using the W03 collection linked above @tcarval, by adapting and uploading to it the ENDING_CAUSE terms created by @RomainCancouet?

Could this collection be something shared with other observing systems @mbelbeoch, @MKrieger29, @anthoninlize ?

RomainCancouet commented 2 months ago

OceanOPS (@vturpin @MKrieger29) would like to suggest adding a new ending_cause "transfer_ownership" to the table to deal with a recent case. I have no objection and have therefore added this entry to the Word document.

vturpin commented 2 months ago

To give a bit of context: "Transfer_ownership" happens when a float is "given" to a new program during its lifetime at sea.

We recently had this case when Argo Brazil could not fund more than the first 6 months of communication for a float (outside of the duration of the project that funded and deployed the float). In that case, Argo UK offered to pay for the comms if the float became a UK float. Consequently, we had to set up a procedure to identify this change in the metadata management. This led to the creation of the ending cause "transfer_ownership".

tcarval commented 1 month ago

To give a bit of context: "Transfer_ownership" happens when a float is "given" to a new program during its lifetime at sea.

We recently had this case when Argo Brazil could not fund more than the first 6 months of communication for a float (outside of the duration of the project that funded and deployed the float). In that case, Argo UK offered to pay for the comms if the float became a UK float. Consequently, we had to set up a procedure to identify this change in the metadata management. This led to the creation of the ending cause "transfer_ownership".

In my opinion, this is not an ending cause: the floats are still alive, with all data processed by the BODC DAC.

tcarval commented 1 month ago

@RomainCancouet, @vturpin Do we create the new "ending cause" vocabulary? Otherwise let's close this ticket.

vturpin commented 1 month ago

Considering that the float data is now fully transferred to the DAC, I agree "transfer_ownership" is not an ending cause. However, in OceanOPS we need to keep this term to reflect the situation and not lose the contribution of Argo Brazil.

My suggestion is to keep this ending_cause in the AIC system but not add it to the ending_cause reference table. This is a situation we can manage internally.