Proposal: Add QARTOD quality flag names to standard name list #216

cf-convention / cf-conventions

Closed jessicaaustin closed 4 years ago

jessicaaustin commented 5 years ago

UPDATE: This proposal has been revised; please see the latest version in the comments below.

The original proposal is below.

We are proposing adding QARTOD (Quality Assurance/Quality Control of Real-Time Oceanographic Data) quality flag names to the standard name list. Adding these names lets us specify exactly which dataset variable holds the result of a particular QARTOD test for a particular data variable. This is related to the recently added quality_flag standard name, but is specific to QARTOD.

Note: We were originally trying to achieve this with a flag_methods variable attribute but decided to try the standard_name approach instead. More background: https://github.com/cf-convention/cf-conventions/issues/205

Proposed names:

Name	Description	Units
qartod_aggregate_quality_flag	This flag is a summary of all QARTOD quality tests run for another data variable, and is set to the highest-level (worst case) flag found. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_gap_test_quality_flag	Result of the QARTOD Timing/Gap test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_syntax_test_quality_flag	Result of the QARTOD Syntax test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_location_test_quality_flag	Result of the QARTOD Location test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_gross_range_test_quality_flag	Result of the QARTOD Gross Range test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_climatology_test_quality_flag	Result of the QARTOD Climatology test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_spike_test_quality_flag	Result of the QARTOD Spike test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_rate_of_change_test_quality_flag	Result of the QARTOD Rate of Change test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_flat_line_test_quality_flag	Result of the QARTOD Flat Line test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_multi_variate_test_quality_flag	Result of the QARTOD Multi-variate test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_attenuated_signal_test_quality_flag	Result of the QARTOD Attenuated Signal test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
qartod_neighbor_test_quality_flag	Result of the QARTOD Neighbor test. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
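A minimal sketch of the aggregate rule in the first row, assuming the flag values from the example usage in this proposal (1 PASS, 2 NOT_EVALUATED, 3 SUSPECT, 4 FAIL, 9 MISSING). Taking the numeric maximum would rank MISSING above FAIL, so the sketch ranks flags by an explicit precedence list. That ordering, and the helper name `aggregate_flags`, are assumptions for illustration, not taken from the QARTOD manuals.

```python
# Flag values as used in this proposal's example usage.
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9

# Assumed precedence, least to most severe. An explicit list is needed because
# a numeric max() would let MISSING (9) outrank FAIL (4).
PRECEDENCE = [MISSING, NOT_EVALUATED, PASS, SUSPECT, FAIL]
RANK = {flag: i for i, flag in enumerate(PRECEDENCE)}


def aggregate_flags(*test_results):
    """Combine per-test flag series into one aggregate series.

    Each argument is a sequence of flags (one per observation) from one test;
    for every observation the most severe flag under PRECEDENCE wins.
    """
    if not test_results:
        raise ValueError("need at least one test result")
    length = len(test_results[0])
    if any(len(r) != length for r in test_results):
        raise ValueError("all test results must have the same length")
    return [max(flags, key=RANK.__getitem__) for flags in zip(*test_results)]


gap = [PASS, PASS, PASS, MISSING]
flat_line = [PASS, SUSPECT, NOT_EVALUATED, MISSING]
gross_range = [PASS, PASS, FAIL, MISSING]
print(aggregate_flags(gap, flat_line, gross_range))  # [1, 3, 4, 9]
```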

Example usage:

variables:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:units = "1";
        sea_water_practical_salinity:long_name = "Salinity";
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg sea_water_practical_salinity_qc_flat_line_test";

    int sea_water_practical_salinity_qc_agg(time, z);
        sea_water_practical_salinity_qc_agg:long_name = "Salinity QARTOD Aggregate Flag";
        sea_water_practical_salinity_qc_agg:standard_name = "status_flag qartod_aggregate_quality_flag";
        sea_water_practical_salinity_qc_agg:missing_value = 2;
        sea_water_practical_salinity_qc_agg:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_agg:flag_values = 1, 2, 3, 4, 9;

    int sea_water_practical_salinity_qc_flat_line_test(time, z);
        sea_water_practical_salinity_qc_flat_line_test:long_name = "Salinity QARTOD Flat Line Test Flag";
        sea_water_practical_salinity_qc_flat_line_test:standard_name = "status_flag qartod_flat_line_test_quality_flag";
        sea_water_practical_salinity_qc_flat_line_test:missing_value = 2;
        sea_water_practical_salinity_qc_flat_line_test:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_flat_line_test:flag_values = 1, 2, 3, 4, 9;
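To show how the names would be used, here is a minimal sketch of following `ancillary_variables` from a data variable to the flag variable for one specific test. It assumes the header above has already been read into plain dictionaries (variable name to attributes); in practice a netCDF library would supply these. The helper `find_flag_variable` is hypothetical. Splitting `standard_name` on whitespace means the test name is found whether it comes before or after `status_flag`.

```python
# Hypothetical in-memory stand-in for the CDL header above: variable name -> attributes.
VARIABLES = {
    "sea_water_practical_salinity": {
        "standard_name": "sea_water_practical_salinity",
        "ancillary_variables": (
            "sea_water_practical_salinity_qc_agg "
            "sea_water_practical_salinity_qc_flat_line_test"
        ),
    },
    "sea_water_practical_salinity_qc_agg": {
        "standard_name": "status_flag qartod_aggregate_quality_flag",
    },
    "sea_water_practical_salinity_qc_flat_line_test": {
        "standard_name": "status_flag qartod_flat_line_test_quality_flag",
    },
}


def find_flag_variable(variables, data_var, flag_name):
    """Return the ancillary variable of data_var whose standard_name contains flag_name.

    ancillary_variables is a blank-separated list of variable names. The
    standard_name is split on whitespace so flag_name matches as a whole token
    whether it appears before or after a modifier such as status_flag.
    Returns None if no ancillary variable matches.
    """
    ancillary = variables[data_var].get("ancillary_variables", "").split()
    matches = [
        name for name in ancillary
        if flag_name in variables.get(name, {}).get("standard_name", "").split()
    ]
    if len(matches) > 1:
        raise ValueError(f"{flag_name} is ambiguous for {data_var}: {matches}")
    return matches[0] if matches else None


print(find_flag_variable(VARIABLES, "sea_water_practical_salinity",
                         "qartod_flat_line_test_quality_flag"))
# sea_water_practical_salinity_qc_flat_line_test
```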

@mwengren @kwilcox @kevin-obrien

roy-lowry commented 5 years ago

If a variable is carrying multiple QC flags as ancillary variables then some means of identifying which flag is which is obviously needed and the Standard Name is one way of doing this. However, this could potentially create thousands of Standard Names so I wonder if using standard name modifiers might be a better way to go. That way only twelve new concepts would need to be set up to cover this requirement.

kwilcox commented 5 years ago

@roy-lowry Are you proposing that in addition to the status_flag on the quality variables that we add one of (12) new modifiers that describes the type of quality check without the indication that it was the qartod implementation of the test? Sort of a sub-modifier that modifies the already existing status_flag modifier? Is there some precedent or another example where this happens in CF?

example of the (12) new modifiers

aggregate
gap_test
syntax_test
location_test
gross_range_test
climatology_test
spike_test
rate_of_change_test
flat_line_test
multi_variate_test
attenuated_signal_test
neighbor_test

If I understand your suggestion, you are trying to avoid having every single implementation of a quality check become a specific standard name. I agree that would be tedious; we are only proposing the QARTOD-specific standard names because they are published standards. Not every quality check would be applicable, and we could have the conversation on a case-by-case basis.

ngalbraith commented 5 years ago

I really prefer the flag_methods approach - it seems much simpler to me.

In either case, the fact that the tests are defined by QARTOD could be conveyed with an attribute. This could be as simple as adding something like 'QARTOD_tests_V1' to the global 'Conventions' attribute, though that would prohibit mixing different test methods.

For a more flexible approach, you could provide it as a variable attribute on the status_flag (or flag_methods, if you go that route) instead:

    int PSAL_qc_agg(t, z);
        PSAL_qc_agg:standard_name = "status_flag aggregate_quality_flag";
        PSAL_qc_agg:vocabulary = "QARTOD_tests_V1";

Note, I think you DO need to include a version number somehow - everything changes, eventually.

roy-lowry commented 5 years ago

Nan's suggestion works for me. My only concern is that we don't start down a road where the variable identifier and quality flag identifier are embedded into the Standard Name. Initial intentions of limited usage can easily be broken down by future use cases.

ngalbraith commented 5 years ago

I suspect you could do this without any change to the CF standard, just by including a link to the relevant test, as you did in the other issue (#205).

        PSAL_qc_agg:standard_name = "status_flag";
        PSAL_qc_agg:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        PSAL_qc_agg:flag_values = 1, 2, 3, 4, 9;
        PSAL_qc_agg:references = "https://axiom-data-science.github.io/ioos_qc/aggregate_flag";

The words in flag_meanings are also not tied to any controlled vocabulary (CF only restricts which characters they may contain), so you could use something like:

        PSAL_qc_agg:flag_meanings = "qartod_aggregate_pass qartod_aggregate_not_evaluated qartod_aggregate_suspect qartod_aggregate_fail qartod_aggregate_missing";

That seems to contain all the information - as long as you include the 'references' attribute, pointing to the qartod_aggregate scoring method.
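Whichever label is used, a reader still has to pair each flag value with its meaning. A minimal sketch, assuming the attribute values from the PSAL_qc_agg example and the CF rule that `flag_meanings` is a blank-separated list with one word per entry in `flag_values`, in the same order; `decode_flags` is a hypothetical helper.

```python
def decode_flags(flag_values, flag_meanings):
    """Map each flag value to its meaning.

    CF pairs the blank-separated words of flag_meanings with flag_values
    by position, so the two lists must have the same length.
    """
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError(
            f"{len(flag_values)} flag_values but {len(meanings)} flag_meanings")
    return dict(zip(flag_values, meanings))


# flag_values is a list of numbers, since CF requires flag_values to have
# the same type as the flag variable itself.
lookup = decode_flags(
    [1, 2, 3, 4, 9],
    "qartod_aggregate_pass qartod_aggregate_not_evaluated "
    "qartod_aggregate_suspect qartod_aggregate_fail qartod_aggregate_missing",
)
observed = [1, 1, 3, 4]  # illustrative flag data
print([lookup[v] for v in observed])
```

The `references` attribute would then tell the reader which scoring method produced those meanings.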

This could be implemented - and documented - by IOOS, and become a sort of de facto standard. Or, CF could certainly adopt some part of this - e.g. formalizing the 'references' attribute for status flags.

mwengren commented 5 years ago

@roy-lowry, re: your earlier comment:

My only concern is that we don't start down a road where the variable identifier and quality flag identifier are embedded into the Standard Name. Initial intentions of limited usage can easily be broken down by future use cases.

I am not sure I understand what you mean here about variable and quality flag identifiers in the standard name. I recognize your concern about adding an unknown number of future QARTOD or other status/quality flag identifiers beyond the 12 we list here, however I'm confused because there is no 'variable identifier' in these names, only identifiers for the QARTOD tests. These test names have been developed as part of the QARTOD project here at the US IOOS office, and have been adopted by groups outside of IOOS.

QARTOD test names may not quite match the intent of the standard names table, I recognize. In fact, this proposal is our second attempt at a way to make our use case fit with existing CF conventions (and therefore hopefully be more stable and adopted outside of our own community). The approach of adding QARTOD test names to the standard names table was recommended initially by @martinjuckes in discussion about our initial proposal (#205) to add the 'flag_methods' attribute, and we decided for our needs it is simpler to apply and easier to understand. The drawback is that the standard names table must serve as a vocabulary for the breadth of QARTOD test names (or other similar names).

We think that even though there are many (over 100 perhaps) QARTOD test names in existence, few of them are implemented currently and we feel pretty confident these initial 12 will get us quite far for present needs. If the open-endedness of setting a precedent like this is too much of a concern though, what about our initial proposal of 'flag_methods' (#205), coupled with an externally-managed vocabulary referenced by URI in the file?

From the discussion in both issues, it seems that CF would benefit from a standard way to characterize the process/procedure behind 'flag_values/flag_meanings' ancillary variables.

How does the decision-making process work from here? We'd like to keep moving this forward and advocate for one of our two proposals (whichever is more agreeable to those who wish to weigh in).

Some of the alternatives proposed here, although workable, are somewhat less concise and/or not as easily machine readable as either of our two proposals (machine readability is one of our primary goals in the design).

Thanks for your continued consideration.

roy-lowry commented 5 years ago

Dear Micah,

I think there is some degree of misunderstanding here, so I will try and clarify. What I am trying to prevent is having to create up to a dozen new Standard Names for every measurement, such as:

'sea_water_practical_salinity_gap_test' 'sea_water_practical_salinity_neighbor_test' 'sea_water_practical_salinity_at_sea_floor_gap_test' 'sea_water_practical_salinity_at_sea_floor_neighbor_test'

At its inception CF proposed a mechanism to prevent this kind of proliferation by parameter statistics (mean, standard deviation, maximum, minimum etc.) through Standard Name modifiers called Cell Methods. This was subsequently extended by the semantically broader Standard Name Modifiers. These are formally-defined terms (in Appendix C of the CF Conventions document - possibly should also be in NVS as a vocabulary, but that's another issue) that are included in the Standard Name strings separated by a space. So the above examples become:

'sea_water_practical_salinity gap_test' 'sea_water_practical_salinity neighbor_test' 'sea_water_practical_salinity_at_sea_floor gap_test' 'sea_water_practical_salinity_at_sea_floor neighbor_test'

The difference is possibly hard to spot. One underscore has been replaced by a space in each case. The big difference is in the work needed for implementation. Adding the 12 QARTOD tests would require adding 12 entries to Table C1 in the Conventions document. These are then available to modify every existing - and still to be created - Standard Name. This is the full description of my proposal above and is exactly as understood by @kwilcox in the comment above.
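A minimal sketch of the modifier route, assuming the 12 test labels listed earlier in this thread were added to Table C1 (they are not in the Conventions document). Software splits the `standard_name` attribute on blanks into a standard name and at most one modifier; the final lines compare vocabulary growth for an illustrative count of 1000 measurement names. `split_standard_name` is a hypothetical helper.

```python
# The 12 test labels as they might appear in Table C1 if adopted as modifiers
# (hypothetical: they are not in the Conventions document).
TEST_MODIFIERS = {
    "aggregate", "gap_test", "syntax_test", "location_test",
    "gross_range_test", "climatology_test", "spike_test",
    "rate_of_change_test", "flat_line_test", "multi_variate_test",
    "attenuated_signal_test", "neighbor_test",
}


def split_standard_name(value, modifiers=TEST_MODIFIERS):
    """Split a standard_name attribute into (standard name, modifier or None).

    CF allows a standard name optionally followed by one modifier,
    separated by blanks.
    """
    parts = value.split()
    if len(parts) == 1:
        return parts[0], None
    if len(parts) == 2 and parts[1] in modifiers:
        return parts[0], parts[1]
    raise ValueError(f"unrecognised standard_name: {value!r}")


print(split_standard_name("sea_water_practical_salinity_at_sea_floor neighbor_test"))
# ('sea_water_practical_salinity_at_sea_floor', 'neighbor_test')

# Vocabulary growth: per-measurement names versus 12 shared modifiers.
n = 1000  # illustrative number of measurement standard names
print(len(TEST_MODIFIERS) * n, len(TEST_MODIFIERS))  # 12000 12
```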

Nan proposed an alternative mechanism that would require setting up one Standard Name per QARTOD flag. I stated that I could live with this, as it addressed my primary concern of preventing a proliferation of Standard Names. It could work (don't forget the flags are linked to their variables through ancillary variable attributes), but it is possibly more opaque than the Standard Name Modifier route. As I think more about it, I am less attracted to it.

I would suggest that the best way to take things forward would be for the proposers of this ticket to decide and state their preference out of the three alternatives in the above discussion namely:

1) As proposed (up to 12 new Standard Names per measurement)
2) Using Standard Name Modifiers
3) Using flag_methods

and see where the discussion goes from there.

JimBiardCics commented 5 years ago

I am in favor of @roy-lowry's options 2 or 3.

mwengren commented 5 years ago

@roy-lowry Just to clarify, we aren't proposing what you described above, so Option 1 in your list is not what we intended. The QARTOD test names, although specific to QARTOD, wouldn't require the test name to be attached to any existing geophysical or other measured-parameter standard name. Names like sea_water_practical_salinity_gap_test, as in your examples, are not something we're advocating for here.

It sounds like you're conflating @ngalbraith's suggestion with our original proposal somewhat. We're proposing in this issue the addition of 12 QARTOD test names in the standard names table (to begin with), which are ancillary to the actual measured variable and leverage the ancillary_variables/status_flag approach already within CF to define the association. Minimal example:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg sea_water_practical_salinity_qc_flat_line_test";

    int sea_water_practical_salinity_qc_agg(time, z);
        sea_water_practical_salinity_qc_agg:long_name = "Salinity QARTOD Aggregate Flag";
        sea_water_practical_salinity_qc_agg:standard_name = "status_flag qartod_aggregate_quality_flag";

    int sea_water_practical_salinity_qc_flat_line_test(time, z);
        sea_water_practical_salinity_qc_flat_line_test:long_name = "Salinity QARTOD Flat Line Test Flag";
        sea_water_practical_salinity_qc_flat_line_test:standard_name = "status_flag qartod_flat_line_test_quality_flag";

Because QARTOD is a published standard approach to real-time quality control, we felt the specific test names it defines were appropriate to include in the standard names table (i.e. as an accepted standard practice by multiple oceanographic groups). Of course, this would be setting a precedent for additional names to be added in the future (even additional programs similar to QARTOD) which I agree is a risk/concern, and may be counterproductive. We feel the 12 will meet our foreseeable needs, however.

I think Option 2, 'Standard Name Modifiers', would work as well; however, wouldn't you still need to associate the measured variable with the ancillary flag variable(s) using the ancillary_variables attribute? There may be multiple sea_water_practical_salinity variables in the same file, after all.
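A sketch of why `ancillary_variables` stays necessary whichever option is chosen, assuming a hypothetical file with two salinity variables (the names `psal_upper` and `psal_lower` are made up) that each carry their own flat line test flag. Searching by standard name finds both flags but not which data variable each belongs to; following `ancillary_variables` does.

```python
# Hypothetical header with two salinity variables (e.g. two sensors), each with
# its own flat line test flag. All variable names are illustrative.
VARIABLES = {
    "psal_upper": {"standard_name": "sea_water_practical_salinity",
                   "ancillary_variables": "psal_upper_qc_flat_line"},
    "psal_lower": {"standard_name": "sea_water_practical_salinity",
                   "ancillary_variables": "psal_lower_qc_flat_line"},
    "psal_upper_qc_flat_line": {"standard_name": "flat_line_test_quality_flag"},
    "psal_lower_qc_flat_line": {"standard_name": "flat_line_test_quality_flag"},
}

FLAG = "flat_line_test_quality_flag"

# Searching by standard_name alone finds every flat line flag in the file,
# with no indication of which salinity variable each one describes.
by_name = sorted(name for name, attrs in VARIABLES.items()
                 if attrs.get("standard_name") == FLAG)

# Following ancillary_variables from each data variable is unambiguous.
by_link = {
    name: [a for a in attrs.get("ancillary_variables", "").split()
           if VARIABLES[a].get("standard_name") == FLAG]
    for name, attrs in VARIABLES.items()
    if attrs.get("standard_name") == "sea_water_practical_salinity"
}

print(by_name)  # ['psal_lower_qc_flat_line', 'psal_upper_qc_flat_line']
print(by_link)  # {'psal_upper': ['psal_upper_qc_flat_line'], 'psal_lower': ['psal_lower_qc_flat_line']}
```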

Option 3 'flag_methods' is akin to Option 1 (as we intended it), just with the flag/test name vocabulary to use in flag_methods attribute maintained in an external system and not written into the CF conventions anywhere.

We haven't had a chance to discuss this within our group, but I just wanted to clear up misconceptions in the meantime. Thanks for everyone's input on this.

roy-lowry commented 5 years ago

Many thanks for this clarification and apologies for misunderstanding your original proposal, partly due to not realising that there was a scroll bar on the window and so only seeing a truncated version of the example. I'll get used to GitHub one day!

Now I understand what you are requesting it has my support. @ngalbraith 's point about incorporating versioning could be addressed by including a versioned QARTOD reference in the Standard Name definitions. I don't feel that inclusion of version numbers in Standard Names is a precedent to set here.

ngalbraith commented 5 years ago

I can't support including the word QARTOD in the list of test names. These basic tests are used in many communities, in data that's not real time and in non-ocean data areas. The test details are not always exactly the same as the QARTOD versions, and there may be additional tests that QARTOD will not want to document. The list will spiral out of control if we have to identify the test publisher within the standard name list - I thought we had covered this ground earlier.

The test authority, along with the versioning info, should be named elsewhere, in an attribute. E.g.

    int psal_qc_agg(time, z);
        psal_qc_agg:long_name = "Salinity QARTOD Aggregate Flag";
        psal_qc_agg:standard_name = "status_flag aggregate_quality_flag";
        psal_qc_agg:vocabulary = "QARTOD tests V1.1";
        psal_qc_agg:reference = "https://ioos.noaa.gov/QARTOD tests/V1.1/Aggregate_score.html";

roy-lowry commented 5 years ago

Looks like a fundamental issue here is whether or not QARTOD is a sufficiently well-developed, supported and respected standard for the label to convey semantics that add value to a Standard Name or Standard Name Modifier. My impression/understanding is that this is so, but @ngalbraith obviously disagrees strongly.

So, if this proposal is going to get anywhere the community behind it has to decide if they are happy to lose the 'QARTOD' references or find additional support for their inclusion from the CF community.

A couple of additional comments of technical detail brought to mind by @ngalbraith 's last comment.

1) The specific tests are treated as Standard Name Modifiers of the Standard Name 'status_flag', not Standard Names. I strongly suspect that if the tests were added to Table C1 they would be appended to measurement Standard Names rather than 'status_flag' in common usage even though this is less semantically correct. Making the tests Standard Names rather than Standard Name Modifiers would prevent this.

2) If the QARTOD is removed from the labels then the definitions would need to be rewritten to explain what the tests are.

3) Whilst 'long_name' and 'standard_name' are part of the CF Conventions, I don't think that 'vocabulary' and 'reference' are.

graybeal commented 5 years ago

I think the question is "to what extent should a standard name be specific to a particular implementation?" (rather than how well-developed, supported and respected QARTOD is), because there will be many non-QARTOD implementations of these tests. So, where should the abstraction be made? I understand the general principle of CF is to encourage interoperability by labeling the result, not any unique process for obtaining the result. (So that I can find, e.g., all the variables that have flat line test results, and at first level treat them as comparable.)

Of course it will be the case that specific algorithms (and their parameterizations) will differ, whether QARTOD vs MBARI or QARTOD v1 vs QARTOD v2. Just as the specific techniques differ by which a salinity value is obtained, these techniques will differ under the covers, and users will have to decide how deep they want to dive into detailed provenance information about the processing.

jessicaaustin commented 5 years ago

We discussed this internally and came up with the following:

(1) We agree that removing the QARTOD from the test names is fine, since the result is still perfectly useful for our cases, and in fact makes the names more generic and widely useful.

(2) It's clear from this thread that we are in a gray area between Standard Names and Standard Name Modifiers. But we prefer adding these as Standard Names rather than modifiers.

(3) Since vocabulary and references are not part of the CF conventions, they aren't explicitly part of our proposal. But even without those attributes we believe the standard name alone conveys useful information. We plan to enforce use of these attributes in our own metadata profile, which is built upon CF and other standards.
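A minimal sketch of the kind of profile check described in (3), assuming a rule, not part of CF, that any variable whose standard name ends in `_quality_flag` must carry `vocabulary` and `references` attributes. The function name, the attribute dictionaries and the placeholder URL are all hypothetical.

```python
# Hypothetical profile rule (not part of CF): any variable whose standard name
# ends in "_quality_flag" must also carry these attributes.
PROFILE_REQUIRED = ("vocabulary", "references")


def profile_violations(variables):
    """Return (variable name, missing attribute) pairs that break the rule."""
    problems = []
    for name, attrs in variables.items():
        tokens = attrs.get("standard_name", "").split()
        # The first token is the standard name; any second token is a modifier.
        if tokens and tokens[0].endswith("_quality_flag"):
            problems.extend((name, key) for key in PROFILE_REQUIRED if key not in attrs)
    return problems


variables = {
    "sea_water_practical_salinity": {"standard_name": "sea_water_practical_salinity"},
    "sea_water_practical_salinity_qc_agg": {
        "standard_name": "aggregate_quality_flag status_flag",
        "vocabulary": "QARTOD tests V1.1",
        "references": "https://example.org/qartod/aggregate",  # placeholder URL
    },
    "sea_water_practical_salinity_qc_flat_line_test": {
        "standard_name": "flat_line_test_quality_flag status_flag",
        "vocabulary": "QARTOD tests V1.1",
    },
}
print(profile_violations(variables))
# [('sea_water_practical_salinity_qc_flat_line_test', 'references')]
```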

Since this has had a lot of back-and-forth, we've updated our original proposal and pasted it below. We removed references to QARTOD, added descriptions of each test, swapped the test standard name and status_flag in the standard_name attribute to make it clear status_flag is the modifier in the example, and updated the proposal description.

Again, thank you everyone for all the great input.

Updated proposal:

We are proposing adding quality test flag names to the Standard Name list. Adding these flags allows us to define exactly which dataset variable represents the results of a particular quality test for a particular data variable. While the list below is not comprehensive (that is, it does not contain every possible quality test that could be run), as it stands it meets our needs for the foreseeable future. If accepted, we expect others will propose expanding this list to include other testing methods as needed.

Proposed names:

Name	Description	Units
aggregate_quality_flag	This flag is a summary of all quality tests run for another data variable, and is set to the highest-level (worst case) flag found. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
gap_test_quality_flag	Result of the Timing/Gap test, which checks that data has been received within the expected time window and has the correct time stamp. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
syntax_test_quality_flag	Result of the Syntax test, which checks that the data contains no indicators of flawed transmission. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
location_test_quality_flag	Result of the Location test, which checks that a location is within reasonable bounds. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
gross_range_test_quality_flag	Result of the Gross Range test, which checks that values are within reasonable range bounds. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
climatology_test_quality_flag	Result of the Climatology test, which checks that values are within reasonable range bounds for a given time and location. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
spike_test_quality_flag	Result of the Spike test, which checks that the difference between two points in a series of values is within reasonable bounds. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
rate_of_change_test_quality_flag	Result of the Rate of Change test, which checks that the first order difference of a series of values is within reasonable bounds. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
flat_line_test_quality_flag	Result of the Flat Line test, which checks for consecutively repeated values within a tolerance. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
multi_variate_test_quality_flag	Result of the Multi-variate test, which checks that values are reasonable when compared with related variables. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
attenuated_signal_test_quality_flag	Result of the Attenuated Signal test, which checks for near-flat-line conditions using a range or standard deviation. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
neighbor_test_quality_flag	Result of the Neighbor test, which checks that values are reasonable when compared with nearby measurements. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
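The one-line descriptions above can be made concrete with simple versions of three of the tests. This is a minimal sketch using the flag values from the example usage; the two-tier gross range bounds, the window and tolerance for the flat line test, the NOT_EVALUATED handling at the start of a series, and all thresholds in the usage lines are assumptions for illustration, not the reference QARTOD algorithms.

```python
PASS, NOT_EVALUATED, SUSPECT, FAIL = 1, 2, 3, 4


def gross_range_test(values, fail_span, suspect_span):
    """FAIL outside fail_span, SUSPECT outside suspect_span, otherwise PASS.

    Spans are inclusive (low, high) tuples; using two tiers of bounds is an
    illustrative assumption.
    """
    flags = []
    for v in values:
        if not fail_span[0] <= v <= fail_span[1]:
            flags.append(FAIL)
        elif not suspect_span[0] <= v <= suspect_span[1]:
            flags.append(SUSPECT)
        else:
            flags.append(PASS)
    return flags


def rate_of_change_test(values, max_step):
    """SUSPECT where the first-order difference from the previous value exceeds max_step."""
    flags = [NOT_EVALUATED] if values else []  # first point has no predecessor
    for prev, cur in zip(values, values[1:]):
        flags.append(SUSPECT if abs(cur - prev) > max_step else PASS)
    return flags


def flat_line_test(values, count, tolerance):
    """SUSPECT when the last `count` values all lie within `tolerance` of the current one."""
    flags = []
    for i, cur in enumerate(values):
        if i < count - 1:
            flags.append(NOT_EVALUATED)  # not enough history yet
            continue
        window = values[i - count + 1:i + 1]
        flat = all(abs(v - cur) <= tolerance for v in window)
        flags.append(SUSPECT if flat else PASS)
    return flags


salinity = [35.1, 35.2, 35.2, 35.2, 35.2, 41.0, -1.0]  # illustrative data
print(gross_range_test(salinity, fail_span=(0, 42), suspect_span=(30, 38)))
print(rate_of_change_test(salinity, max_step=1.0))
print(flat_line_test(salinity, count=4, tolerance=0.01))
```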

Example usage:

variables:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:units = "1";
        sea_water_practical_salinity:long_name = "Salinity";
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg sea_water_practical_salinity_qc_flat_line_test";

    int sea_water_practical_salinity_qc_agg(time, z);
        sea_water_practical_salinity_qc_agg:long_name = "Salinity Aggregate Flag";
        sea_water_practical_salinity_qc_agg:standard_name = "aggregate_quality_flag status_flag";
        sea_water_practical_salinity_qc_agg:missing_value = 2;
        sea_water_practical_salinity_qc_agg:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_agg:flag_values = 1, 2, 3, 4, 9;

    int sea_water_practical_salinity_qc_flat_line_test(time, z);
        sea_water_practical_salinity_qc_flat_line_test:long_name = "Salinity Flat Line Test Flag";
        sea_water_practical_salinity_qc_flat_line_test:standard_name = "flat_line_test_quality_flag status_flag";
        sea_water_practical_salinity_qc_flat_line_test:missing_value = 2;
        sea_water_practical_salinity_qc_flat_line_test:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_flat_line_test:flag_values = 1, 2, 3, 4, 9;

jessicaaustin commented 4 years ago

@roy-lowry , @ngalbraith and @JimBiardCics , have you had a chance to look at our updated proposal? You've given a lot of great feedback and ideas so far, and we're curious to hear what you think of our latest updates. Thanks!

roy-lowry commented 4 years ago

@jessicaaustin I have been hanging fire to see if new watchers coming onto GitHub brings any additional comment. It's the next job on my CF work list.

roy-lowry commented 4 years ago

@stephenworsley @lbdreyer How would having the labels 'aggregate_quality_flag' etc. as Standard Names rather than Standard Name Modifiers (e.g. 'status_flag aggregate_quality_flag') affect the iris software development project? What would your preference be?

roy-lowry commented 4 years ago

I am happy with the text for labels and definitions in the revised proposals as either Standard Names or Standard Name Modifiers. However, I am becoming concerned about their being Standard Names.

In the revised proposal example the new concepts are implemented as Standard Names with the Standard Name Modifier 'status_flag'. But the status_flag entry in Appendix C of the Conventions is qualified by the text:

'The use of this modifier is deprecated and the standard_name status_flag is preferred to describe this type of metadata variable.'

So, that leaves the possibilities 'aggregate_quality_flag' or 'status_flag aggregate_quality_flag'. Looking at the project that linked with this thread yesterday it can be seen that flags require specific handling code. The obvious way to trigger this is to look for 'status_flag' in the Standard Name. So I think 'status_flag aggregate_quality_flag' is the better way to go.

lbdreyer commented 4 years ago

@stephenworsley @lbdreyer How would having the labels 'aggregate_quality_flag' etc. as Standard Names rather than Standard Name Modifiers (e.g. 'status_flag aggregate_quality_flag') affect the iris software development project? What would your preference be?

The difference between adding the flags as standard names versus standard name modifiers, in terms of impact on Iris, would be minor.

There is a slight preference for standard names. Standard names are provided in a machine-readable format, so any updates are pulled through easily, whereas for standard name modifiers we maintain a copy of the C1 table, so any updates to that table require us to update Iris.
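To illustrate the point about machine-readable standard names, here is a minimal sketch that reads an abridged, hand-written excerpt shaped like the published CF standard name table XML (`entry` elements with an `id` attribute and a `canonical_units` child). The `aggregate_quality_flag` entry uses this proposal's wording and is illustrative only; this is not Iris code.

```python
import xml.etree.ElementTree as ET

# Abridged, hand-written excerpt in the shape of the CF standard name table XML.
# The aggregate_quality_flag entry uses this proposal's wording and is illustrative.
TABLE_XML = """
<standard_name_table>
  <entry id="sea_water_practical_salinity">
    <canonical_units>1</canonical_units>
    <description>Practical Salinity (abridged).</description>
  </entry>
  <entry id="aggregate_quality_flag">
    <canonical_units>1</canonical_units>
    <description>This flag is a summary of all quality tests run for another data variable (abridged).</description>
  </entry>
</standard_name_table>
"""


def load_standard_names(xml_text):
    """Return {standard name: canonical units} for every <entry> element."""
    root = ET.fromstring(xml_text.strip())
    return {
        entry.get("id"): entry.findtext("canonical_units", default="")
        for entry in root.iter("entry")
    }


names = load_standard_names(TABLE_XML)
print("aggregate_quality_flag" in names, names["aggregate_quality_flag"])  # True 1
```

A new table release only changes the input file, whereas a modifier list copied out of Table C1 has to be edited by hand.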

So, that leaves the possibilities 'aggregate_quality_flag' or 'status_flag aggregate_quality_flag'. Looking at the project that linked with this thread yesterday it can be seen that flags require specific handling code. The obvious way to trigger this is to look for 'status_flag' in the Standard Name. So I think 'status_flag aggregate_quality_flag' is the better way to go.

I suspect we will write Iris so that it detects whether or not something is a flag by checking for the flag_values, flag_masks and flag_meanings attributes, rather than by using the standard_name. This will allow for users who may, for example, use a long_name rather than a standard_name.
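A minimal sketch contrasting the two detection strategies discussed above: looking for the `status_flag` token in `standard_name` versus checking for the CF flag attributes. The attribute dictionaries are hypothetical, and this is not how Iris itself is implemented.

```python
FLAG_ATTRIBUTES = ("flag_values", "flag_masks", "flag_meanings")


def is_flag_by_attributes(attrs):
    """Attribute-based check: any CF flag attribute marks the variable as a flag."""
    return any(key in attrs for key in FLAG_ATTRIBUTES)


def is_flag_by_standard_name(attrs):
    """Name-based check: look for the token 'status_flag' in standard_name."""
    return "status_flag" in attrs.get("standard_name", "").split()


# Hypothetical variables: one uses the proposed standard name without the
# status_flag modifier, one has only a long_name.
with_standard_name = {
    "standard_name": "aggregate_quality_flag",
    "flag_values": [1, 2, 3, 4, 9],
    "flag_meanings": "PASS NOT_EVALUATED SUSPECT FAIL MISSING",
}
long_name_only = {
    "long_name": "Salinity Flat Line Test Flag",
    "flag_values": [1, 2, 3, 4, 9],
    "flag_meanings": "PASS NOT_EVALUATED SUSPECT FAIL MISSING",
}

for attrs in (with_standard_name, long_name_only):
    print(is_flag_by_attributes(attrs), is_flag_by_standard_name(attrs))
# True False
# True False
```

The name-based check misses both variables, while the attribute-based check finds them regardless of how they are named.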

So aggregate_quality_flag as a standard_name sounds perfectly sensible from an Iris perspective.

roy-lowry commented 4 years ago

Many thanks to @lbdreyer for removing my concerns. It would certainly be easier to go down the Standard Names route as it is adding to a controlled vocabulary rather than updating the Conventions Document. Semantic information in Standard Names is also more accessible as they are in vocabulary servers whereas the Conventions text is not.

So, provided the deprecated 'status_flag' Modifier is removed, I'm happy with these flag types being Standard Names.

mwengren commented 4 years ago

@roy-lowry I was going to comment about the need to remove the deprecated status_flag standard name modifier from the conventions document, but I see you agree that's necessary too. That will help eliminate some confusion for newcomers to CF's ancillary variable/flag syntax and how to properly apply it, I think. It confused me to have it mentioned in two places, anyway.

While coming up with this proposal, we were cognizant of the discussion about the newly-added quality_flag standard name over the summer/fall. I see now, given that the status_flag standard name modifier is to be truly deprecated, our files will need to only use the specific standard names we've described above (which in a way inherit from the root quality_flag name), without use of any modifiers such as status_flag. Adapted example from earlier comment:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg";

    int sea_water_practical_salinity_qc_agg(time, z);
        sea_water_practical_salinity_qc_agg:long_name = "Salinity QARTOD Aggregate Flag";
        sea_water_practical_salinity_qc_agg:standard_name = "aggregate_quality_flag";

I think it would be nice if the connection between our names and the quality_flag generic name could be reflected better within the CF standard name table, since there's no explicit connection between them in the descriptions. We could change our names to follow a pattern like:

quality_flag_aggregate
quality_flag_gap_test
quality_flag_syntax_test
etc...

and it would make the connection a little clearer. At least they would start with the same letter and appear next to each other alphabetically, and might lead people to associate them more easily.

Alternatively, we could mention quality_flag within the text descriptions of our new names.

Or perhaps a new addition to Section 3.5 of the Conventions document that explains the various names and includes a few examples?

Any thoughts out there about this?

I see we're starting down a path that @graybeal mentioned in the quality_flag discussion with greater specificity in our names beyond the more generic quality_flag name. Just wondering whether it makes sense to set a precedent here, and if so, what. Maybe too much of a rabbit hole...

roy-lowry commented 4 years ago

@mwengren I was referring to removing the Modifier 'status_flag' from the examples in the modified proposal, not from the Conventions Document. To do that would be very much against my understanding of best practice. Deprecation, not deletion is the way to go to deliver transparency.

Your modified example includes 'QARTOD' in the flag Standard Name, which I thought had been dropped.

I don't think it's a good idea to weave vocabulary updates into the Conventions Document text.

I prefer the Standard Name labels as presented in the modified proposal. Using alphanumeric sorting of labels to establish semantic relationships is a technique I used in the 1990s until I saw the error of my ways. What happens if somebody in the future adds another batch of Standard Names beginning 'quality_flag' that go off at a tangent to yours? There could be a way to establish a linkage between your Standard Names and the Standard Name 'quality_flag' using mappings in the vocabulary servers. I can point you in the right direction once the names are in the system.

I guess you could replace 'Result' at the start of the descriptions with 'A quality flag that reports the result'

mwengren commented 4 years ago

@roy-lowry Ah, poor quality control on the quality control examples! I fixed the error in my example above.

"Using alphanumeric sorting of labels to establish semantic relationships is a technique I used in the 1990s until I saw the error of my ways."

Ok, I see your point about relying on similar alphanumeric name structure. If there's a way to establish a linkage using mappings, that sounds much better. I'm not too familiar with the standard name table so forgive my naive suggestions.

"I guess you could replace 'Result' at the start of the descriptions with 'A quality flag that reports the result'"

I am OK with making this change when these names are added - @jessicaaustin any thoughts? Should we update the comment above with new phrasing?

I do think improved examples in Section 3.5 of the Conventions document would go a long way toward clarifying usage of the collection of 'flag' standard names. After the lengthy discussion on the new quality_flag name, it would be a shame for a newcomer to CF who wants to encode measurement quality information to opt for status_flag when quality_flag (or any of the flags proposed here) might be more appropriate, just because status_flag is used in the only example (Example 3.3) shown there with a standard name.

To me, the standard names list is rather opaque; good examples go a long way toward explaining the intended usage. Someone would have to read Section 3.5, go to the standard names page, search for 'flag', and then compare the definitions to see which name is most appropriate for their situation. There was a lot of good background shared in that discussion on best practice, but it's basically lost unless the documentation gets an update (IMO).

Thanks for your feedback thus far. Looking forward to merging these new terms if there's no dissent.

graybeal commented 4 years ago

I support Roy's other points, but whatever experiences showed Roy the error of his ways regarding name sorting never crossed my path. ;-) I find the wordings beginning with quality_flag considerably more intuitive, because the most important thing is that this is a quality flag, and the second most important thing is what kind of flag it is. (The aggregate_quality_flag is the only exception; I could go either way on that.)

The fact that it collects all the quality flags together in the list is a huge usability improvement, because it reflects how people often find things (by serendipitous browsing, rather than intentional searching), and creates an immediate understanding of the quality flag 'offerings' in CF. From a marketing standpoint this would have a positive impact.

"What happens if somebody in the future adds another batch of Standard Names beginning 'quality_flag' that go off at a tangent to yours?"

Then (presuming they are all quality flags or related) all of the quality_flag-related names are collocated, which is a good thing.

But of course I agree the relationship should be established in the definition, so all bases are covered.

Yay for improving the related examples in the Conventions document.

Alas, all of the advantages may be made moot by the semi-official constructs of the CF grammar, so I won't be surprised if there's a reason It Can't Be. Just sayin', it would be a lot more user friendly, at least for this user.

roy-lowry commented 4 years ago

@graybeal My preference is founded on English grammar, in which a qualifying adjective precedes the noun, and is reinforced by unfortunate experiences in the past whilst building systems based on label sorting. Only we metadata geeks think in terms of semantic hierarchies! However, my feelings aren't strong enough to become a blocking issue, so I'd accept whatever the proposers decide.

jessicaaustin commented 4 years ago

Catching up here. Sounds like one change we've decided to make is to remove status_flag from the recommended usage, so that

 sea_water_practical_salinity_qc_flat_line_test:standard_name = "flat_line_test_quality_flag status_flag";

becomes

 sea_water_practical_salinity_qc_flat_line_test:standard_name = "flat_line_test_quality_flag";

On the other points:

"replace 'Result' at the start of the descriptions with 'A quality flag that reports the result'" -- The latter wording would be better for searches (in case someone searches for "quality" in the description) so I like that.
quality_flag_flat_line_test versus flat_line_test_quality_flag
- I predict most people are going to find these standard names through shared code, examples, open-source libraries, and searching the standard name list, so whether or not they are in ABC order can be a consideration but should not drive the decision.
- I also love the idea of updating the CF document with example usage if/when this proposal is accepted, and I think that will go a long way to improve discovery and understanding. We've got a great start here with some example usage.
- According to the CF standard name submission guidelines, "The order [of facets] is not rule-based; the goal is to make the name as clear and natural as possible", so we have some leeway to decide what we think is best.
- So my personal preference is to keep it as flat_line_test_quality_flag
- After speaking with the other proposers, we agreed to keep the original names, and make sure to focus on having clear examples, both in CF docs and our own documentation.

I've updated the original proposal again to reflect these changes: I removed status_flag from the examples, added "A quality flag that reports" wording, and kept the originally proposed names. See below.

Based on what I'm reading in this thread, it sounds like we're reaching a consensus? What are the next steps at this point, and the timing for those? Thank you!

Updated proposal:

Proposed names:

Name	Description	Units
aggregate_quality_flag	This flag is a summary of all quality tests run for another data variable, and is set to the highest-level (worst case) flag found. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
gap_test_quality_flag	A quality flag that reports the result of the Timing/Gap test, which checks that data has been received within the expected time window and has the correct time stamp. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
syntax_test_quality_flag	A quality flag that reports the result of the Syntax test, which checks that the data contains no indicators of flawed transmission. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
location_test_quality_flag	A quality flag that reports the result of the Location test, which checks that a location is within reasonable bounds. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
gross_range_test_quality_flag	A quality flag that reports the result of the Gross Range test, which checks that values are within reasonable range bounds. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
climatology_test_quality_flag	A quality flag that reports the result of the Climatology test, which checks that values are within reasonable range bounds for a given time and location. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
spike_test_quality_flag	A quality flag that reports the result of the Spike test, which checks that the difference between two points in a series of values is within reasonable bounds. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
rate_of_change_test_quality_flag	A quality flag that reports the result of the Rate of Change test, which checks that the first order difference of a series of values is within reasonable bounds. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
flat_line_test_quality_flag	A quality flag that reports the result of the Flat Line test, which checks for consecutively repeated values within a tolerance. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
multi_variate_test_quality_flag	A quality flag that reports the result of the Multi-variate test, which checks that values are reasonable when compared with related variables. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
attenuated_signal_test_quality_flag	A quality flag that reports the result of the Attenuated Signal test, which checks for near-flat-line conditions using a range or standard deviation. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
neighbor_test_quality_flag	A quality flag that reports the result of the Neighbor test, which checks that values are reasonable when compared with nearby measurements. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute.	1
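To make the aggregate_quality_flag description concrete, here is a minimal sketch of a "highest-level (worst case)" roll-up over several per-test flag series. The severity ordering is an assumption for illustration only: the proposal does not define one, and a plain numeric max() would rank MISSING (9) above FAIL (4), so an explicit precedence is used instead.

```python
# Minimal sketch of a worst-case aggregate over QARTOD-style flags.
# ASSUMPTION: the precedence below (MISSING least severe, FAIL most severe)
# is an illustrative choice; the proposal only says "highest-level (worst case)".
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9

# Lower index = less severe. Numeric max() would wrongly rank MISSING (9)
# above FAIL (4), so an explicit ordering is used instead.
PRECEDENCE = [MISSING, NOT_EVALUATED, PASS, SUSPECT, FAIL]
RANK = {flag: i for i, flag in enumerate(PRECEDENCE)}


def aggregate_worst_case(*test_flags: list[int]) -> list[int]:
    """Combine per-test flag series point by point, keeping the worst flag."""
    if not test_flags:
        raise ValueError("need at least one test result")
    length = len(test_flags[0])
    if any(len(series) != length for series in test_flags):
        raise ValueError("all flag series must have the same length")
    # zip(*...) yields one tuple of flags per data point.
    return [
        max(point_flags, key=RANK.__getitem__)
        for point_flags in zip(*test_flags)
    ]


gross_range = [1, 1, 3, 1, 9]
flat_line = [1, 4, 1, 2, 9]
print(aggregate_worst_case(gross_range, flat_line))  # [1, 4, 3, 1, 9]
```

Note the fourth point: PASS outranks NOT_EVALUATED under this assumed ordering, so one evaluated test is enough to give the point a real result.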

Example usage:

variables:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:units = "1";
        sea_water_practical_salinity:long_name = "Salinity";
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg sea_water_practical_salinity_qc_flat_line_test";

    int sea_water_practical_salinity_qc_agg(time, z);
        sea_water_practical_salinity_qc_agg:long_name = "Salinity Aggregate Flag";
        sea_water_practical_salinity_qc_agg:standard_name = "aggregate_quality_flag";
        sea_water_practical_salinity_qc_agg:missing_value = 2;
        sea_water_practical_salinity_qc_agg:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_agg:flag_values = 1, 2, 3, 4, 9;

    int sea_water_practical_salinity_qc_flat_line_test(time, z);
        sea_water_practical_salinity_qc_flat_line_test:long_name = "Salinity Flat Line Test Flag";
        sea_water_practical_salinity_qc_flat_line_test:standard_name = "flat_line_test_quality_flag";
        sea_water_practical_salinity_qc_flat_line_test:missing_value = 2;
        sea_water_practical_salinity_qc_flat_line_test:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_flat_line_test:flag_values = 1, 2, 3, 4, 9;
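The example relies on the ancillary_variables attribute for linkage. The sketch below shows how a consumer might follow that link and look up flag variables by standard name. It assumes the header has already been read into plain dicts (variable name to attributes); with a real file, a library such as netCDF4 or xarray would supply these.

```python
# Minimal sketch: resolve a data variable's QC flags through the CF
# ancillary_variables attribute. ASSUMPTION: the header is already loaded
# into plain dicts mirroring the CDL example above.
variables = {
    "sea_water_practical_salinity": {
        "standard_name": "sea_water_practical_salinity",
        "ancillary_variables": (
            "sea_water_practical_salinity_qc_agg "
            "sea_water_practical_salinity_qc_flat_line_test"
        ),
    },
    "sea_water_practical_salinity_qc_agg": {
        "standard_name": "aggregate_quality_flag",
    },
    "sea_water_practical_salinity_qc_flat_line_test": {
        "standard_name": "flat_line_test_quality_flag",
    },
}


def qc_flags_by_standard_name(variables: dict, data_var: str) -> dict[str, str]:
    """Map each ancillary variable's standard_name to its variable name."""
    # ancillary_variables is a blank-separated list of variable names.
    names = variables[data_var].get("ancillary_variables", "").split()
    return {
        variables[name]["standard_name"]: name
        for name in names
        if "standard_name" in variables.get(name, {})
    }


flags = qc_flags_by_standard_name(variables, "sea_water_practical_salinity")
print(flags["aggregate_quality_flag"])  # sea_water_practical_salinity_qc_agg
```

Keying by standard name works here because each test flag has a distinct name. Two ancillary variables that both use the generic quality_flag would collide in this dict, so a real reader would need a list per name.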

ngalbraith commented 4 years ago

I think this is fine. It seems clear, and limits the number of new standard names that will be needed.

Of course, if people use these without specifying the vocabulary and/or some reference to a description of the tests, they lose a lot of their meaning. My aggregate_quality_flag might represent something totally different from the QARTOD aggregate, but I think that's the trade off for limiting the number of new terms needed.

I'm not sure if we need to spell that out in the documentation somewhere, recommending some terms to help define the tests that were applied, or if we can leave it to communities that are planning to make these standard names part of their specifications to do that.

Thanks for your patience (and perseverance) getting this done.

castelao commented 4 years ago

@jessicaaustin and everyone else involved in this proposal and discussions, thank you very much for your time on this. It will be a great advance. I'm excited to see this concluded and start using it.

I think it could be useful to include gradient_quality_flag and density_inversion_quality_flag in this proposal. Going a little further, I think it would be great to include all the tests from all the standards that QARTOD was based on (Argo, GTSPP, ...). I see great power in being able to distinguish which procedure was done, but for that, the different procedures must be in the system. Someone will need to add those eventually, so why not do it from the start?

I also have a question on something that @ngalbraith raised before. Following the latest version of the proposal, what would be the best way to identify the details of an applied test? For instance, how could I find, in machine-readable form, the upper and lower limits used for a QARTOD climatology test? Or, if it was a GTSPP climatology test instead, what tolerance (in standard deviations) was used? Which climatology was used (WOA13, WOA18, ...)? Could I have two different climatologies in the same dataset? What would be the best way to distinguish them? Thanks!

ngalbraith commented 4 years ago

I personally think including this level of detail (number of standard deviations, upper/lower limits, gap length, name of climatology used, distance to 'neighbor' data, etc) is beyond the scope of CF; there are just too many details. Every test has its own inputs, and these may vary with different organizations using this new set of standard names. Even QARTOD tests may be implemented in different ways by different organizations, who may have different requirements for documentation.

If you feel that your data providers' quality assessments need to come with machine readable details, you could define those terms as extra required attributes for QARTOD (or more correctly, for the IOOS implementation of QARTOD) compliance; this can be documented in the resource (web page or service, presumably) that describes the specific test. The data files would be perfectly usable without these details, but I can see why you'd want a standard vocabulary to describe the inputs. (That said, many users will simply trust the data providers' assessment anyway - and assume they chose appropriate levels for the test inputs.)

In the OceanSITES project, we assign additional required attributes to items that are part of CF (or part of another standard, like NCADD). A file can be CF compliant and not include these details, but it's not OS compliant unless it has them. The OS definitions are in our data format reference manual (and, more briefly, our other data manuals), but the web page that defines the IOOS's implementation of the QARTOD tests could include some description of the additional attributes required. e.g. for the flat_line_test:

qc_flat_line_test IOOS required attributes

standard_name = "flat_line_test_quality_flag";
flag_meanings, flag_values
min_length_points (or min_length_minutes)
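The layered-requirements idea could be checked mechanically. The sketch below is hypothetical: a file can be CF compliant without these attributes, but a community profile could require them. The required set simply mirrors the illustrative list above. It is not taken from any published IOOS or OceanSITES specification.

```python
# Minimal sketch of a profile-level check layered on top of CF.
# HYPOTHETICAL requirement set, mirroring the illustrative list above.
REQUIRED = ("standard_name", "flag_meanings", "flag_values")
ONE_OF = ("min_length_points", "min_length_minutes")


def missing_profile_attributes(attrs: dict) -> list[str]:
    """Return the profile attributes absent from a flat-line QC variable."""
    problems = [name for name in REQUIRED if name not in attrs]
    # A present-but-wrong standard_name is reported separately.
    if attrs.get("standard_name") not in (None, "flat_line_test_quality_flag"):
        problems.append("standard_name != flat_line_test_quality_flag")
    # At least one of the test-parameter attributes must be present.
    if not any(name in attrs for name in ONE_OF):
        problems.append(" or ".join(ONE_OF))
    return problems


qc_var = {
    "standard_name": "flat_line_test_quality_flag",
    "flag_meanings": "PASS NOT_EVALUATED SUSPECT FAIL MISSING",
    "flag_values": [1, 2, 3, 4, 9],
}
print(missing_profile_attributes(qc_var))
# ['min_length_points or min_length_minutes']
```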

mwengren commented 4 years ago

@ngalbraith for the upcoming version of our 'IOOS Metadata Profile' that incorporates these new standard names into a quality flagging scheme for QARTOD, we decided to leverage the references attribute to suggest data providers link to external web pages or web-accessible files (e.g. JSON) that contain the parameters used in the tests. The documentation for this scheme can be found here.

Our QC library is written so that JSON or YAML files can be passed to the test functions to define the parameters, so there is a way for a direct integration there. @jessicaaustin can correct me if I have any of this wrong.
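As a rough illustration of keeping test parameters in an external JSON document (the kind of file a references attribute could point to), here is a sketch that loads parameters and applies a simple gross range test. The JSON layout and the gross_range_test() function are hypothetical. They show the idea only and are not the configuration format or API of the IOOS QC library.

```python
import json

# Minimal sketch: parameters live in external JSON and drive the test.
# HYPOTHETICAL JSON layout and function; not any library's real format.
CONFIG_JSON = """
{
  "sea_water_practical_salinity": {
    "gross_range_test": {
      "fail_span": [0.0, 42.0],
      "suspect_span": [2.0, 40.0]
    }
  }
}
"""

PASS, SUSPECT, FAIL, MISSING = 1, 3, 4, 9


def gross_range_test(values, fail_span, suspect_span):
    """Flag each value against the fail and suspect spans (inclusive)."""
    flags = []
    for v in values:
        if v is None:
            flags.append(MISSING)
        elif not fail_span[0] <= v <= fail_span[1]:
            flags.append(FAIL)
        elif not suspect_span[0] <= v <= suspect_span[1]:
            flags.append(SUSPECT)
        else:
            flags.append(PASS)
    return flags


config = json.loads(CONFIG_JSON)
params = config["sea_water_practical_salinity"]["gross_range_test"]
print(gross_range_test([35.1, 41.0, 50.0, None], **params))  # [1, 3, 4, 9]
```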

@castelao I don't have an opposition to adding additional test names to this proposal, however we do have a strong preference that these names be added to the next release of the standard names table (v71) so we can move our project along to the implementation phase - we've been waiting on resolution here.

If the CF stewards are willing to accept additional tests and merge them alongside those we've proposed, then I don't see a problem. It may make sense to open a new issue with the full list of additional test names, descriptions, etc, as in the tables above and refer back to this one (via cf-convention/cf-conventions#216 link) rather than include them in the same issue. Just my 2 cents. Thanks!

ngalbraith commented 4 years ago

The external files are fine, but there's one problem with using the 'references' attribute; that term is defined as a global attribute in CF. Most netCDF software does not handle a situation where a term is used as a global and a variable attribute the way you might expect - that's apparently a feature (and has caused problems with oceansites data).

I haven't had a chance to check your Metadata Profile yet; I hope you're also using the Conventions attribute, with a version number, for your IOOS details.

mwengren commented 4 years ago

@ngalbraith regarding the references attribute, I went back and checked Appendix A and was relieved to see it's valid as both a global and variable attribute. I don't know if there might still be software issues as a result or not, but I think we're OK in terms of the by-the-book rules for that, hopefully. The IOOS Compliance Checker 4.3.2 now checks attributes against the rules in this appendix, so if we weren't, our metadata profile would be failing our own checker, which isn't great.

We do suggest data providers add an IOOS label to the Conventions attribute (e.g. Conventions = "CF-1.6, ACDD-1.3, IOOS-1.2") in our profile, so I think we're good there as well.

@GeyerB @japamment I noticed there's been a lot of traffic on the discuss repo over the past week accepting new standard names. I just wanted to check that this issue and the list of new names in the comment above are on your radar and can be approved for inclusion in Version 71, due out soon. We would very much appreciate it.

Many thanks!

japamment commented 4 years ago

@jessicaaustin @mwengren many thanks for these standard name proposals and my apologies for the delay in responding. Thank you also to all those who have contributed to this interesting discussion.

It seems the discussion has reached consensus on adding the terms as standard names rather than standard name modifiers. I think this is the right approach - I haven't dug through the CF mailing list archives but seem to recall previous discussions in which the suggestion was made to deprecate the use of modifiers altogether, although this hasn't made its way into the conventions so far.

The proposals have all now been added to the standard names editor. The names themselves look fine and canonical units of '1' is correct for flag variables.

The proposed descriptions are very clear and understandable, even to a non-expert such as myself. The only change I would suggest is some additional text to help guide CF users to the most appropriate name. For example, the description of gap_test_quality_flag would be: 'A quality flag that reports the result of the Timing/Gap test, which checks that data has been received within the expected time window and has the correct time stamp. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute. There are standard names for other specific quality tests which take the form of X_quality_flag. Quality information that does not match any of the specific quantities should be given the more general standard name of quality_flag'. Similar text could be added to all the descriptions, except for that of aggregate_quality_flag. (This type of guidance is provided in the descriptions of many existing standard names, such as those relating to salinity or area_type).

Regarding aggregate_quality_flag, I spotted one potential problem with the description. The text 'This flag is a summary of all quality tests run for another data variable, and is set to the highest-level (worst case) flag found' could be understood to refer to the existing quality_flag name as well as the specific names in this proposal. Is that what you intend? If not, I suggest amending this part of the text to read 'This flag is a summary of all quality tests run for another data variable, which have standard names of the form X_quality_flag, and is set to the highest-level (worst case) flag found. Information contained in a variable having the generic name quality_flag is excluded from the aggregate.'
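Alison's suggested wording amounts to a simple selection rule: include flags whose standard names have the form X_quality_flag, and exclude the generic quality_flag (and, naturally, the aggregate itself). A minimal sketch, with a hypothetical helper name:

```python
# Minimal sketch of the aggregate scope in Alison's suggested wording.
# The helper name is hypothetical.
def aggregate_components(standard_names: list[str]) -> list[str]:
    """Select the specific X_quality_flag names that feed the aggregate."""
    # Exclude the generic flag and the aggregate flag itself.
    excluded = {"quality_flag", "aggregate_quality_flag"}
    return [
        name
        for name in standard_names
        if name.endswith("_quality_flag") and name not in excluded
    ]


names = [
    "aggregate_quality_flag",
    "gross_range_test_quality_flag",
    "quality_flag",
    "flat_line_test_quality_flag",
]
print(aggregate_components(names))
# ['gross_range_test_quality_flag', 'flat_line_test_quality_flag']
```

A rule keyed on the name suffix would automatically pick up any X_quality_flag names added later, whether or not they come from QARTOD. That is exactly what Alison's follow-up questions ask about.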

If you are happy with my suggestion for additional help text and can let me know which version of the aggregate_quality_flag description is correct, then I think all the proposals can be accepted for inclusion in the standard name table.

I do have a couple of further questions about aggregate_quality_flag which don't affect the current proposal but will be important for future reference:

If we were to add standard names for more QARTOD defined quality tests, as has already been suggested, would the results of those tests then also form part of the aggregate?
Would the aggregate flag include the results of quality tests that were defined by some standard other than QARTOD if they happened to appear in the same data file?

Best wishes, Alison

ngalbraith commented 4 years ago

Thank you Alison, this is all good. I DO have a couple of issues with this paragraph:

'This flag is a summary of all quality tests run for another data variable, which have standard names of the form X_quality_flag, and is set to the highest-level (worst case) flag found. Information contained in a variable having the generic name quality_flag is excluded from the aggregate.'

I like the additional text excluding the generic quality_flag from the aggregate, but

1) In QARTOD, perhaps, it's always true that the aggregate is set to the highest-level (worst case) flag; that isn't true in all testing schemes.

There are cases when a test is run and the results are not considered important for some reason - failing a spike test because the data is determined to represent an actual measurement, or ... failing a gap test because of poor telemetry.

The point of the long drawn out discussion was in part to make sure these standard names could be used by other QC systems; letting those systems document their own method of setting the value of the aggregate is important.

2) 'Quality tests run for another data variable' doesn't seem right - the aggregate is normally for tests run on this particular data variable, no? Why not 'run for this or a related data variable'?

Regarding your two questions:

I'd like to have some text describing the best way to convey the name of the QC system being used - maybe a recommended attribute on each of these quality variables? That would solve the problem of people mixing QARTOD and other tests. Can we do that in the standard name guidance, or does QARTOD need to describe that themselves?

Also, could we recommend that the aggregate have a way to list its component tests? These could be given as ancillary variables.

jessicaaustin commented 4 years ago

Thank you Nan and Alison for your comments and suggestions.

I think we're all still in agreement on the standard name list, including Alison's suggestions for additional text to add to the non-aggregate flag names. The wording and scope for the aggregate flag still needs more clarification though.

After some internal discussion, here's our proposed revision for the aggregate_quality_flag definition:

Original: This flag is a summary of all quality tests run for another data variable, and is set to the highest-level (worst case) flag found. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute

Proposed (differences highlighted): This flag is a summary of all relevant quality tests run for the related ancillary parent data variable, and is set to the highest-level (worst case) flag found. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute. The aggregate quality flag represents the summary of all quality tests performed on the data variable, whether automated or manual, and whether present in the dataset as independent ancillary variables to the parent data variable or not.

Our justification:

Through this process, we have specifically decided to not use the word QARTOD in our standard names, which means we are leaving these flags open to describe any kind of QC process: QARTOD, some other well-known testing scheme, human-in-the-loop testing, etc -- or even some combination of all these. These processes are complex and therefore difficult to describe completely using variable attributes.

We are trying to strike a balance between keeping things simple and generic, while still being descriptive enough to be useful. Introducing a way to describe a single roll-up/aggregate/summary QC flag for a variable is extremely powerful! It is especially useful when it comes to writing scripts that use this data, which is the main motivation for this proposal. So we want to keep aggregate_quality_flag broad enough to encompass all kinds of testing that might happen on a data variable. We also want to make it easy enough to understand and use, so that it is widely adopted.

In that same vein, we are purposefully leaving out any specifics on how to define exactly what tests contributed to the aggregate flag. An external script would treat aggregate_quality_flag the same no matter how it was constructed. A human might want to know what tests made up the aggregate flag, but they could look to links or text elsewhere in the dataset metadata that describes the QC process (long_name, comments, references, history, etc). Again, this is to strike a balance between being descriptive and being easy to use.
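The script-side benefit described here is that a consumer only needs the aggregate flag, not the list of tests behind it. A minimal sketch, assuming the QARTOD-style values used in the examples (1 PASS, 2 NOT_EVALUATED, 3 SUSPECT, 4 FAIL, 9 MISSING); the function name is hypothetical:

```python
# Minimal sketch: mask data using only the aggregate flag.
# ASSUMPTION: QARTOD-style flag values as in the examples in this thread.
FAIL, SUSPECT = 4, 3


def mask_by_aggregate(values, agg_flags, reject=(FAIL,)):
    """Replace values whose aggregate flag is in `reject` with None."""
    if len(values) != len(agg_flags):
        raise ValueError("values and flags must have the same length")
    return [None if f in reject else v for v, f in zip(values, agg_flags)]


salinity = [35.0, 35.2, 12.0, 35.1]
agg = [1, 3, 4, 2]
print(mask_by_aggregate(salinity, agg))                    # [35.0, 35.2, None, 35.1]
print(mask_by_aggregate(salinity, agg, (FAIL, SUSPECT)))  # [35.0, None, None, 35.1]
```

The same code works whether the aggregate came from automated tests, manual review, or both.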

An example:

Consider a scenario where you have QARTOD running in real-time as your data comes in, and you also periodically do human-in-the-loop testing. The QARTOD tests are very well-defined, but the HIL testing is not -- maybe you have a combination of MATLAB scripts and manual flagging based on plots, for example. So in terms of the dataset, you could have something like:

float sea_water_practical_salinity(time, z);
    sea_water_practical_salinity:units = "1";
    sea_water_practical_salinity:long_name = "Salinity";
    sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
    sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg sea_water_practical_salinity_qc_gross_range_test sea_water_practical_salinity_qc_manual";

int sea_water_practical_salinity_qc_agg(time, z);
    sea_water_practical_salinity_qc_agg:long_name = "Salinity Summary QC Flag";
    sea_water_practical_salinity_qc_agg:standard_name = "aggregate_quality_flag";
    sea_water_practical_salinity_qc_agg:missing_value = 2;
    sea_water_practical_salinity_qc_agg:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
    sea_water_practical_salinity_qc_agg:flag_values = 1, 2, 3, 4, 9;

int sea_water_practical_salinity_qc_gross_range_test(time, z);
    sea_water_practical_salinity_qc_gross_range_test:long_name = "Salinity Gross Range QC Test Flag";
    sea_water_practical_salinity_qc_gross_range_test:standard_name = "gross_range_test_quality_flag";
    sea_water_practical_salinity_qc_gross_range_test:missing_value = 2;
    sea_water_practical_salinity_qc_gross_range_test:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
    sea_water_practical_salinity_qc_gross_range_test:flag_values = 1, 2, 3, 4, 9;

int sea_water_practical_salinity_qc_manual(time, z);
    sea_water_practical_salinity_qc_manual:long_name = "Salinity Manual Review QC Tests Flag";
    sea_water_practical_salinity_qc_manual:standard_name = "quality_flag";
    sea_water_practical_salinity_qc_manual:missing_value = -1;
    sea_water_practical_salinity_qc_manual:flag_meanings = "PASS FAIL_BIOFOUL FAIL_INSTR FAIL_TELEM";
    sea_water_practical_salinity_qc_manual:flag_values = 0, 1, 2, 3;

(And hopefully you describe or link to your overall qc process, including HIL testing, somewhere in the global metadata!)

The manual tests and results could be very specific to your group -- in this example for the manual tests the data point could either be PASS, or could FAIL due to instrument issues, bio-fouling, etc.

While the QARTOD gross range test and the manual HIL tests involve different processes and flagging schemes, at the end of the day they should somehow be combined into a single QC result per data point. How you do that combination can be as arbitrary and complex as the QC process itself. This is where the aggregate_quality_flag is useful: the proposed definition gives you plenty of flexibility to include or exclude specific test results in your aggregate flag, depending on whether or not they are relevant at any given time.
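One possible way to fold the group-specific manual flags into a QARTOD-style aggregate is to translate them onto the QARTOD scale first and then keep the worse flag at each point. Both the mapping and the severity ordering below are assumptions for illustration; the proposal deliberately leaves this choice to the operator.

```python
# Minimal sketch: combine a QARTOD test with a differently coded manual flag.
# ASSUMPTIONS: the manual-to-QARTOD mapping and the precedence are illustrative.
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9
RANK = {MISSING: 0, NOT_EVALUATED: 1, PASS: 2, SUSPECT: 3, FAIL: 4}

# Manual scheme from the example: 0 PASS, 1 FAIL_BIOFOUL, 2 FAIL_INSTR, 3 FAIL_TELEM.
MANUAL_TO_QARTOD = {0: PASS, 1: FAIL, 2: FAIL, 3: FAIL}


def combine(gross_range, manual):
    """Translate manual flags, then keep the worse flag at each point."""
    # Points never reviewed manually (no code, e.g. None) become NOT_EVALUATED.
    translated = [MANUAL_TO_QARTOD.get(m, NOT_EVALUATED) for m in manual]
    return [max(pair, key=RANK.__getitem__) for pair in zip(gross_range, translated)]


print(combine([1, 3, 1, 1], [0, 0, 1, None]))  # [1, 3, 4, 1]
```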

Furthermore, we have come across scenarios where a group has a single qc flag per variable, that they update during periodic data reviews. In that case, the QARTOD and HIL variables would not be present -- just the "aggregate" flag. Hence the wording at the end: "whether present in the dataset as independent ancillary variables to the parent data variable or not".

Answers to specific questions from Alison and Nan

From Alison:

If we were to add standard names for more QARTOD defined quality tests, as has already been suggested, would the results of those tests then also form part of the aggregate?
- Yes, if they were relevant. It's up to the operator to decide which variables are used for the aggregate flag
- A script would treat the aggregate_quality_flag variable the same in either case. A human could look for documentation if they needed to know specifically what tests were used to create the aggregate flag
Would the aggregate flag include the results of quality tests that were defined by some standard other than QARTOD if they happened to appear in the same data file?
- Same answer as above

From Nan:

There are cases when a test is run and the results are not considered important for some reason - failing a spike test because the data is determined to represent an actual measurement, or ... failing a gap test because of poor telemetry. The point of the long drawn out discussion was in part to make sure these standard names could be used by other QC systems; letting those systems document their own method of setting the value of the aggregate is important.
- Yes, we agree with this sentiment
- Nan, do you agree that the proposed new wording is flexible enough to allow for this scenario?
I'd like to have some text describing the best way to convey the name of the QC system being used - maybe a recommended attribute on each of these quality variables? That would solve the problem of people mixing QARTOD and other tests. Can we do that in the standard name guidance, or does QARTOD need to describe that themselves?
- If a test has a name flat_line_test_quality_flag, does it really matter what QC system was used to create it? Isn't that the whole point of creating a generic standard_name for the test?
- If the QC system used is relevant to a user, we think that providing an overall description of the QC process (or a link to it) elsewhere in the dataset metadata is a simple yet effective way to convey this information
Also, could we recommend that the aggregate have a way to list its component tests? These could be given as ancillary variables.
- We believe this adds too much complexity without enough benefit. If the information is relevant to a user, then it could be listed in a global or variable attribute. But since this information is for humans and not scripts, we don't think it makes sense to create a strict definition here.

roy-lowry commented 4 years ago

There seems to be a contradiction creeping in between the aggregate_quality_flag definition and Jessica's answers to Alison's questions. The answers indicate a move towards a generic position in which the component parts of the aggregate are a moveable feast.

However, the definition includes a prescription of the algorithm to be used for the synthesis of the component flag values. Would it not be possible for another user of the Standard Name to have an algorithm that applied weightings to the different components? That is why Nan was objecting to the phrase 'set to the highest-level (worst case) flag found'.

So I think you need to take one final step down the road to a generic aggregate_quality_flag. Maybe something like:

This flag is an algorithmic combination of the results of all relevant quality tests run for the related ancillary parent data variable. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute. The aggregate quality flag provides a summary of all quality tests performed on the data variable (both automated and manual) whether present in the dataset as independent ancillary variables to the parent data variable or not.
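Roy's point is that "algorithmic combination" also covers schemes other than worst case. The sketch below shows one such hypothetical scheme, a weighted score. The weights, the per-flag "badness" scores, the thresholds and the function name are invented for illustration and come from neither QARTOD nor CF; the sketch also assumes only flags 1 to 4 occur.

```python
# Hypothetical per-flag "badness" score (only QARTOD flags 1-4 are handled;
# "not evaluated" (2) is scored like "pass" here, a simplification).
BADNESS = {1: 0.0, 2: 0.0, 3: 0.5, 4: 1.0}

# Hypothetical weights: how much each test contributes to the aggregate.
WEIGHTS = {"gross_range_test": 1.0, "spike_test": 0.5, "flat_line_test": 0.25}


def weighted_aggregate(test_results, weights, suspect_at=0.25, fail_at=0.75):
    """Map a weighted mean badness score per sample to a QARTOD-style flag."""
    total = sum(weights.values())
    n_samples = len(test_results[next(iter(weights))])
    aggregate = []
    for i in range(n_samples):
        score = sum(w * BADNESS[test_results[name][i]] for name, w in weights.items()) / total
        if score >= fail_at:
            aggregate.append(4)  # fail
        elif score >= suspect_at:
            aggregate.append(3)  # suspect
        else:
            aggregate.append(1)  # pass
    return aggregate


results = {
    "gross_range_test": [1, 1, 4],
    "spike_test": [1, 4, 1],
    "flat_line_test": [3, 1, 1],
}
print(weighted_aggregate(results, WEIGHTS))  # [1, 3, 3]
```

A worst-case rule would give [3, 4, 4] for the same input, so two providers could legitimately publish different values under the same standard name.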

ngalbraith commented 4 years ago

This is all an improvement. I like Roy's change to the definition of the aggregate flag, 'an algorithmic combination of the results of all relevant quality tests', in place of 'set to the highest-level (worst case) flag found'.

I DO have an issue with the term 'related ancillary parent data variable,' which I think could be shortened to the 'subject data variable', 'parent data variable', or something similar.

And, I'm not sure I agree with the decision to intentionally NOT describe what went into the aggregate - doesn't this render it more or less useless? A handful of new tests might be run on, and added to, a file, without updating the existing aggregate_quality_flag. This would lead to the assumption that the agg flag represented a lot of testing, when in fact it did not.

I'd still like to use the ancillary_variables attribute to explicitly tie the component tests to the aggregate, but that doesn't cover the case of files where the components are not included in the data file. Could we recommend an attribute like 'component_tests' for this?

The case where tests are run that follow different standards also seems to require an additional attribute. If the aggregate isn't going to define the testing system used, then maybe all the _quality_flag variables need to indicate which standard was used.

So here is what I'd like to see:

This flag is an algorithmic combination of the results of all relevant quality tests run for the related data variable. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute on the data variable. The aggregate quality flag provides a summary of all quality tests performed on the data variable whether present in the dataset as independent ancillary variables or not. It is highly recommended that the aggregate_quality_flag have an attribute 'component_tests' that includes the test standards and test names, e.g.

    int psal_qc_agg(time, z);
        psal_qc_agg:standard_name = "aggregate_quality_flag";
        psal_qc_agg:component_tests = "QARTODv1.1_SpikeTest, QARTODv1.1_RateOfChangeTest, QARTODv1.1_FlatLineTest";
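Nan's suggestion can be sketched as a small helper that writes the CDL declarations for a data variable and its aggregate flag. Note that `component_tests` is the attribute proposed in this comment, not part of CF; the helper name, the data variable name `psal` and its `float` type are assumptions for illustration.

```python
def aggregate_flag_cdl(data_var, agg_var, dims, component_tests):
    """Return CDL lines linking data_var to an aggregate quality flag agg_var."""
    dim_list = ", ".join(dims)
    tests = ", ".join(component_tests)
    return [
        f"float {data_var}({dim_list}) ;",
        # CF: ancillary_variables is a blank-separated list of variable names.
        f'    {data_var}:ancillary_variables = "{agg_var}" ;',
        f"int {agg_var}({dim_list}) ;",
        f'    {agg_var}:standard_name = "aggregate_quality_flag" ;',
        # Proposed, non-CF attribute naming each test standard and test.
        f'    {agg_var}:component_tests = "{tests}" ;',
    ]


lines = aggregate_flag_cdl(
    "psal", "psal_qc_agg", ["time", "z"],
    ["QARTODv1.1_SpikeTest", "QARTODv1.1_RateOfChangeTest", "QARTODv1.1_FlatLineTest"],
)
print("\n".join(lines))
```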

Is that too much information for the data user to expect? Of course, we can always implement this additional info outside of CF, if it seems too detailed for the standard name guidance.

roy-lowry commented 4 years ago

I disagree that not describing the aggregate components through the Standard Name makes it useless. I can think of use cases where such minimal semantics are all that is needed (e.g. somebody who just wants to know which of a raft of flags to use to filter out problem data).
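For Roy's filtering use case, a script needs only two CF hooks: the `ancillary_variables` attribute on the data variable and the `aggregate_quality_flag` standard name. The sketch below uses a plain dictionary in place of an opened netCDF file so it runs without extra libraries; the dictionary layout, the variable names, the function names and the choice of flag 1 (pass) as the default accepted value are assumptions.

```python
def find_aggregate_flag(variables, data_var):
    """Follow ancillary_variables and return the aggregate flag's name, or None."""
    # CF: ancillary_variables is a blank-separated list of variable names.
    for name in variables[data_var]["attrs"].get("ancillary_variables", "").split():
        if variables[name]["attrs"].get("standard_name") == "aggregate_quality_flag":
            return name
    return None


def keep_good(variables, data_var, good_flags=(1,)):
    """Replace values whose aggregate flag is not in good_flags with None."""
    flag_name = find_aggregate_flag(variables, data_var)
    if flag_name is None:
        raise LookupError(f"{data_var} has no aggregate_quality_flag")
    flags = variables[flag_name]["data"]
    return [v if f in good_flags else None for v, f in zip(variables[data_var]["data"], flags)]


variables = {
    "psal": {"attrs": {"ancillary_variables": "psal_qc_gross_range psal_qc_agg"},
             "data": [35.1, 35.2, 99.0]},
    "psal_qc_gross_range": {"attrs": {"standard_name": "gross_range_test_quality_flag"},
                            "data": [1, 1, 4]},
    "psal_qc_agg": {"attrs": {"standard_name": "aggregate_quality_flag"},
                    "data": [1, 3, 4]},
}
print(keep_good(variables, "psal"))          # [35.1, None, None]
print(keep_good(variables, "psal", (1, 3)))  # [35.1, 35.2, None]
```

The script never needs to know which tests, or which QC system, produced the aggregate.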

There are of course other use cases where the user wants to know how the aggregate was derived but in these cases I feel the user wants to know the whole story. As this issue has developed the complexity of the story has increased to the extent where I believe it has exceeded the limited semantic capabilities of CF. Consequently, I believe this story belongs in the data documentation as Jessica has suggested a couple of times.

jessicaaustin commented 4 years ago

"e.g. somebody who just wants to know which of a raft of flags to use to filter out problem data" -- Yes, this is our primary use case.

I think we all agree that in some cases the user wants to know exactly how the aggregate QC flag was derived. But we may not reach an agreement today on whether or not the data documentation is sufficient, or something like component_tests is necessary. That said, I don't think accepting this proposal as-is -- just adding a set of standard names to the table -- would preclude adding something like that in the future. Could we split that discussion into a separate issue so that we can close this one out?

Also, we are fine with Roy's most recent wording, quoted above.

ngalbraith commented 4 years ago

This is fine with me. I think there are more words than needed in the text - e.g. 'both automated and manual' is superfluous, IMHO, and 'related ancillary parent data variable' is somewhat redundant, but I agree that overall this is ready to go. Thanks for your patience!

mwengren commented 4 years ago

@ngalbraith I think we added the 'both automated and manual' clause to account for the case in Jessica's example in the above comment: a test with the generic quality_flag name that is performed manually, with its own flagging scheme, and is combined with the automated gross_range_test_quality_flag test, which follows a different flagging scheme.
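As an illustration of the case described here, the sketch below maps a hypothetical manual review scheme ("good", "questionable", "bad") onto QARTOD flag values and lets a manual verdict, where one exists, take precedence over the automated result. This is also one way to handle Nan's earlier example of a test failure that reviewers judged to be a real measurement. The mapping, the precedence rule and the function name are assumptions, not part of any standard.

```python
# Hypothetical manual review scheme mapped onto QARTOD flag values.
MANUAL_TO_QARTOD = {"good": 1, "questionable": 3, "bad": 4}


def combine_with_manual(auto_flags, manual_flags):
    """Use the manual verdict where one exists, else the automated QARTOD flag."""
    return [
        MANUAL_TO_QARTOD[manual] if manual is not None else auto
        for auto, manual in zip(auto_flags, manual_flags)
    ]


gross_range = [1, 4, 3]           # automated QARTOD results
manual = [None, "good", None]     # analyst reviewed only the second sample
print(combine_with_manual(gross_range, manual))  # [1, 1, 3]
```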

I think it also generally conveys the point that an aggregate flag is the one a data user should query to obtain the best data, as @roy-lowry mentioned, however that flag was determined. This also goes to your point above that 'letting those systems document their own method of setting the value of the aggregate is important'. We were trying to add some flexibility to the definition.

I think we've come to agreement, however, so I don't want to sidetrack; I hope this helps explain that particular phrasing and why we chose to include it.

@japamment can we close the book on this addition when you are able to revisit it again, I suppose after the three-week holding period? Many thanks for everyone's input on this.

japamment commented 4 years ago

@jessicaaustin @mwengren @ngalbraith @roy-lowry many thanks for all your comments and responses to my questions. This discussion has helped me to understand much better the ways in which quality information is gathered and used.

Regarding the 'non aggregate' names, I think we are all agreed and no further comments have been received. Hence, these names are accepted for publication in the standard name table and will be included in the next update, scheduled for 9th March. (N.B. The three-week period that Micah mentioned applies to the process for modifying the main CF conventions document. For standard names it is our usual practice to accept names once all discussion points have been answered and consensus has been reached among those contributing to the conversation).

For the aggregate name it seems consensus has been achieved on: aggregate_quality_flag 'This flag is an algorithmic combination of the results of all relevant quality tests run for the related ancillary parent data variable. The linkage between the data variable and this variable is achieved using the ancillary_variables attribute. The aggregate quality flag provides a summary of all quality tests performed on the data variable (both automated and manual) whether present in the dataset as independent ancillary variables to the parent data variable or not.' This name is also accepted for publication in the standard name table and will be added in the March update.

All the names, units and definitions are listed in full in the CEDA standard names editor.

I agree that it is important to know where to 'draw the line' between CF metadata and other sources of documentation. I do think it would be useful to use a CF attribute to state which quality control procedure has been used and hence guide data users to the appropriate documentation. This could perhaps be achieved using an existing attribute such as 'comment', which has the advantage of not needing to modify the CF conventions. It could also be achieved by adding a new attribute such as component_tests suggested by Nan. That discussion can certainly be the subject of a separate issue.

Whatever is decided ultimately, we could then update the standard name definitions by adding a sentence advising which other attribute(s) to check. For example, we currently have a lot of emissions names such as tendency_of_atmosphere_mass_content_of_alcohols_due_to_emission_from_solvent_production_and_use whose definitions say '"Solvent production and use" is the term used in standard names to describe a collection of emission sources. A variable which has this value for the standard_name attribute should be accompanied by a comment attribute which lists the source categories and provides a reference to the categorization scheme, for example, "IPCC (Intergovernmental Panel on Climate Change) source categories 2F and 3 as defined in the 2006 IPCC guidelines for national greenhouse gas inventories".' Perhaps a similar approach would work for quality tests?

Best wishes, Alison

feggleton commented 4 years ago

These changes have been published in version 72 of the standard name table.

Please close this issue if all discussions are complete.

japamment commented 4 years ago

@feggleton thanks for publishing the names. I'm closing this issue now.

mwengren commented 4 years ago

@feggleton @japamment @ngalbraith @roy-lowry and others: thanks for publishing the names, contributing to the discussion, and helping steer this through to acceptance!