ACTRIS-Data-Centre / actris-vocabulary

Creative Commons Zero v1.0 Universal

Data quality control outcome #8

Closed by claudio-dema 2 months ago

claudio-dema commented 1 year ago

We need to define whether the quality control procedures have been passed (not only applied) fully or partially. This is of interest to users and should be a search criterion on the ACTRIS data portal.

We propose:

data quality control outcome: Describes the outcome of data quality control procedures.
--> data quality control fully passed: The full extent of data quality control as defined by ACTRIS procedures for the property has been passed.
--> data quality control partially passed: Data quality control as defined by ACTRIS procedures for the property has been partially passed.

To avoid misunderstanding or confusion, we also propose to rename:

The definitions can stay as they are, except for the typo "extend"->"extent".

markusfiebig commented 11 months ago

I would like to take this opportunity to take up a proposal from ENVRI-FAIR. At the end of the project, we proposed the IODE data quality flagging scheme as a common denominator for data quality flagging. The scheme has a common primary level and a user-defined secondary level where repository-specific flags can be included. I would propose using the primary level as a common denominator between ACTRIS DC units as well, both for flagging individual values and for flagging whole files, as ARES does. The IODE primary flags are:

| Value | Primary-level flag short name | Definition |
|---|---|---|
| 1 | Good | Passed documented required QC tests |
| 2 | Not evaluated, not available or unknown | Used for data when no QC test performed or the information on quality is not available |
| 3 | Questionable/suspect | Failed non-critical documented metric or subjective test(s) |
| 4 | Bad | Failed critical documented QC test(s) or as assigned by the data provider |
| 9 | Missing data | Used as place holder when data are missing |
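
For illustration, the primary-level flags listed above could be represented in code roughly as follows. This is only a minimal sketch: the names `IodeFlag`, `SHORT_NAMES`, and `short_name` are hypothetical and are not part of the IODE scheme or the ACTRIS vocabulary; only the values and short names come from the table.

```python
from enum import IntEnum


class IodeFlag(IntEnum):
    """Primary-level IODE quality flags, with the values listed in the table."""

    GOOD = 1
    NOT_EVALUATED = 2
    QUESTIONABLE = 3
    BAD = 4
    MISSING = 9


# Short names copied from the table; the dictionary itself is a hypothetical helper.
SHORT_NAMES = {
    IodeFlag.GOOD: "Good",
    IodeFlag.NOT_EVALUATED: "Not evaluated, not available or unknown",
    IodeFlag.QUESTIONABLE: "Questionable/suspect",
    IodeFlag.BAD: "Bad",
    IodeFlag.MISSING: "Missing data",
}


def short_name(value: int) -> str:
    """Return the short name for a flag value.

    Raises ValueError if the value is not one of the primary-level flags.
    """
    return SHORT_NAMES[IodeFlag(value)]
```

Using an integer enum keeps the numeric values usable as-is in data files, while rejecting values (such as 5) that the primary level does not define.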

Concerning clarification of concepts, I would propose removing the word "data" from the label. That way, the concepts can also be used for quality control of instruments, etc. This would result in:

siiptuo commented 9 months ago

In the Cloudnet data portal, we have the following quality control outcomes for whole files: pass, info, warning, and error. It looks like we can easily map these to the IODE data quality flagging scheme.
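
A mapping of this kind might look like the sketch below. The thread does not say which Cloudnet outcome corresponds to which IODE flag, so every entry in `CLOUDNET_TO_IODE` is an assumption made for illustration, as are the names `CLOUDNET_TO_IODE` and `to_iode_flag` and the choice to treat a missing outcome as "not evaluated".

```python
from typing import Optional

# Hypothetical mapping from Cloudnet file-level outcomes to IODE primary flag values.
# None of these pairings is confirmed in the discussion; they are assumptions.
CLOUDNET_TO_IODE = {
    "pass": 1,     # assumed: Good
    "info": 1,     # assumed: informational notes do not degrade quality
    "warning": 3,  # assumed: Questionable/suspect
    "error": 4,    # assumed: Bad
}


def to_iode_flag(outcome: Optional[str]) -> int:
    """Translate a Cloudnet outcome into an IODE primary flag value.

    A missing outcome (None) is assumed to mean that no QC was run,
    which is reported as flag 2. Unknown outcome strings raise KeyError.
    """
    if outcome is None:
        return 2
    return CLOUDNET_TO_IODE[outcome.lower()]
```

For example, `to_iode_flag("warning")` returns 3 under these assumptions, and an unrecognised outcome fails loudly instead of being silently flagged as good.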

markusfiebig commented 5 months ago

The telecon agreed to use the IODE flagging scheme. It can be used both for flagging whole data files and for flagging individual data points. Units can decide whether to use the pref label or the definition. The flags will be added under "quality control outcome".

Agree to shift:

claudio-dema commented 4 months ago

Hi all, ARES is fine with the example presented by Markus to the ET Team yesterday.

We'll use the full definition on landing pages.

Ok to use the short terms for search in the data portal / metadata schema, but at least for the search filter label in the data portal we would prefer to have only the word "questionable" and not the word "suspect" (no one would search for "suspect" data).

Please, @richard-olav, let us know whether it is possible to have the updated metadata schema by the end of February. This would avoid having to send the metadata twice.
In the metadata schema, we suggest adding two new optional parameters to the "dq_data_quality_information" model:

Thanks!

richard-olav commented 4 months ago

Hi Claudio. I can't make a decision regarding the vocabulary itself (@markusfiebig should be the right person to decide), but as far as changes to the metadata schema are concerned, we have decided that we have to wait, since we have many issues pending. Unfortunately, the timing is too tight given a number of other tasks we have planned, winter vacation, etc. I will give more feedback in an email to the whole group.

markusfiebig commented 4 months ago

Hi everyone! We had a discussion here, also involving Cathrine, to reach a conclusion concerning "questionable" and "suspect". There is a slight difference in meaning between these words, with "suspect" being a little stronger than "questionable", which is why ARES would like to remove "suspect". On the other hand, by making such a change, we would break the direct connection with the IODE flagging scheme. Having this connection is an important aspect in itself when thinking about ENVRI and EOSC interoperability. In this light, we think that the difference in meaning between "questionable" and "suspect" isn't strong enough to justify breaking the direct link to IODE.

claudio-dema commented 2 months ago

Hi all! I'm reposting part of the answer I gave by email on February 15:

Thank you for spending some time on this matter, we appreciate it. I'd just like to clarify that ARES did not request the removal of "suspect" from the ACTRIS vocabulary. Our concern was only related to the terms shown in the filter on the DVAS search data portal. We are asking to keep the direct link to IODE in the vocabulary with the short name "Questionable/suspect", and to have only the word "Questionable" in the search filter on the data portal and in landing pages. In this way we could keep the connection with the IODE flagging scheme intact (at vocabulary level) and help users understand clearly what kind of data they are dealing with (at data portal level).
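
The separation between the vocabulary level and the data portal level could be sketched as a display-label override on top of the unchanged vocabulary short names. This is only an illustration of the idea, not how DVAS is implemented; `VOCABULARY_SHORT_NAMES`, `PORTAL_LABEL_OVERRIDES`, and `portal_label` are hypothetical names.

```python
# Short names kept identical to the IODE primary-level flags (vocabulary level).
VOCABULARY_SHORT_NAMES = {
    1: "Good",
    2: "Not evaluated, not available or unknown",
    3: "Questionable/suspect",
    4: "Bad",
    9: "Missing data",
}

# Display-only overrides for the portal search filter and landing pages.
# Hypothetical structure: only the label shown to users changes.
PORTAL_LABEL_OVERRIDES = {
    3: "Questionable",
}


def portal_label(flag_value: int) -> str:
    """Return the label shown to portal users for a given flag value.

    Falls back to the vocabulary short name when no override exists.
    Raises KeyError for values that are not in the vocabulary.
    """
    return PORTAL_LABEL_OVERRIDES.get(flag_value, VOCABULARY_SHORT_NAMES[flag_value])
```

With this arrangement the stored flag value and vocabulary entry stay aligned with IODE, and only the text presented to users differs.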