emo-bon / observatory-profile

Repository for the templates and additional metadata that are used to semantically uplift emo-bon logsheet data into triples

dealing with ranges in the logsheets #21

Open kmexter opened 2 months ago

kmexter commented 2 months ago

We have discussed this in other issues, but I am moving it here to reduce the confusion. Some values in the logsheets are entered with a <, a >, or a - (range).

For the size fractions (size_frac_up and low) this is just wrong and should be flagged as such in the QC. There is an upper and a lower value, and if one of them is not known (as it did not exist, so to speak), then the value in that cell should be either NA or blank. --> @kmexter make sure that this is not only flagged in the QC, but also transformed into just a value AND that any NA or 0 values are given a "no value for this" in the triples. @laurianvm may be able to advise on this.

For depth, we will take the ranges given and translate them into an upper and lower limit, and if we do get a < or > we will do the same, but only in the TTL files. In the transformed logsheets these will continue to be just the string 3-4 or <10 or whatever. To be consistent over all stations, we will make all depths a max and a min. @laurianvm can you take care of this in your template? If not, let's chat.

For the enviro measurements: at the next harvest I will ask that this is flagged specifically and so I will see if there are any like this (was it my imagination that I saw this?). If there are, we will deal with that in the ttl file only, by making that a "max" value

laurianvm commented 1 month ago

'<' --> max depth
'>' --> min depth
3-4 --> min depth = 3 and max depth = 4
3 --> min depth = 3 and max depth = 3 (to be consistent!)

Change ontology and templates to include this. Goal = consistent description of depth across all data.

(not for the depth in observatory logsheets/data/templates - rather sampling tab)
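The mapping listed above could be sketched roughly as follows. This is only an illustration, not the actual template or transformation code: the function name `depth_to_min_max` is hypothetical, depths are assumed to be non-negative numbers, and the open side of a `<` or `>` entry is returned as `None` because the thread does not say what to fill in there.

```python
import re
from typing import Optional, Tuple

# Non-negative decimal number (assumption: depths are never negative,
# so a "-" can only be the range separator).
_NUM = r"\d+(?:\.\d+)?"


def depth_to_min_max(raw: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (depth_min, depth_max) for one logsheet depth entry.

    Hypothetical helper; None marks a bound that the entry does not give.
    """
    text = raw.strip()

    # "3-4" --> min depth = 3 and max depth = 4
    match = re.fullmatch(rf"({_NUM})\s*-\s*({_NUM})", text)
    if match:
        # Assumption: if the two numbers are entered in reverse order,
        # the smaller one is still the minimum.
        low, high = sorted((float(match.group(1)), float(match.group(2))))
        return low, high

    # "<10" --> max depth only
    match = re.fullmatch(rf"<\s*({_NUM})", text)
    if match:
        return None, float(match.group(1))

    # ">10" --> min depth only
    match = re.fullmatch(rf">\s*({_NUM})", text)
    if match:
        return float(match.group(1)), None

    # "3" --> min depth = 3 and max depth = 3
    if re.fullmatch(_NUM, text):
        value = float(text)
        return value, value

    raise ValueError(f"unrecognised depth entry: {raw!r}")
```

For example, `depth_to_min_max("3-4")` gives `(3.0, 4.0)` and `depth_to_min_max("3")` gives `(3.0, 3.0)`.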

laurianvm commented 1 month ago

@kmexter will check whether there are any max/min occurring in measurements

kmexter commented 1 month ago

For the QC there is a check that depth is < max depth. This check will either be skipped (depth treated as a string, "float - float") or will check the max depth against the value from the observatory tab. @cpavloud for ENA I think you said that depth has to be one value - can you remind me whether that was to be the min, the max, or the average?

kmexter commented 1 month ago

@kmexter to check if this is also the case for measurements; if so, they have to be strings. In fact, if there are any ranges in measurements, that should be an error - @cpavloud ? But also check if there are max/min values.

cpavloud commented 1 month ago

I think it's more appropriate if we keep the maximum depth for the ENA submission.

kmexter commented 1 month ago

Discussed with @bulricht: this is now to be treated as a string. However, as there is a QC check for this against the tot_depth_water_col in the observatory tab of each logsheet, the check will have to be done in a new way: take the string; if it has a - in it, then assume it is formatted as float - float and take the larger of the two floats. This is what you should check against max depth, and it is also what needs to get into the ENA XML files.
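A minimal sketch of that check, for discussion only. The function names are hypothetical and this is not the actual QC code; it assumes depths are non-negative (so any - is the range separator) and that a depth equal to tot_depth_water_col passes, which the thread does not state either way.

```python
def qc_depth_value(depth: str) -> float:
    """Return the single float to use for QC and for the ENA XML.

    Hypothetical helper: if the string contains a "-", assume it is
    formatted as "float - float" and take the larger of the two floats.
    """
    text = depth.strip()
    if "-" in text:
        first, second = text.split("-", 1)
        # float() tolerates surrounding whitespace, so "3 - 4" also works.
        return max(float(first), float(second))
    return float(text)


def depth_within_water_column(depth: str, tot_depth_water_col: float) -> bool:
    """Check the depth entry against the observatory tab's total depth.

    Assumption: the comparison is non-strict (<=); swap in < if the
    real QC rule is strict.
    """
    return qc_depth_value(depth) <= tot_depth_water_col
```

With this, `qc_depth_value("3-4")` returns `4.0`, and `depth_within_water_column("3-8", 5.0)` returns `False`.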

kmexter commented 1 month ago

@laurianvm if you have finished your part of this issue, can you say so here so I know that it is only Bram's part left to do?

kmexter commented 1 month ago

Hmm. @laurianvm :-} we think that, rather than you transforming these depths into a max and min, it is better that this is done in the transformed logsheets - we create 2 new columns, depth_min and depth_max. That may mean you backpedalling on some template changes? We think that because it is best to make as few data changes in the templates as possible. I wonder though how I would indicate this in the logsheet_schema_extended, as this will be one column coming in and two going out, with different vocabulary terms (so they have to be in two rows in that file). @bulricht can you advise?

kmexter commented 2 days ago

@kmexter to find new BODC terms for these in the transformed logsheets

kmexter commented 2 days ago

@kmexter raise new issue to say how to do this when some entries are a range and some are single values, and change how it is in the logsheet schema - need to change the way the logsheet schema extended is written, as it should not be "xsd:string or float" but just "xsd:float" or "range", and Bram's code has to understand that

I should double check if there are any unbounded ranges, and if so they need to be entered as a range instead (0-5 = <5 and 5-maxdepth = >5)

And tell the stations how to indicate a range: 3-5.
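The rewrite of unbounded entries into ranges noted above could look something like this. It is a sketch under assumptions: `bounded_range` is a hypothetical name, and `max_depth` is assumed to be supplied by the caller (for instance from tot_depth_water_col), since the thread does not say where "maxdepth" comes from.

```python
def bounded_range(entry: str, max_depth: float) -> str:
    """Rewrite an unbounded depth entry as an explicit range string.

    Hypothetical helper:
      "<5" becomes "0-5"
      ">5" becomes "5-<max_depth>"
    Anything else is returned unchanged (only stripped of whitespace).
    """
    text = entry.strip()
    if text.startswith("<"):
        return f"0-{text[1:].strip()}"
    if text.startswith(">"):
        # :g drops a trailing ".0", so 20.0 is written as "20".
        return f"{text[1:].strip()}-{max_depth:g}"
    return text
```

So `bounded_range("<5", 20.0)` gives `"0-5"` and `bounded_range(">5", 20.0)` gives `"5-20"`.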