emo-bon / observatory-profile

Repository for the templates and additional metadata that are used to semantically uplift emo-bon logsheet data into triples

All measured sampling log sheets: fields where values can be lists need to have their format changed from "automatic" to "plain text" #36

Open cymon opened 3 weeks ago

cymon commented 3 weeks ago

Example sheet: https://docs.google.com/spreadsheets/d/1xfrqraPa0auQ1O-C9RUo68RhxrPCDWkVMCAUbj79AZI

For fields that can have multiple listed entries:

e.g. "pigments": the example in definitions_updated is "caroten: 0.125;phycoerythrin: 0.121", i.e. a string. The definition reads: Concentration of pigments other than chlorophyll-a; can include multiple pigments separated by ";". Look at the example for how to enter multiple values. Also, in the sheet above there is only a float (0.7537) entered, so what pigment is it?

It's throwing an error because the validator correctly expects a string, but the spreadsheet is formatting the value as a float (because the format is automatic). It should instead be throwing an error because the value doesn't say which pigment it refers to.

Update

"phaeopigments" in this sheet is another "list" that is being formatted as float:

Processing observatory_id='OSD74' - sampling_strategy='water_column' - sheet_type='measured'
Sample sheet link:

ValidationError: 1 validation error for Model
phaeopigments
  Input should be a valid string [type=string_type, input_value=1.83, input_type=float]
    For further information visit https://errors.pydantic.dev/2.8/v/string_type

kmexter commented 3 weeks ago

For the point "also in the sheet above there is only a float (0.7537) entered so what pigment is it?" -> that is an error that @melanthia @melinalou or @isanti have to ask the station to fix -> the station needs to give a type

If there is only one correctly formatted value there, then for the QC we do here I asked Bram to treat that as a 1-element list. So the QC will do a formatting check on the elements of the list by splitting it on the ";"s ("string : value ; string : value ; etc"), and if there is just a single "string : value" it will assume this is a list of 1 element. In the triple store this will be recorded as a measurement of type "pigment" with subtypes "string1 = value1", "string2 = value2", etc.
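A minimal sketch of the split-on-";" check described above, to make the 1-element case concrete. This is not the actual QC code: the function name `parse_measurement_list` is hypothetical, and tolerating surrounding whitespace and a trailing ";" is an assumption.

```python
def parse_measurement_list(raw: str) -> list[tuple[str, float]]:
    """Split 'name: value;name: value' into (name, value) pairs.

    Hypothetical helper, not the project's QC implementation.
    """
    pairs = []
    for element in raw.split(";"):
        element = element.strip()
        if not element:
            continue  # assumption: a trailing ";" is tolerated
        name, sep, value = element.partition(":")
        if not sep or not name.strip():
            # No "name:" part, e.g. a bare "0.7537"
            raise ValueError(f"element {element!r} is not 'name: value'")
        # float() raises ValueError if the value part is not numeric
        pairs.append((name.strip(), float(value)))
    if not pairs:
        raise ValueError("no 'name: value' elements found")
    return pairs


# A single "name: value" is simply a list with one element.
print(parse_measurement_list("caroten: 0.125"))
print(parse_measurement_list("caroten: 0.125;phycoerythrin: 0.121"))
```

A bare number such as "0.7537" raises `ValueError` here, which corresponds to the case the station has to fix by naming the pigment.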

cymon commented 3 weeks ago

Yeah, I think it's important to separate what the validation and QC are doing. Validation is checking types, not correctness of the values given, only that they are the correct type. QC checks for correctness of the values entered, even if the type is correct.

The problem here is that the recorders were entering floats and, because the format of the field is set to automatic, the sheet exports them as floats rather than the strings they should be. If the formatting were changed to plain text, even a floating point number would be exported as a string. So it would pass validation (it is correctly a string) but fail QC, because the string does not follow the expected format "pigment: value;" (or whatever it is...)
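To illustrate the two stages, here is a rough sketch using only the standard library. The names `passes_validation` and `passes_qc` and the regular expression are assumptions made for illustration; the real validator is a pydantic model and the real QC rules may differ.

```python
import re

# Assumed shape of one element: "<name>: <number>" (illustrative only)
PAIR = re.compile(r"\s*[^:;\s][^:;]*:\s*[-+]?\d+(\.\d+)?\s*")


def passes_validation(value: object) -> bool:
    """Type check only: the field must arrive as a string."""
    return isinstance(value, str)


def passes_qc(value: str) -> bool:
    """Content check: every ';'-separated element must be 'name: number'."""
    return all(PAIR.fullmatch(part) for part in value.split(";"))


automatic_export = 0.7537     # cell format "automatic": exported as a float
plain_text_export = "0.7537"  # cell format "plain text": exported as a string
well_formed = "caroten: 0.125;phycoerythrin: 0.121"

print(passes_validation(automatic_export))   # False: wrong type
print(passes_validation(plain_text_export))  # True: it is a string
print(passes_qc(plain_text_export))          # False: no pigment name
print(passes_qc(well_formed))                # True
```

With plain text formatting, the bare number moves from a type failure to a QC failure, which is where the missing pigment name should be reported.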

kmexter commented 3 weeks ago

That is something only @cpavloud or @isanti has permission to do, I think -> or @melanthia / @melinalou can you do it?

cpavloud commented 2 weeks ago

Added to my to-do list (along with all the other issues that need solving). FYI, I have tried to compile all of them at the end of this document

melinalou commented 2 weeks ago

Sorry for the delayed answer. So do we need to format all the floats as plain text, or only specific fields (such as phaeopigments)? (I have permission only in the "new sheets" I've made, so I think it is OK to do it for sampling and measured.)

kmexter commented 2 weeks ago

I think that is what Cymon wants, yes. For us at VLIZ it makes no difference (@bulricht can you confirm this?)