Open cymon opened 3 weeks ago
For the point "also in the sheet above there is only a float (0.7537) entered so what pigment is it?" -> that is an error that @melanthia, @melinalou or @isanti will have to ask the station to fix -> the station needs to give a pigment type.
If there is only one correctly formatted value there, then for the QC we do here I asked Bram to treat it as a 1-element list. The QC will run a formatting check on the elements of the list by splitting it on the ";"s ("string : value ; string : value ; etc"), and if there is just a single "string : value" it will assume this is a list of 1 element. In the triple store this will be recorded as a measurement of type "pigment" with subtypes "string1 = value1", "string2 = value2", etc.
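To make the 1-element-list rule concrete, here is a minimal sketch of that formatting check. It is not Bram's actual QC code: the function name `parse_pigment_list` and the error messages are made up for illustration, and it assumes values are plain decimal numbers and that a trailing ";" is tolerated.

```python
def parse_pigment_list(raw: str) -> list[tuple[str, float]]:
    """Split 'name: value;name: value' into (name, value) pairs.

    Hypothetical helper, assumed for illustration only.
    """
    pairs = []
    for element in raw.split(";"):
        element = element.strip()
        if not element:
            # Assumption: tolerate a trailing ';' such as 'caroten: 0.125;'
            continue
        name, sep, value = element.partition(":")
        name = name.strip()
        if not sep or not name:
            # A bare number like '0.7537' ends up here: no pigment name given.
            raise ValueError(f"{element!r} is not of the form 'name: value'")
        try:
            number = float(value)
        except ValueError:
            raise ValueError(f"{value.strip()!r} is not a number") from None
        pairs.append((name, number))
    if not pairs:
        raise ValueError("no 'name: value' elements found")
    return pairs
```

With this sketch, "caroten: 0.125" on its own comes back as a list of one pair, while a bare "0.7537" is rejected because it does not say which pigment it refers to.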
Yeah, I think it's important to separate what the validation and the QC are doing. Validation only checks that the values are of the correct type, not that they are correct. QC checks the correctness of the values entered, even when the type is correct.
The problem here is that the recorders were entering floats, and because the format of the field is set to automatic, the sheet exports them as floats rather than the strings they should be. If the formatting were changed to plain_text, even a floating point number would be exported as a string. It would then pass validation (it is correctly a string) but fail QC, because the string does not follow the expected format "pigment: value;" (or whatever it is...)
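Roughly, the two stages behave like the sketch below. This is only an illustration under assumptions: the real validator uses pydantic, so the `isinstance` check here is a stand-in for its string type check, and the regular expression is my guess at the expected "pigment: value;" format, not the actual QC rule. All names are hypothetical.

```python
import re

# Assumed QC format: 'name: value' elements separated by ';',
# with an optional trailing ';'. The real QC pattern may differ.
ELEMENT = r"\s*[A-Za-z][\w\- ]*?\s*:\s*[0-9]+(?:\.[0-9]+)?\s*"
PIGMENT_LIST = re.compile(rf"{ELEMENT}(?:;{ELEMENT})*;?\s*")


def passes_validation(value: object) -> bool:
    # Validation: is the exported value the right type (a string)?
    return isinstance(value, str)


def passes_qc(value: str) -> bool:
    # QC: does the string follow the expected 'name: value;...' format?
    return PIGMENT_LIST.fullmatch(value) is not None


def check(value: object) -> str:
    if not passes_validation(value):
        return "validation error"
    if not passes_qc(value):
        return "QC error"
    return "ok"
```

So a cell exported as the float 1.83 stops at validation, the same cell exported as the plain-text string "1.83" gets through validation and is caught by QC, and "caroten: 0.125;phycoerythrin: 0.121" passes both.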
That is something only @cpavloud or @isanti has permission to do, I think -> or @melanthia / @melinalou can you do it?
Added to my to-do list (along with all the other issues that need solving). FYI, I have tried to compile all of them at the end of this document
Sorry for the delayed answer. So do we need to format all the floats as plain text, or only specific fields (such as phaeopigments)? (I only have permission in the "new sheets" I've made, so I think it is OK to do it for sampling and measured.)
I think that is what Cymon wants, yes. For us at VLIZ it makes no difference (@bulricht can you confirm this?)
Example sheet: https://docs.google.com/spreadsheets/d/1xfrqraPa0auQ1O-C9RUo68RhxrPCDWkVMCAUbj79AZI
For fields that can have multiple listed entries:
e.g. the "pigments" example in definitions_updated is "caroten: 0.125;phycoerythrin: 0.121", i.e. a string. Its definition reads: Concentration of pigments other than chlorophyll-a; can include multiple pigments separated by ";". Look at the example for how to enter multiple values.
Also in the sheet above there is only a float (0.7537) entered so what pigment is it?
It's throwing an error because the validator correctly expects a string, but the spreadsheet is formatting it as a float (because the format is automatic)... it should be throwing an error because the value doesn't say which pigment it is referring to...
Update
"phaeopigments" in this sheet is another "list" that is being formatted as float:
Processing observatory_id='OSD74' - sampling_strategy='water_column' - sheet_type='measured'
Sample sheet link:
ValidationError: 1 validation error for Model
phaeopigments
  Input should be a valid string [type=string_type, input_value=1.83, input_type=float]
    For further information visit https://errors.pydantic.dev/2.8/v/string_type