emo-bon / observatory-emt21-crate

EMO BON observatory - logsheets

Ammonium #6

Open cpavloud opened 1 month ago

cpavloud commented 1 month ago

Values in cells need to be numbers only. So > signs are not accepted.

@melinalou

melinalou commented 1 month ago

done

cpavloud commented 1 month ago

@melanthia @kmexter Please make a note of this; it should somehow be included in the definitions in the logsheets. Also, maybe we should find a way to store this information, since it is not an allowed value.

kmexter commented 1 month ago

Yes, we had a discussion about this during the EMO BON sessions in Crete, but the conclusion was not clear. I think we concluded that it is useful to have this information, where it exists; I pointed out that the consequence of this is that all parameters for all logsheets would then be stored as a string, not a number, and the conversation stopped there. We do need a decision on this: at present, those events which have a >, < or - in the cell will not be turned into triples and will not be part of the dataset that gets read by BlueCloud or FAIR EASE. What does this mean? Simply that you will not be able to do any mathematical-style work on these parameters across all EMO BON data: for example, you will not be able to search the FAIR EASE (EMO BON) database for "this parameter is between 0 and 10 in value". If that is acceptable, then good. If not, then we need to make some changes.

My feeling is that for now, we should treat all parameters as strings, because otherwise we lose material samples from the machine-accessible version of the data we create.

Opinions?

cpavloud commented 1 month ago

Having the parameter as a string is not an option for ENA. So I believe we need to work in a similar way to what we did for the tidal stage.

We can keep the "ammonium" column containing only numbers, so it will contain only "0.13" in this case. And we can create a new column, e.g. named "ammonium detailed information", to store the actual measurement, e.g. ">0.13". Of course, this would mean that there might be future cases for other parameters that would require the same approach.
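
A minimal sketch of that split, in Python, to make the proposal concrete. The function name `split_measurement` and the pattern are hypothetical, not part of any existing EMO BON code, and the sketch assumes a cell holds either a plain non-negative decimal or one prefixed with a single > or <.

```python
import re

# Hypothetical pattern: optional leading ">" or "<", then a plain decimal number.
_QUALIFIED = re.compile(r"^\s*([<>]?)\s*(\d+(?:\.\d+)?)\s*$")


def split_measurement(raw):
    """Return (numeric_value, detailed_info) for one logsheet cell.

    numeric_value is a float, or None when the cell cannot be parsed.
    detailed_info keeps the qualified text (e.g. ">0.13"); it is None
    when the cell was already a plain number.
    """
    match = _QUALIFIED.match(raw)
    if match is None:
        # Not a recognised number: keep the text for manual review.
        return None, raw.strip() or None
    qualifier, number = match.groups()
    detail = f"{qualifier}{number}" if qualifier else None
    return float(number), detail
```

With this, ">0.13" would put 0.13 in the "ammonium" column and ">0.13" in "ammonium detailed information", while a plain "0.13" leaves the second column empty.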

kmexter commented 1 month ago

Yeh, but we cannot do this on a case-by-case basis. We have to do it exactly the same way for all stations and all parameters. At least, for the programmatic part that we will take over for batch 3 onwards - unless you are willing to always do it manually 😄.

But you are saying that ENA will not accept > in its value? I had not considered that. The idea was that our checklists are exactly the same for all samples, so no special entries for one sample that another sample does not have: if we have one field "ammonium detailed information", we need to have that field for all samples and to do the same for all parameters. That is technically doable, albeit annoying for you to add to the >100 samples you have already filled in.

Urm, argh. I am stumped. It is true that >0.13 is not the same as 0.13, and we are lying if we say 0.13 instead of >0.13. Choices:

for both choices: treat all parameters in the emo bon logsheets as strings. Perhaps later we can find some way to translate that into a useable number.

cpavloud commented 1 month ago

Yes, ENA will not accept > or < in its value.

I'm not sure which is the best option. Probably the most inclusive will be the second one, but if we do it for everything, we will end up with a huge (and practically half-empty) .csv file.

kmexter commented 1 month ago

Argh, we need a solution, because this will hold things back. I had a chat with Laurian. We think it is possible to identify values that have a >, < or - in them and 1) identify these as min, max, or ranges in the turtle files we create (annoying extra work, but doable); 2) for these, create an if-then in the ENA code that will create a "detailed info" entry and add the entire string to that. How does that sound? What we additionally need to know is whether, for point 2, we ALSO fill in the standard checklist value as well as our "additional info", or not. If that parameter happens to be mandatory, then of course we have to. Are any of these mandatory? This would then apply to all values in the measured tab and also to size_frac_up and size_frac_low from the sampling tab.
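
The two points could be sketched roughly as follows, in Python. Everything here is an assumption made for illustration: the names `Parsed`, `parse_cell` and `ena_fields` are hypothetical, ">x" is read as a minimum, "<x" as a maximum and "a-b" as a range, and the "detailed information" field name is borrowed from the earlier suggestion in this thread. Whether the standard checklist value is also filled in is still an open question, so the sketch leaves it out for qualified values.

```python
import re
from dataclasses import dataclass
from typing import Optional

_NUM = r"\d+(?:\.\d+)?"
_EXACT = re.compile(rf"^{_NUM}$")
_BOUND = re.compile(rf"^([<>])\s*({_NUM})$")
_RANGE = re.compile(rf"^({_NUM})\s*-\s*({_NUM})$")


@dataclass
class Parsed:
    kind: str  # "exact", "min", "max", "range" or "unparsed"
    low: Optional[float]
    high: Optional[float]
    raw: str


def parse_cell(raw: str) -> Parsed:
    """Point 1: classify a cell as exact value, min, max or range."""
    text = raw.strip()
    if _EXACT.match(text):
        value = float(text)
        return Parsed("exact", value, value, text)
    bound = _BOUND.match(text)
    if bound:
        value = float(bound.group(2))
        if bound.group(1) == ">":
            return Parsed("min", value, None, text)  # true value lies above
        return Parsed("max", None, value, text)      # true value lies below
    span = _RANGE.match(text)
    if span:
        return Parsed("range", float(span.group(1)), float(span.group(2)), text)
    return Parsed("unparsed", None, None, text)


def ena_fields(name: str, raw: str) -> dict:
    """Point 2: choose which ENA field(s) receive the value."""
    parsed = parse_cell(raw)
    if parsed.kind == "exact":
        return {name: parsed.raw}
    if parsed.kind in ("min", "max", "range"):
        # Qualified value: the whole string goes to the detailed-info field.
        return {f"{name} detailed information": parsed.raw}
    return {}  # unparsed cells are left for manual review
```

The same `Parsed` record could also drive the turtle output, since `low` and `high` already carry the bound or range as numbers.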

cpavloud commented 1 month ago

I think your solution sounds great.

Parameters such as "ammonium" are optional, so we don't really need to fill in the standard checklist value. There are extremely few mandatory columns in the checklists.

The only ENA mandatory fields are:
project name (for water and sediment)
collection date (for water and sediment)
geographic location (country and/or sea) (for water and sediment)
geographic location (latitude) (for water and sediment)
geographic location (longitude) (for water and sediment)
broad-scale environmental context (for water and sediment)
local environmental context (for water and sediment)
environmental medium (for water and sediment)
elevation (for sediment)
depth (for water and sediment)

There is no issue with size_frac_up and size_frac_low and > or <. Basically, in this case the providers should not use > and <; there is no need for it. The ammonium case is different, because it denotes an instrument or method limit. With the filters, we know which is the lower one and which is the upper one. In cases like "EMO BON ROSKOGO Wa 230414 200µm_1", size_frac_low should just be "200" and size_frac_up should be "NA", and that's it. Where size_frac_low is ">200", it's just wrong.
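
A check along these lines could flag such entries before they reach the ENA code. It is a sketch under assumptions: the function names are hypothetical, and the only accepted values are a plain number or the literal "NA".

```python
import re

_PLAIN_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def size_frac_ok(value):
    """True for a plain number such as "200", or for the literal "NA"."""
    text = value.strip()
    return text == "NA" or _PLAIN_NUMBER.fullmatch(text) is not None


def check_size_fractions(row):
    """Return the size-fraction columns in `row` that fail the check.

    A missing column is treated as an empty cell and is therefore flagged.
    """
    columns = ("size_frac_low", "size_frac_up")
    return [c for c in columns if not size_frac_ok(row.get(c, ""))]
```

For the sample mentioned above, `{"size_frac_low": "200", "size_frac_up": "NA"}` passes, while a size_frac_low of ">200" is reported.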