emo-bon / observatory-esc68n-crate

EMO BON observatory - logsheets

Wrong size fractions #5

Closed cpavloud closed 2 months ago

cpavloud commented 3 months ago

The size_frac_low and size_frac_up values should be the other way round (in water_sampling.csv). When size_frac is 3-200, size_frac_low should be 3 and size_frac_up should be 200. Likewise, when size_frac is 0.2-3, size_frac_low should be 0.2 and size_frac_up should be 3. At the moment they are reversed.
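The expected mapping can be sketched in a few lines of Python. This is only an illustration: the helper name `parse_size_frac` is hypothetical and is not part of the logsheets tooling, and it assumes `size_frac` is always written as two numbers separated by a single hyphen.

```python
def parse_size_frac(size_frac: str) -> tuple[float, float]:
    """Split a size_frac string such as '3-200' into (size_frac_low, size_frac_up)."""
    first_text, second_text = size_frac.split("-")
    first, second = float(first_text), float(second_text)
    # The smaller pore size is the lower threshold, the larger one the upper threshold.
    return (min(first, second), max(first, second))


low, up = parse_size_frac("3-200")
print(low, up)
```

For `0.2-3` the same helper gives a lower threshold of 0.2 and an upper threshold of 3.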

@melinalou

melinalou commented 2 months ago

corrected

cpavloud commented 2 months ago

Please confirm if this change has taken place also in the google spreadsheets. @kmexter @melanthia this should be also changed in the examples in the template file.

kmexter commented 2 months ago

I remember having a discussion with Ioulia about this: the definitions of these terms in the MIxS checklists were very confusing. From the GSC website I get the following definitions:

size-fraction lower threshold (size_frac_low) | Refers to the mesh/pore size used to pre-filter/pre-sort the sample. Materials larger than the size threshold are excluded from the sample

size-fraction upper threshold (size_frac_up) | Mesh or pore size of the device used to retain the sample. Materials smaller than the size threshold are excluded from the sample

So... working this out in my head... this means size_frac_low is the lower pore size and (this is the confusing bit, I think) anything larger than this IS NOT included in the sample. size_frac_up is then the upper pore size, and anything smaller than this is excluded from the sample. You guys know more than I do about this. Given that all the observatories I looked at just now also have a larger number in low than in up, I think if we change this we (1) will need to do it for all and (2) will need to tell everyone specifically that we have changed this.

cpavloud commented 2 months ago

The GSC definitions on the MIxS website are as you say, but they are not logical; they don't make sense.

However, the definitions in the ENA checklists are correct (in my humble scientific opinion as a logical human being).

size-fraction lower threshold: Refers to the mesh/pore size used to retain the sample. Materials smaller than the size threshold are excluded from the sample.

size-fraction upper threshold: Refers to the mesh/pore size used to pre-filter/pre-sort the sample. Materials larger than the size threshold are excluded from the sample.
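Read this way, the two ENA thresholds bracket what ends up in the sample. A minimal sketch, assuming both thresholds and the particle size are in the same unit and treating the boundaries as inclusive (the function name and the inclusive boundaries are assumptions, not something the checklist specifies):

```python
def in_size_fraction(particle_size: float, size_frac_low: float, size_frac_up: float) -> bool:
    """Return True if a particle of this size would be kept in the sample.

    Following the ENA wording: material smaller than size_frac_low passes
    through the retaining filter, and material larger than size_frac_up is
    removed by the pre-filter. Inclusive boundaries are an assumption.
    """
    return size_frac_low <= particle_size <= size_frac_up


# With the 3-200 fraction, a particle of size 10 is kept; sizes 1 and 500 are not.
print(in_size_fraction(10, 3, 200))
print(in_size_fraction(1, 3, 200))
print(in_size_fraction(500, 3, 200))
```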

So, I guess there is a discrepancy but (ultimately) it's the ENA definitions and terms we are using, so...

kmexter commented 2 months ago

Indeed they do not. Ioulia raised an issue on this but she hasn't had a reply yet: https://github.com/GenomicsStandardsConsortium/mixs/issues/566. There are some comments in there... maybe you can check them out before we make a decision?

cpavloud commented 2 months ago

There are no actual comments in that issue. I added my part to it.

The MIxS definition is wrong. The way I have done the samples submission is based on logic and on the ENA definitions.

kmexter commented 2 months ago

OK. Anyway, I leave it to you to decide what we should do to the logsheets. Just tell me what is decided and I can handle any updates to the logsheets_schema that controls the making-our-data-machine-understandable and also the QC rules (there is a check done on internal consistency of these values, which I will have to change if we change the columns). And if we follow the ENA definitions, can you tell me the URL where those are (I thought they were all just the MIxS ones)? I then need to change the URL we are using for those terms, as we link to the MIxS URL at present.
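For illustration only, an internal-consistency check of this kind might look like the sketch below once the ENA ordering is adopted. It is not the actual QC rule: the function name and error messages are hypothetical, only the three column names come from the logsheets, and non-numeric values would raise a `ValueError` here rather than be reported.

```python
def check_size_fractions(row: dict[str, str]) -> list[str]:
    """Return a list of consistency problems found in one logsheet row."""
    errors = []
    low = float(row["size_frac_low"])
    up = float(row["size_frac_up"])
    # Under the ENA definitions the lower threshold is the smaller pore size.
    if low >= up:
        errors.append("size_frac_low must be smaller than size_frac_up")
    # size_frac is optional here (assumed absent or empty in some logsheets).
    size_frac = row.get("size_frac", "")
    if size_frac:
        parts = size_frac.split("-")
        if len(parts) != 2 or (float(parts[0]), float(parts[1])) != (low, up):
            errors.append("size_frac does not match size_frac_low-size_frac_up")
    return errors


good = {"size_frac": "3-200", "size_frac_low": "3", "size_frac_up": "200"}
bad = {"size_frac": "3-200", "size_frac_low": "200", "size_frac_up": "3"}
print(check_size_fractions(good))
print(check_size_fractions(bad))
```

The first row passes with no errors; the second, which still has the old ordering, is flagged on both checks.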

cpavloud commented 2 months ago

This is the ENA water checklist: https://www.ebi.ac.uk/ena/browser/view/ERC000024 (and this is the ENA sediment checklist: https://www.ebi.ac.uk/ena/browser/view/ERC000021).

The definitions in our logsheets should follow those of ENA.

kmexter commented 2 months ago

OK, so there are no URLs for those terms, just for the entire checklist. In that case, we will create our own definition, which will be that taken from ENA. I will make the necessary updates on the QC workflow and our ontology.

This means that someone has to change the definition in the "description" sheet of all logsheets, swap the size_frac_low and size_frac_up columns, and swap around the values in the size_frac column (which for some reason is only in the water logsheets, not also sediment)?
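If the values end up being corrected in exported CSV files rather than by hand, the swap could be scripted along these lines. This is a sketch under assumptions: the function name is hypothetical, only the two threshold columns are touched (the `size_frac` column is left as it is), and a row is changed only when its lower value is currently larger than its upper value, so running it twice does no harm.

```python
import csv
import io


def fix_size_frac_order(csv_text: str) -> str:
    """Swap size_frac_low and size_frac_up in rows where they are reversed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        low, up = row["size_frac_low"], row["size_frac_up"]
        # Leave rows with empty cells alone; swap only when the order is reversed.
        if low and up and float(low) > float(up):
            row["size_frac_low"], row["size_frac_up"] = up, low
        writer.writerow(row)
    return out.getvalue()


sample = "size_frac,size_frac_low,size_frac_up\n3-200,200,3\n0.2-3,0.2,3\n"
print(fix_size_frac_order(sample))
```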

cpavloud commented 2 months ago

It doesn't make sense to have these terms in the sediment checklists, this is why they are not there.

melinalou commented 2 months ago

> Please confirm if this change has taken place also in the google spreadsheets. @kmexter @melanthia this should be also changed in the examples in the template file.

confirmed

kmexter commented 2 months ago

I am afraid it is not enough to just change the googlesheets, because (1) ALL googlesheets need changing, otherwise the QC will report errors for all other stations, and (2) when ESC adds new values to the googlesheet, they may wonder why these columns have changed or continue using the previous method, and then we have more QC errors.

So please, if we swap these around: (1) change them in all water and sediment logsheets (they are in both of them; it is only the one column size_frac that is not also in sediment), (2) change the definition and example in the definitions tab everywhere also, and (3) email all EMO BON stations to tell them of this change, and at the same time tell them of the change we have decided to make for pigments and phaeopigments. EMO BON HQ should send this email.

cpavloud commented 2 months ago

There are several changes happening, and there will probably be an e-mail to everyone mentioning them (@melanthia). However, I don't think there will be any issue with the size fractions. What we are correcting is what makes sense. When they go to add their values for the next sampling events, they will see what is in the cells above and act accordingly. I don't think they will think twice about it.

kmexter commented 2 months ago

That is up to you. I will change the rules in the QC, BUT as I said, the same changes made now need to be made to all the logsheets, otherwise only ESC will pass the QC (when we get the QC action working again).

cpavloud commented 2 months ago

Yes, of course. I had added issues about this in all the repositories and @melinalou has been taking care of them. So I believe all of them should have changed now.