Closed isanti closed 3 weeks ago
Adding a comment, because I believe this issue really needs to be resolved.
The GSC definitions on the MIxS website for size_frac_low and size_frac_up are reversed. As written they are not logical, as @isanti also mentioned when opening the issue.
However, the definitions in the ENA checklists are correct:
-- size-fraction lower threshold: Refers to the mesh/pore size used to retain the sample. Materials smaller than the size threshold are excluded from the sample.
-- size-fraction upper threshold: Refers to the mesh/pore size used to pre-filter/pre-sort the sample. Materials larger than the size threshold are excluded from the sample.
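To make the ENA wording concrete, here is a minimal Python sketch of which particles would stay in the sample. The function name `in_size_fraction` and the particle sizes are hypothetical, the thresholds are the 0.2 and 20 micrometer example values quoted elsewhere in this thread, and treating both bounds as inclusive is an assumption, since the definitions do not say what happens at exactly the mesh size.

```python
def in_size_fraction(particle_um, size_frac_low_um, size_frac_up_um):
    """Return True if a particle (size in micrometres) stays in the sample.

    ENA wording: material smaller than the lower threshold passes through
    the retaining filter, and material larger than the upper threshold is
    removed by the pre-filter. Inclusive bounds are an assumption.
    """
    return size_frac_low_um <= particle_um <= size_frac_up_um


# Hypothetical particle sizes in micrometres.
particles_um = [0.05, 0.5, 5.0, 50.0]
retained = [p for p in particles_um if in_size_fraction(p, 0.2, 20.0)]
print(retained)
```

Only the particles between the two thresholds remain, which is the reading that the lower/upper naming suggests.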
And this causes a discrepancy.
Before these changes are made, let's contact the original group that worked on this. I would suggest Frank Oliver Glockner, Renzo Kottmann, and Linda Amaral Zettler.
We have to be cautious about editing these fields. We can misconstrue what was meant when the fields were created.
We should also look into how the field has been populated.
As of February, there were 41,147 Biosamples (out of 37,572,120) where either size_frac, size_frac_low, or size_frac_up was not null (from an SQL perspective). I asked the Gemini LLM to summarize the values for me, but that errored out. I can share the full file, or I can work on a summarization over the next day or two.
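For anyone who wants to reproduce that kind of count, a rough sketch of the query follows, using Python's built-in `sqlite3` with an in-memory table. The table name `biosample`, the column layout, and the rows are all made up for illustration; the real Biosample database schema is not shown in this thread and will differ.

```python
import sqlite3

# Hypothetical schema and rows, only to illustrate the "not null" filter.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE biosample ("
    "accession TEXT, size_frac TEXT, size_frac_low TEXT, size_frac_up TEXT)"
)
rows = [
    ("S1", None, None, None),
    ("S2", "0.2-20 um", None, None),
    ("S3", None, "0.2", "20"),
    ("S4", None, None, "20"),
]
conn.executemany("INSERT INTO biosample VALUES (?, ?, ?, ?)", rows)

# Count records where at least one of the three slots is populated.
query = """
    SELECT COUNT(*) FROM biosample
    WHERE size_frac IS NOT NULL
       OR size_frac_low IS NOT NULL
       OR size_frac_up IS NOT NULL
"""
(populated,) = conn.execute(query).fetchone()
(total,) = conn.execute("SELECT COUNT(*) FROM biosample").fetchone()
print(f"{populated} of {total} records have a size fraction slot populated")
conn.close()
```

Selecting the three columns instead of `COUNT(*)` would give the raw values to review for how the slots have actually been used.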
@lschriml are you concerned that there's just a misunderstanding of how to use the slots, and not a mistake in the descriptions? I did recently submit a PR to make the change. Maybe we should discuss at CIG
I want to check that we document why things are being changed, so that we can trace it back if asked, and to determine when and how it got changed before.
It may also be useful to be clear about the vocabulary used for LinkML and the equivalent terms used in MIxS, so that it is transparent to the world what we are discussing, and to avoid any confusion.
In case it is helpful, I noticed some things:
-- In the original MIMARKS publication, in the Supplementary Results 2, there are no terms for size_frac_low and size_frac_up in the water checklist
-- If I am not mistaken, the terms first appeared in MIxS 5
-- size_frac_low and size_frac_up are also included in the FoodFarmEnvironment and in the Agriculture checklists
-- NCBI has kept the original MIxS definitions (which creates even further discrepancies since ENA has reversed the definitions)
Brilliant, thanks for moving this on Montana.
Good, yes, worth discussing at the CIG today. Good point Lynn, yes, the original group may have had a different train of thought in their heads. Has anyone asked them? Shall I? (I don't know them though.) Agree, such changes need to be tracked. At ENA we now have a change log, but it does not always capture the why.
Our (ENA's) marine data expert (Stephane Pesant) was the one who originally flagged the size fraction discrepancy. It is so valuable having people in the office who are actively preparing and submitting samples via different checklists like Stephane, and also others who are on the frontline dealing with helpdesk queries. This came up when we were consolidating older terms in the "ENA Tara Oceans" checklist with conceptually similar terms in MIxS 6.2. (You are right Christina, those size fraction terms were in MIxS 5.0.) We still have much technical metadata debt in ENA, on top of that in GSC. Collectively it is important to address it to increase FAIRness.
Christina, you have me thinking further now about what to do with similar cases in the future, where a definition change is major, such as this case about the mesh/pore size. As you indicate, it makes the term inconsistent across INSDC and others using the GSC MIxS until the others have made the change too. I am going to raise this as a minor discussion point at our weekly internal ENA content meeting later this morning. My natural tendency, and our ENA group's, is to fix what are considered errors and move on, but yes, we may have misunderstood what the original authors meant.
@Woolly-at-EBI I think Stephane was absolutely right flagging the discrepancy and I think ENA is also absolutely right in having switched the definitions. Because this is what makes sense (from the scientist perspective).
If you check the examples provided for these terms, you will see that for size_frac_low, the example value is 0.2 micrometer and for size_frac_up it is 20 micrometer. Which is perfectly fine, as it should be, and makes me think that something like a wrong copy-pasting was the reason why the definitions are reversed.
Just to comment (again) that NMDC also uses the (wrong) definitions for size_frac_low and size_frac_up.
The water package was one of the first ones we created. I would suggest checking with Pelin Yilmaz, as she led these efforts. @pyilmaz (pyilmaz.mgx@gmail.com) It looks like whatever we had for water was then copied to the other packages.
I will also forward this to our GSC board members.
For documentation, let's see if we can find these definitions in publications, marine sites.
[MIxSv6_release.xlsx](https://github.com/user-attachments/files/16379132/MIxSv6_release.xlsx)
I was curious how these terms were listed in MIxS 6 release (attached):
-- the terms are in 3 packages (food-farm environment, water, agriculture):
size_frac_low (size-fraction lower threshold)
definition: Refers to the mesh/pore size used to pre-filter/pre-sort the sample. Materials larger than the size threshold are excluded from the sample
Example: 0.2 micrometer
size_frac_up (size-fraction upper threshold)
definition: Refers to the mesh/pore size used to retain the sample. Materials smaller than the size threshold are excluded from the sample
Example: 20 micrometer
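As a quick sanity check, here is a small Python sketch that applies the MIxS 6 wording literally to its own example values. The function name `kept_by_mixs6_wording` and the particle sizes are hypothetical, and the inclusive comparisons are an assumption about what happens at exactly the mesh size.

```python
def kept_by_mixs6_wording(particle_um, size_frac_low_um, size_frac_up_um):
    """Apply the MIxS 6 definitions as written (sizes in micrometres)."""
    # size_frac_low is described as the pre-filter: larger material is excluded.
    passes_prefilter = particle_um <= size_frac_low_um
    # size_frac_up is described as the retaining filter: smaller material is excluded.
    is_retained = particle_um >= size_frac_up_um
    return passes_prefilter and is_retained


# Hypothetical particle sizes, checked against the example values 0.2 and 20.
particles_um = [0.05, 0.2, 0.5, 5.0, 20.0, 50.0]
kept = [p for p in particles_um if kept_by_mixs6_wording(p, 0.2, 20.0)]
print(kept)
```

With low = 0.2 and up = 20, no particle can be both at most 0.2 and at least 20 micrometer, so nothing is kept. The example values only make sense if the two definitions are swapped.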
Cheers, Lynn
Who should contact Pelin Yilmaz?
I have emailed Pelin and the board.
-- Lynn M. Schriml, Ph.D. Associate Professor
Institute for Genome Sciences University of Maryland School of Medicine Department of Epidemiology and Public Health 670 W. Baltimore St., HSFIII, Room 3061 Baltimore, MD 21201 P: 410-706-6776 | F: 410-706-6756 @.***
As a current board member, my vote goes with the need for a correction to be made in the GSC definitions of those two terms. I think the clincher is the fact that the example values included show the intent. If you use those examples of size_frac_low=0.2 and size_frac_up=20, then it is obvious you want to exclude particles outside that range. But I also think the names and definitions could be made clearer somehow.
@cpavloud I tried to tag you in a comment in my PR, but your username wasn't showing up. I'll try again. But once we hear back from Pelin & get a board approval, we'll be able to merge in this PR and it'll be part of the next release.
@only1chunts @lschriml Any word from Pelin or the board?
Not yet.
OK, we're approaching a month since we started this discussion. At what point do we treat the new community feedback, that these term descriptions are backwards and confusing, as the current priority and make the change?
Considering the "living standard" development of MIxS, it doesn't seem invalid to take new feedback and make improvements even if they depart from the original understanding. We do need to make sure that there's VERY good documentation and clarification on the change. As for the INSDC implementation, considering @Woolly-at-EBI was one of the people that submitted the issues about it being backwards, I'd expect we can trust the individual implementations of GSC to manage the update as well.
For what it's worth, I just looked at v5 and v4. v4 doesn't have these terms & v5 has them incorrect as well.
Let's wrap this up at the next CIG.
Move forward with the change. Capture notes and comments about why and provide VERY clear and well described notes.