Closed Adafede closed 2 years ago
Maybe we should have it as a different field, like calcuatled_exactmass to distinguish it from whats in the library?
Hi @mwang87!
Why would you like to keep what is in the library? What the users fill actually in is anyhow calculated...This will simply allow having consistent fields. I think so at least, I see no way this calculated mass could hurt?
I would also be careful with changing existing masses given that other pipelines may rely on them, and add a field for the calculated masses....
@justinjjvanderhooft It is actually the reason that pushed me to open that pull request.
I am part of those downstream pipelines.
There are many erroneous masses in there, so first I cleaned them on my side and then thought well...this is not the right way to do it, we should clean them upstream.
I might miss something important but what are your concerns if we clean, let's say a 162.0
in 162.05282
? I am of course not talking about the experimental protonated mass
here, this should of course not be touched. Or did I interpret exact mass
wrongly?
Let me start with saying that I don't dispute the value of the recalculated exact masses. Regarding the terminology of GNPS, I will leave that up to the GNPS team (Ming), but I think it is best to separate user-defined (exact) masses and computed ones, also because they are likely based on user-defined elemental formulas?! In the matchms workflow, we also incorporated a cleanup of the GNPS library spectra, also adding SMILES etc., perhaps one day we can also integrate that. Again, it may be best to separate computed/computer-derived entries from manual inputed ones there....(both have their own sources of error) - anyways, my 5 cents!
@justinjjvanderhooft I agree 100%!
Maybe asking for minimal but mandatory user-defined fields and computing all the rest would help harmonize everything? This goes of course beyond this PR 😊
So we are now at 10 cents!
So we would want to keep whats in the library because thats what people entered. It just needs to be separate and apart in terms of naming. Whether we show it is a different story, but it needs to be clear what we have here is calculated from the structure.
Agreed @mwang87 - also because it is good provenance! 😎
I understand this means it will be included in a broader refactoring, nice 😊
Is this PR then still useful?
@mwang87 If so please advise on wanted modifications
Sorry messed up a bit...reopening a PR which should not conflict anymore!