CCMS-UCSD / GNPS_Workflows

Public Workflows at GNPS
https://gnps.ucsd.edu/
Other
56 stars 44 forks source link

(Re)-calculate exact mass if needed #816

Closed Adafede closed 2 years ago

Adafede commented 2 years ago

Sorry messed up a bit...reopening a PR which should not conflict anymore!

mwang87 commented 2 years ago

Maybe we should have it as a different field, like calcuatled_exactmass to distinguish it from whats in the library?

Adafede commented 2 years ago

Hi @mwang87!

Why would you like to keep what is in the library? What the users fill actually in is anyhow calculated...This will simply allow having consistent fields. I think so at least, I see no way this calculated mass could hurt?

justinjjvanderhooft commented 2 years ago

I would also be careful with changing existing masses given that other pipelines may rely on them, and add a field for the calculated masses....

Adafede commented 2 years ago

@justinjjvanderhooft It is actually the reason that pushed me to open that pull request.

I am part of those downstream pipelines.

There are many erroneous masses in there, so first I cleaned them on my side and then thought well...this is not the right way to do it, we should clean them upstream.

I might miss something important but what are your concerns if we clean, let's say a 162.0in 162.05282? I am of course not talking about the experimental protonated mass here, this should of course not be touched. Or did I interpret exact mass wrongly?

justinjjvanderhooft commented 2 years ago

Let me start with saying that I don't dispute the value of the recalculated exact masses. Regarding the terminology of GNPS, I will leave that up to the GNPS team (Ming), but I think it is best to separate user-defined (exact) masses and computed ones, also because they are likely based on user-defined elemental formulas?! In the matchms workflow, we also incorporated a cleanup of the GNPS library spectra, also adding SMILES etc., perhaps one day we can also integrate that. Again, it may be best to separate computed/computer-derived entries from manual inputed ones there....(both have their own sources of error) - anyways, my 5 cents!

Adafede commented 2 years ago

@justinjjvanderhooft I agree 100%!

Maybe asking for minimal but mandatory user-defined fields and computing all the rest would help harmonize everything? This goes of course beyond this PR 😊

So we are now at 10 cents!

mwang87 commented 2 years ago

So we would want to keep whats in the library because thats what people entered. It just needs to be separate and apart in terms of naming. Whether we show it is a different story, but it needs to be clear what we have here is calculated from the structure.

justinjjvanderhooft commented 2 years ago

Agreed @mwang87 - also because it is good provenance! 😎

Adafede commented 2 years ago

I understand this means it will be included in a broader refactoring, nice 😊

Is this PR then still useful?

@mwang87 If so please advise on wanted modifications