Open hechth opened 3 years ago
The computation of the theoretical isotopic pattern fails for ions (with formula literally M+ or M-), because `rcdk`'s `get.formula` can't initialize a molecule object from such a string. Fixing this should not be a problem: `rcdk` computes the pattern from the isotopic abundances of the individual atoms, so just stripping the {+,-} symbols from the formulae will likely do the job.
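The stripping step could look roughly like the sketch below. It is written in Python purely for illustration and is not the package's code; `strip_charge` is a hypothetical helper name, and it assumes the charge is written as trailing `+`/`-` characters. Whether the cleaned string is then accepted by `get.formula` is the expectation stated above, not something this sketch verifies.

```python
import re

# Hypothetical helper (not part of rcdk or the annotation package):
# matches one or more '+' or '-' characters at the very end of the string.
_CHARGE_SUFFIX = re.compile(r"[+-]+$")


def strip_charge(formula: str) -> str:
    """Return the formula without trailing '+' or '-' charge symbols."""
    return _CHARGE_SUFFIX.sub("", formula.strip())


print(strip_charge("C5H13ClN+"))  # prints C5H13ClN
```

Note that this only makes the string parseable; it deliberately ignores the small mass difference caused by the missing or extra electron.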
The more significant issue here is how to deal with naturally occurring ions during simple annotation. As discussed with @hechth, if we pass [M+] to the simple annotation as a possible adduct, it will increase the computational time by some factor: the `match_by_mass` function will end up doing one more iteration of matching every peak to each database compound, which is unnecessary because only a small subset of compounds naturally occur as ions.
A possible solution would be to add an extra step to the simple annotation that only compares measured peaks to the ions in the compound database.
Also, treating all compounds as possible [M+] options will introduce a lot of false positives.
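A minimal sketch of such an extra step, again in Python for illustration only. The names `Compound`, `is_charged` and `match_ions` are hypothetical, the ppm tolerance is an arbitrary default, and the sketch assumes singly charged ions whose listed mass can be compared to the measured m/z directly. The masses in the usage example are approximate illustrative values, not taken from the database.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    formula: str
    mass: float  # mass of the species as listed (assumed precomputed)


def is_charged(formula: str) -> bool:
    """Assumption: charged entries carry a trailing '+' or '-'."""
    return formula.endswith(("+", "-"))


def match_ions(peaks_mz, compounds, ppm=10.0):
    """Match peaks only against compounds that are already ions."""
    # Filter once, so neutral compounds never enter the inner loop.
    ions = [c for c in compounds if is_charged(c.formula)]
    matches = []
    for mz in peaks_mz:
        for ion in ions:
            if abs(mz - ion.mass) <= ion.mass * ppm * 1e-6:
                matches.append((mz, ion.name))
    return matches


# Illustrative data only: one charged entry and one neutral entry.
compounds = [
    Compound("charged_example", "C5H13ClN+", 122.0731),
    Compound("neutral_example", "C6H12O6", 180.0634),
]
print(match_ions([122.0731, 180.0634], compounds))
```

Because only the charged subset is compared, the cost of this step scales with the number of ions in the database rather than with the full compound list, and neutral compounds are never reported as [M+].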
@ElliottJP what's your take on this? How often do we observe [M+] ions of compounds that are not already ions before being ionised in the MS?
@maximskorik Is this handled or fixed by now? I can't remember whether we already addressed this or not.
It's not fixed yet, as we have not decided how to deal with natural ions during the simple annotation. A temporary fix is to remove the "C5H13ClN+" entry from the compound table, since that's the only charged one in there. Stripping the +/- signs from the formulae during the isotope pattern computation should also fix it.
See the galaxy output below.
@maximskorik this seems to be happening when using the HMDB database with the added QC compounds.