Open hepcat72 opened 2 months ago
You could use the parsing functions, like `formula_to_composition`, to return the composition dict. Its keys are atomic numbers, so you could use the hydrogen key (1) to compare the number of hydrogens.
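A rough sketch of that idea, in case it helps. The helper names (`composition_difference`, `differs_only_by_hydrogen`, `formulas_match`) and the `max_h` threshold are made up for illustration and are not part of chempy; only `formula_to_composition` from `chempy.util.parsing` is the library's own. Ignoring the charge entry (key 0) is also an assumption about what counts as a match here.

```python
HYDROGEN = 1  # atomic number of hydrogen; composition dicts are keyed by atomic number


def composition_difference(comp_a, comp_b):
    """Return comp_a - comp_b as {atomic_number: count}, dropping zero entries."""
    keys = set(comp_a) | set(comp_b)
    diff = {k: comp_a.get(k, 0) - comp_b.get(k, 0) for k in keys}
    return {k: v for k, v in diff.items() if v != 0}


def differs_only_by_hydrogen(comp_a, comp_b, max_h=2):
    """True if the compositions are identical apart from at most max_h hydrogens."""
    diff = composition_difference(comp_a, comp_b)
    # Assumption: key 0 holds the net charge in chempy's dicts; it is ignored
    # here so that an ionized formula can still match its neutral form.
    diff.pop(0, None)
    return set(diff) <= {HYDROGEN} and abs(diff.get(HYDROGEN, 0)) <= max_h


def formulas_match(formula_a, formula_b, max_h=2):
    """Hypothetical convenience wrapper that parses both formulas with chempy."""
    # Imported lazily so the dict helpers above also work without chempy installed.
    from chempy.util.parsing import formula_to_composition

    return differs_only_by_hydrogen(
        formula_to_composition(formula_a),
        formula_to_composition(formula_b),
        max_h=max_h,
    )
```

For the example in the question, the compositions of C19H37NO5 and C19H35NO5 would be `{6: 19, 1: 37, 7: 1, 8: 5}` and `{6: 19, 1: 35, 7: 1, 8: 5}`, so `composition_difference` gives `{1: 2}`, i.e. H2. Whether a threshold of 2 hydrogens is the right cutoff depends on the adducts you expect to see.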
On Thu, Aug 22, 2024 at 11:52 AM Robert Leach wrote:
I was wondering if you might be able to provide any insights into the following problem.
We receive chemical compound formulae from some Mass Spec software (Maven or El Maven). There's always a compound name that accompanies the formula, but that isn't always consistent (since compounds can have many synonyms). When we don't have a matching compound name/synonym, we have to determine if the name provided is a synonym of an existing compound in our database or if we need to add a new compound to our database.
There are of course multiple ways to accomplish this, but one helper method I added recently was to present the researcher with a list of possible compound matches. I naïvely did this by matching all existing compounds in our database with the same formula. And we immediately encountered the fact that this can miss existing entries because the formula from the mass spec data can represent the ionized version of the compound (missing or with an extra H).
My subsequent (naïve) thought was to expand the search to include matches that differ by some threshold of hydrogens. You might be able to provide better suggestions for this strategy, but if that DOES sound reasonable, is there an existing method in your package that can compare formulas or take the difference of 2 formulas, e.g. C19H37NO5 - C19H35NO5 = H2?
-- Jeremy A. Gray