BlueObelisk / chemicaltagger

ChemicalTagger is a tool for semantic text-mining in chemistry.
Apache License 2.0

how to locate compound in relation to its NMR spectrum among journals? #11

Closed: biotech7 closed this issue 1 year ago

biotech7 commented 1 year ago

Hi, chemicaltagger/OSCAR team! This is not an issue but an algorithm discussion. When parsing NMR info, it's hard to decide programmatically which compound the parsed NMR data relates to. For example, in an experimental section from Organic Letters, several compounds (Ethyl 2-(((tert-butoxycarbonyl)oxy)(4-methoxyphenyl)methyl)acrylate, petroleum ether, ethyl acetate) appear before the NMR data. It's easy to decide manually which compound the NMR info belongs to, but hard to decide programmatically. Is there any solution/algorithm in OSCAR/chemicaltagger for making this connection?

mjw99 commented 1 year ago

I think this is a difficult problem and out of scope; but forks and patches are welcome.

petermr commented 1 year ago

There's a de facto convention that the title compound (Ethyl 2-...), along with its identifier (2c), is the single compound the data refers to. In chemicaltagger we use this approach. I agree it would be better if the report were semantic (it could so easily be, e.g. a table). Another heuristic is that the other compounds ("petroleum ether", "ethyl acetate") have a role as solvents.

Lezan Hawizy, colleagues and I did a lot of work in Java parsing this sort of report. I'm happy to talk with anyone who wants to implement it in, say, Python.
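
For anyone exploring the Python route, here is a minimal sketch of the two heuristics described above: take the title compound and its identifier as the owner of the NMR data, and otherwise skip mentions that play a solvent role. It is not ChemicalTagger's or OSCAR's actual implementation. The names `TITLE_RE`, `COMMON_SOLVENTS` and `assign_nmr_compound` are hypothetical, the solvent list is a tiny illustrative stand-in for a curated dictionary, and the example paragraph is made up to mirror the case in this issue.

```python
import re

# Assumption: a small illustrative solvent list; a real system would need a curated dictionary.
COMMON_SOLVENTS = {"petroleum ether", "ethyl acetate", "hexane", "dichloromethane"}

# Assumption: the paragraph opens with "<compound name> (<label>)", e.g. "... acrylate (2c)".
# The label is taken to be digits followed by at most two lowercase letters.
TITLE_RE = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<label>\d+[a-z]{0,2})\)")


def assign_nmr_compound(paragraph, mentions):
    """Guess which compound the NMR data in `paragraph` belongs to.

    `mentions` is the list of chemical names found by an upstream tagger.
    Returns a (name, label) tuple; label is None when no title label is found.
    """
    # Heuristic 1: the title compound with its identifier owns the data.
    title = TITLE_RE.match(paragraph)
    if title:
        return title.group("name"), title.group("label")
    # Heuristic 2: otherwise take the first mention that is not a known solvent.
    for name in mentions:
        if name.lower() not in COMMON_SOLVENTS:
            return name, None
    return None, None


# Made-up paragraph for illustration; the NMR values are deliberately left out.
paragraph = (
    "Ethyl 2-(((tert-butoxycarbonyl)oxy)(4-methoxyphenyl)methyl)acrylate (2c). "
    "Purified by column chromatography (petroleum ether/ethyl acetate). "
    "1H NMR (400 MHz, CDCl3): ..."
)
mentions = [
    "Ethyl 2-(((tert-butoxycarbonyl)oxy)(4-methoxyphenyl)methyl)acrylate",
    "petroleum ether",
    "ethyl acetate",
]
name, label = assign_nmr_compound(paragraph, mentions)
print(name, label)
```

The title pattern is tried first because it is the stronger signal; the solvent filter only acts as a fallback when a paragraph has no labelled title compound.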

-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

egonw commented 1 year ago

> I'm happy to talk with anyone who wants to implement it in, say, Python

Instead of reimplementing it, I would first consider wrapping, like what was done with https://github.com/cthoyt/pybacting/