CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

Fix broken NmrParser #38

Closed OBrink closed 8 months ago

OBrink commented 1 year ago

As discussed with @maoliyun and @ViktorWeissenborn in #25, the NMR parser was not working.

When running

from chemdataextractor.doc import Document, Paragraph
from chemdataextractor.model import Compound, NmrSpectrum

doc = Document(Paragraph("1H NMR (CDCl3 with 0.05% v/v TMS, 400 MHz): "
            "δH 7.10 (2H, d, J = 8.9 Hz, H2′ and H6′), "
            "7.03-7.07 (3H, m, H3′′, H4′′ and H5′′), "
            "6.83-6.85 (2H, m, H2′′ and H6′′), 6.66 (2H, d, J = 8.9 Hz, H3′ and H5′), "
            "6.42 (1H, d, J = 1.8 Hz, H5), 6.26 (1H, d, J = 1.7 Hz, H7), "
            "5.18 (1H, s, H1′′′), 5.01 (1H, d, J = 6.6 Hz, H1), "
            "4.52 (1H, s, H2′′′), 4.27 (1H, d, J = 14.2 Hz, H3), "
            "4.15 (1H, br d, J = 11.2 Hz, H4′′′), "
            "4.05 (1H, t, J = 11.2 Hz, H3b′′′), "
            "3.88 (1H, J = 14.3, 6.8 Hz, H2), "
            "3.86 (3H, s, OCH38), 3.69 (3H, s, OCH34′), "
            "3.64 (3H, s, COOCH32), 3.49 (3H, br s, H5′′′ and H6′′′), "
            "3.43-3.47 (1H, overlapped, H3a′′′), 3.45 (3H, s, OCH32′′′)."), models=[Compound, NmrSpectrum])
doc.records.serialize()

... the output did not include the NMR data:

[{'Compound': {'names': ['CDCl3']}},
 {'Compound': {'names': ['TMS']}},
 {'Compound': {'names': ['H2 ′']}},
 {'Compound': {'names': ['H6']}},
 {'Compound': {'names': ['3H']}},
 {'Compound': {'names': ['H3']}},
 {'Compound': {'names': ['H4 ′']}},
 {'Compound': {'names': ['H5']}},
 {'Compound': {'names': ['H1']}},
 {'Compound': {'names': ['H2']}},
 {'Compound': {'names': ['OCH38']}},
 {'Compound': {'names': ['OCH34']}},
 {'Compound': {'names': ['COOCH32']}},
 {'Compound': {'names': ['OCH32']}},
 {'Compound': {'labels': ['v']}},
 {'Compound': {'labels': ['3H']}}]
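
A quick way to confirm the symptom is to filter the serialized records by model name. The following is a minimal sketch that works on the plain list of dicts returned by doc.records.serialize(); the helper name records_of_type is hypothetical and not part of ChemDataExtractor, and the sample list is an abbreviated copy of the output above.

```python
def records_of_type(records, model_name):
    """Return the inner dicts of all serialized records keyed by model_name.

    Hypothetical helper: it only assumes each record is a dict with a single
    top-level key naming the model, as in the output shown above.
    """
    return [record[model_name] for record in records if model_name in record]


# Abbreviated copy of the broken output (only Compound records are present).
broken_output = [
    {'Compound': {'names': ['CDCl3']}},
    {'Compound': {'names': ['TMS']}},
    {'Compound': {'labels': ['v']}},
]

nmr_records = records_of_type(broken_output, 'NmrSpectrum')
compound_records = records_of_type(broken_output, 'Compound')

# An empty NmrSpectrum list reproduces the problem described here.
print(len(nmr_records), len(compound_records))
```

With the full output above, the NmrSpectrum list is likewise empty, which is the behaviour this PR addresses.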

I had to make two small changes in chemdataextractor/parse/nmr.py in order to make it work again.

Now, the code above returns the expected output that contains the NMR data:

[{'Compound': {'names': ['CDCl3']}},
 {'Compound': {'names': ['TMS']}},
 {'Compound': {'names': ['H2 ′']}},
 {'Compound': {'names': ['H6']}},
 {'Compound': {'names': ['3H']}},
 {'Compound': {'names': ['H3']}},
 {'Compound': {'names': ['H4 ′']}},
 {'Compound': {'names': ['H5']}},
 {'Compound': {'names': ['H1']}},
 {'Compound': {'names': ['H2']}},
 {'Compound': {'names': ['OCH38']}},
 {'Compound': {'names': ['OCH34']}},
 {'Compound': {'names': ['COOCH32']}},
 {'Compound': {'names': ['OCH32']}},
 {'Compound': {'labels': ['v']}},
 {'Compound': {'labels': ['3H']}},
 {'NmrSpectrum': {'nucleus': '1H',
   'solvent': 'CDCl3 with 0.05 % v/v TMS',
   'frequency': '400',
   'frequency_units': 'MHz',
   'peaks': [{'NmrPeak': {'shift': '7.10', 'multiplicity': 'd', 'coupling': '8.9', 'coupling_units': 'Hz', 'number': '2H', 'assignment': 'H2 ′'}},
    {'NmrPeak': {'shift': '7.03-7.07', 'multiplicity': 'm', 'number': '3H', 'assignment': 'H3 ′ ′'}},
    {'NmrPeak': {'shift': '6.83-6.85', 'multiplicity': 'm', 'number': '2H', 'assignment': 'H2 ′ ′'}},
    {'NmrPeak': {'shift': '6.66', 'multiplicity': 'd', 'coupling': '8.9', 'coupling_units': 'Hz', 'number': '2H', 'assignment': 'H3 ′'}},
    {'NmrPeak': {'shift': '6.42', 'multiplicity': 'd', 'coupling': '1.8', 'coupling_units': 'Hz', 'number': '1H', 'assignment': 'H5'}},
    {'NmrPeak': {'shift': '6.26', 'multiplicity': 'd', 'coupling': '1.7', 'coupling_units': 'Hz', 'number': '1H', 'assignment': 'H7'}},
    {'NmrPeak': {'shift': '5.18', 'multiplicity': 's', 'number': '1H', 'assignment': 'H1 ′ ′ ′'}},
    {'NmrPeak': {'shift': '5.01', 'multiplicity': 'd', 'coupling': '6.6', 'coupling_units': 'Hz', 'number': '1H', 'assignment': 'H1'}},
    {'NmrPeak': {'shift': '4.52', 'multiplicity': 's', 'number': '1H', 'assignment': 'H2 ′ ′ ′'}},
    {'NmrPeak': {'shift': '4.27', 'multiplicity': 'd', 'coupling': '14.2', 'coupling_units': 'Hz', 'number': '1H', 'assignment': 'H3'}},
    {'NmrPeak': {'shift': '4.15', 'multiplicity': 'br d', 'coupling': '11.2', 'coupling_units': 'Hz', 'number': '1H', 'assignment': 'H4 ′ ′ ′'}},
    {'NmrPeak': {'shift': '4.05', 'multiplicity': 't', 'coupling': '11.2', 'coupling_units': 'Hz', 'number': '1H', 'assignment': 'H3b ′ ′ ′'}},
    {'NmrPeak': {'shift': '3.88', 'coupling': '14.3 , 6.8', 'coupling_units': 'Hz', 'number': '1H', 'assignment': 'H2'}},
    {'NmrPeak': {'shift': '3.86', 'multiplicity': 's', 'number': '3H', 'assignment': 'OCH38'}},
    {'NmrPeak': {'shift': '3.69', 'multiplicity': 's', 'number': '3H', 'assignment': 'OCH34 ′'}},
    {'NmrPeak': {'shift': '3.64', 'multiplicity': 's', 'number': '3H', 'assignment': 'COOCH32'}},
    {'NmrPeak': {'shift': '3.49', 'multiplicity': 'br s', 'number': '3H', 'assignment': 'H5 ′ ′ ′'}},
    {'NmrPeak': {'shift': '3.43-3.47', 'number': '1H', 'assignment': 'H3a ′ ′ ′'}},
    {'NmrPeak': {'shift': '3.45', 'multiplicity': 's', 'number': '3H', 'assignment': 'OCH32 ′ ′ ′'}}],
   'compound': {'Compound': {}}}}]
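
For anyone who wants to work with the extracted peaks, the nested NmrSpectrum record can be flattened into simple rows. This is only a sketch under stated assumptions: shift_bounds and peak_rows are hypothetical helpers (not part of ChemDataExtractor), shifts are assumed to be non-negative and to use an ASCII hyphen for ranges as in the output above, and the sample spectrum is an abbreviated copy of that output.

```python
def shift_bounds(shift):
    """Split a shift string such as '7.03-7.07' or '7.10' into (low, high) floats.

    Assumes non-negative shifts and an ASCII hyphen as the range separator.
    """
    parts = shift.split('-')
    return float(parts[0]), float(parts[-1])


def peak_rows(spectrum):
    """Flatten a serialized NmrSpectrum dict into a list of tuples:
    (low, high, number, multiplicity, assignment). Missing fields become None.
    """
    rows = []
    for entry in spectrum.get('peaks', []):
        peak = entry['NmrPeak']
        low, high = shift_bounds(peak['shift'])
        rows.append((low, high, peak.get('number'),
                     peak.get('multiplicity'), peak.get('assignment')))
    return rows


# Abbreviated copy of the NmrSpectrum record from the fixed output.
spectrum = {
    'nucleus': '1H',
    'peaks': [
        {'NmrPeak': {'shift': '7.10', 'multiplicity': 'd', 'coupling': '8.9',
                     'coupling_units': 'Hz', 'number': '2H', 'assignment': 'H2 ′'}},
        {'NmrPeak': {'shift': '7.03-7.07', 'multiplicity': 'm',
                     'number': '3H', 'assignment': 'H3 ′ ′'}},
        {'NmrPeak': {'shift': '3.88', 'coupling': '14.3 , 6.8',
                     'coupling_units': 'Hz', 'number': '1H', 'assignment': 'H2'}},
    ],
}

rows = peak_rows(spectrum)
for row in rows:
    print(row)
```

Using .get for the optional fields matters here because some peaks in the output, such as the one at 3.88, carry no multiplicity.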

Dingyun-Huang commented 8 months ago

Thank you for bringing this up. Looks good to me!