CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

Issues with creation of new model #39

Closed DianaCH1 closed 1 year ago

DianaCH1 commented 1 year ago

Hello everyone,

I'm confused about how to create a new model. I tried to follow your tutorial, but I always end up with an empty list of records. I want to extract the TOF, with its value and unit, from a sentence. Here is what I tried. First I created a new dimension, "Conversion". Because its unit is "h-1", I wrote the following script without a units_dict:

class Conversion(Dimension):
    constituent_dimensions = Time() ** (-1)

class ConversionModel(QuantityModel):
    dimensions = Conversion()

Then I created a new class, TurnOverFrequency, in model.py as follows:

class TurnOverFrequency (ConversionModel):
    expression = I('TOF')
    specifier = StringType(parse_expression=expression, required=True, contextual=False, updatable=False)
    compound = ModelType(Compound, required=False, contextual=False)

Sadly, when I apply it to a single sentence, I only get an empty list:

from chemdataextractor.model import Compound, TurnOverFrequency
from chemdataextractor.doc import Document,Paragraph

doc = Document(
    Paragraph('For 0.6Rh/SiO2, the catalyst productivity was 44 mol/(molRh·h) and the TOF was 2511 h−1 at the time on stream(TOS) of 5 h. '),
    models=[TurnOverFrequency]
    )
doc.records.serialize()

#Output: []

What am I missing? Why is it not working? Do I need to create a new parser for this? Thank you in advance.

OBrink commented 1 year ago

Hey @DianaCH1,

I quickly tried reproducing your problem, and it seems like your code is working.

If I run the following code snippet in an environment with ChemDataExtractor 2.1.2...

from chemdataextractor.model import QuantityModel
from chemdataextractor.model.units import Dimension, Time
from chemdataextractor.parse.elements import I
from chemdataextractor.model.model import Compound, ModelType, StringType  # ModelType is used for the compound field below
from chemdataextractor.doc import Document, Paragraph

class Conversion(Dimension):
    constituent_dimensions = Time() ** (-1)

class ConversionModel(QuantityModel):
    dimensions = Conversion()

class TurnOverFrequency (ConversionModel):
    expression = I('TOF')
    specifier = StringType(parse_expression=expression, required=True, contextual=False, updatable=False)
    compound = ModelType(Compound, required=False, contextual=False)

doc = Document(
    Paragraph('For 0.6Rh/SiO2, the catalyst productivity was 44 mol/(molRh·h) and the TOF was 2511 h−1 at the time on stream(TOS) of 5 h. '),
    models=[TurnOverFrequency]
    )
doc.records.serialize()

... I get:

[{'TurnOverFrequency': {'raw_value': '2511',
   'raw_units': 'h−1',
   'value': [2511.0],
   'units': 'Hour^(-1.0)',
   'specifier': 'TOF',
   'compound': {'Compound': {'names': ['0.6Rh/SiO2']}}}}]

Are you sure that the TurnOverFrequency class is imported correctly?
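
One way to check this is to ask Python where a class and its package were actually loaded from. The following is only a minimal sketch using the standard library; the helper names `describe_origin` and `package_location` are hypothetical and not part of ChemDataExtractor. It is demonstrated on `json` so that it runs anywhere.

```python
import importlib.util
import inspect
from json import JSONDecoder


def describe_origin(obj):
    """Return (module name, source file) for a class or function."""
    module = inspect.getmodule(obj)
    module_name = module.__name__ if module else None
    try:
        source_file = inspect.getsourcefile(obj)
    except TypeError:
        # Built-in objects implemented in C have no Python source file.
        source_file = None
    return module_name, source_file


def package_location(package_name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(package_name)
    return spec.origin if spec else None


# Stand-in objects from the standard library (assumption: in the real case
# you would pass TurnOverFrequency and "chemdataextractor" instead).
print(describe_origin(JSONDecoder))
print(package_location("json"))
```

If the reported file for `TurnOverFrequency` is not the model.py you edited, Python is importing a different copy of the package than the one you changed.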

DianaCH1 commented 1 year ago

Thanks for the help. It worked in a separate Jupyter notebook. My problem was that I wanted to make the changes inside the package itself. As you suggested, it was probably an import problem. I will leave the package alone and define the new classes in separate code, as you proposed.