CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

'CemTagger' object has no attribute 'legacy_tag' #21

Open Il2295 opened 2 years ago

Il2295 commented 2 years ago

CemTagger throws the following error.

[Screenshot: 2022-05-27 at 6.43.06 PM]

ti250 commented 2 years ago

tag_sents has been deprecated and hasn't been updated because we weren't using it, which means some newer taggers don't work with it, as mentioned in the documentation. I should have made this more obvious, sorry! For equivalent functionality, please try something along the lines of the following:

from chemdataextractor.nlp.new_cem import BertFinetunedCRFCemTagger, CemTagger
from chemdataextractor.doc import Sentence

ner_tagger = BertFinetunedCRFCemTagger(max_batch_size=100)
CemTagger.taggers[2] = ner_tagger

queries = ["Hybridization state of Xe in XeF2", "Xef4 and XeF6 respectively are"]
sents = [Sentence(query) for query in queries]

for sent in sents:
    sent.taggers.append(ner_tagger)
    # Build a list so the tags themselves are printed, not a generator object
    print([token.ner_tag for token in sent.tokens])
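
The loop above prints one tag per token. If you want the mentions themselves, a rough sketch like the following could join consecutive tagged tokens. It is plain Python and does not call the library. The helper name group_cems is hypothetical, and the "B-CM"/"I-CM" tag values are an assumption about what the tagger emits, so check them against your own output first.

```python
from typing import Iterable, List, Optional, Tuple


def group_cems(tagged: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Join consecutive B-/I- tagged tokens into mention strings.

    Hypothetical helper: assumes BIO-style tags such as "B-CM" / "I-CM".
    Any other tag value (None, "O", ...) is treated as outside a mention.
    """
    mentions: List[str] = []
    current: List[str] = []
    for text, tag in tagged:
        if tag is not None and tag.startswith("B-"):
            # A new mention starts; flush any mention in progress.
            if current:
                mentions.append(" ".join(current))
            current = [text]
        elif tag is not None and tag.startswith("I-") and current:
            # Continuation of the mention in progress.
            current.append(text)
        else:
            # Outside any mention; flush whatever was collected.
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions


# Hand-written tags for illustration only, not real tagger output.
example = [
    ("Hybridization", None), ("state", None), ("of", None),
    ("Xe", "B-CM"), ("in", None), ("XeF2", "B-CM"),
]
print(group_cems(example))
```

With real sentences you would pass in (token text, token.ner_tag) pairs built from sent.tokens. Printing those pairs first also shows how the sentence was tokenized, which helps when a mention seems to be missing.
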
Il2295 commented 2 years ago

@ti250 Yes, I tried it and it runs fine. However, when I extract the CEMs from the example above (i.e. "Hybridization state of Xe in XeF2, XeF4, XeF6"), the output contains only "Xe", whereas the online demo (http://www.chemdataextractor2.org/demo) returns "Xe, XeF2, XeF4, XeF6" for the same text. Could you please let me know what I am doing wrong and how to get the expected result?

[Screenshots: 2022-05-30 at 1.47.16 PM and 2022-05-30 at 1.48.17 PM]

Also, can the CEM extraction be made case-insensitive for the given text?