Open Il2295 opened 2 years ago
tag_sents has been deprecated and hasn't been updated because we weren't using it, which means some newer taggers don't work with it, as mentioned in the documentation. I should have made this more obvious, sorry! For equivalent functionality, please try something along the lines of the following:
from chemdataextractor.nlp.new_cem import BertFinetunedCRFCemTagger, CemTagger
from chemdataextractor.doc import Sentence

# Replace the third entry in CemTagger.taggers with the BERT-based tagger.
ner_tagger = BertFinetunedCRFCemTagger(max_batch_size=100)
CemTagger.taggers[2] = ner_tagger

queries = ["Hybridization state of Xe in XeF2", "Xef4 and XeF6 respectively are"]
sents = [Sentence(query) for query in queries]
for sent in sents:
    sent.taggers.append(ner_tagger)
    # Build a list so the tags are printed rather than a generator object.
    print([token.ner_tag for token in sent.tokens])
@ti250 Yes, I tried it and it works fine. However, when I try to extract the CEMs from the example above (i.e. "Hybridization state of Xe in XeF2, XeF4, XeF6"), the output contains only "Xe", whereas the online demo (http://www.chemdataextractor2.org/demo) returns "Xe, XeF2, XeF4, XeF6" for the same example. Could you let me know what I am doing wrong and how to get the expected result?
Can we make the extraction of CEMs from the given text case-insensitive?
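Independent of ChemDataExtractor, one possible workaround for inconsistent casing such as "Xef4" is to normalise the text against a list of known formulas before passing it to a tagger. The following is only a plain-Python sketch of that idea, not part of the ChemDataExtractor API: KNOWN_FORMULAS and normalise_case are hypothetical names, and the list of formulas is an assumption you would have to supply yourself.

```python
import re

# Hypothetical lookup list; not provided by ChemDataExtractor.
KNOWN_FORMULAS = ["Xe", "XeF2", "XeF4", "XeF6"]

# Map each lower-cased spelling to its canonical spelling.
_CANONICAL = {formula.lower(): formula for formula in KNOWN_FORMULAS}

# For this sketch, a token is a run of ASCII letters and digits.
_TOKEN = re.compile(r"[A-Za-z0-9]+")


def normalise_case(text):
    """Rewrite tokens that match a known formula, ignoring case."""
    def fix(match):
        token = match.group(0)
        # Fall back to the original token when it is not a known formula.
        return _CANONICAL.get(token.lower(), token)
    return _TOKEN.sub(fix, text)


print(normalise_case("Xef4 and XeF6 respectively are"))
```

The normalised string could then be wrapped in a Sentence as in the snippet earlier in this thread. Whether this improves the tagger's output is untested here; it only addresses the casing of formulas that appear in the list.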
CemTagger throws the following error.