CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0

The parser does not detect the comma #17

Closed tcaceresm closed 2 years ago

tcaceresm commented 2 years ago

Hi! I was creating a parser to detect specific molecules. Everything works very well, except that it does not detect commas. For example, take 2,3-dimethylamine. If I apply this parser: (W('2') + W(',') + W('3') + W('-') + I('dimethylamine')).add_action(merge) it doesn't match, and only because of the comma. Any ideas?

ti250 commented 2 years ago

Try printing your sentence's raw tokens with print(sentence.raw_tokens) and see how it's tokenised. I assume that here the comma is included so that it's tokenised as ["2,3", "-", "dimethylamine"] or something along those lines.

tcaceresm commented 2 years ago

it's tokenised as ["2,3", "-", "dimethylamine"]

Yes, you are right! Thanks for your quick response. I rewrote the parser and so far it works fine.
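For readers who land on this issue later, here is a minimal sketch of why the original parser failed and what kind of change fixes it. It is plain Python and does not use ChemDataExtractor: match_sequence is a hypothetical helper that stands in for a chain of elements joined with +, where each element has to match exactly one token. The rewritten parser was not posted in this thread, so the adjusted pattern list is only an assumption about its shape. In ChemDataExtractor itself, the equivalent would presumably be a regex element such as R for the "2,3" token.

```python
import re

# Tokens as reported in this thread for "2,3-dimethylamine".
tokens = ["2,3", "-", "dimethylamine"]


def match_sequence(patterns, tokens):
    """Toy stand-in for a chain of parse elements (hypothetical helper).

    Each pattern must fully match the token at the same position,
    ignoring case. Returns the matched tokens joined into one string,
    or None if any position fails to match.
    """
    if len(tokens) < len(patterns):
        return None
    matched = []
    for pattern, token in zip(patterns, tokens):
        if re.fullmatch(pattern, token, flags=re.IGNORECASE) is None:
            return None
        matched.append(token)
    return "".join(matched)


# Mirrors the original parser: it expects "2", "," and "3" as separate tokens.
original = [r"2", r",", r"3", r"-", r"dimethylamine"]

# Assumed fix: treat the locant pair "2,3" as a single token.
adjusted = [r"\d+,\d+", r"-", r"dimethylamine"]

print(match_sequence(original, tokens))  # None
print(match_sequence(adjusted, tokens))  # 2,3-dimethylamine
```

The general point is that the pattern has to follow the tokeniser's output, not the characters of the raw string, which is why checking sentence.raw_tokens first is the quickest way to debug a parser that silently fails to match.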