Closed sharpsy closed 2 years ago
@lkrsnik Let's deal with this once you have classla back in your focus.
@sharpsy this might take a few weeks. We, of course, also do not mind pull requests. The underlying issue should be easy to fix.
This was fixed with the latest classla release.
Describe the bug As the title says, the text property of the sentence object is always empty.
To Reproduce
Expected behavior I would expect that both sentence objects contain the original text of the sentence. This makes it harder to implement some workflows. In my case, I would like to cross-reference the information received from classla (ner, lemma, upos) with the output from BERT-like models. Those models use different tokenizers that expect the raw text - and this issue makes it harder to get to the sentence text when the document contains a number of sentences (ie. there is not just a single one as in the example above). (there is a workaround of parsing the output of sentence._metadata but I would like to avoid it if possible)
Environment (please complete the following information):