-
Spawned off of #513 and UniversalDependencies/UD_English#40.
I proposed:
> Another issue relevant here is abbreviations. For uncommon abbreviations/shortened forms (like *w* for *with*, *btwn* f…
-
- [ ] Prepare the data (annotated and unannotated) in CoNLL-U format
- [ ] Implement and train a POS tagger
- [ ] Automatically annotate the POS tags
- [ ] Implement and train a dependency parser
…
-
I’m getting a “Check_input” error when I try to run PrepText using the sotu example, even though the text is of type character. I tried creating my own tf-idf data frame using tidy text so I could still…
-
The UDPipe sentence splitter seems to be a bit too split-happy, creating many fragments. Is this dragging down performance of our BERT models? Furthermore, we put a lot of effort into splitting large …
-
- noun phrases
- verb phrases
- subject-verb-objects
- adjectives-nouns
- collapse entities
-
To be fair to the excellent developers at [spaCy](http://spacy.io) you might differentiate between our implementation of their return objects (which come from spaCy in Python lists) and our R objects,…
-
Speed things up a bit when using `strsplit` by setting `fixed = TRUE` where applicable, or by passing the `strsplit` arguments through to the caller.
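
A minimal sketch of the `strsplit` suggestion, assuming the delimiter is a literal string rather than a regular expression. `split_tokens` is a hypothetical wrapper written for illustration, not a function from the package under discussion; it only shows how extra arguments could be forwarded with `...`.

```r
# With fixed = TRUE the delimiter is matched literally, so the regex
# engine is skipped and "." is not treated as "any character".
x <- c("a.b.c", "d.e")
parts_fixed <- strsplit(x, ".", fixed = TRUE)

# Hypothetical wrapper: extra arguments such as fixed or perl are
# passed straight through to strsplit().
split_tokens <- function(text, sep = " ", ...) {
  strsplit(text, sep, ...)
}

tokens <- split_tokens("to be or not", " ", fixed = TRUE)
```

Forwarding `...` leaves the choice between literal and regex matching to the caller, so the default behaviour does not change for existing code.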
-
-
### Common taxonomies are modified
- [ ] common taxonomies
The common taxonomies should be used without modifications - just translations.
E.g. in your `parla.legislature` taxonomy, you don't hav…
-
In that case, paragraph_id is within chunks of doc_ids across the cores.