-
https://hdl.handle.net/20.500.12185/297@format=cmdi
- [x] Missing input and output info
-
https://hdl.handle.net/20.500.12185/354
- [x] Missing input and output info
-
I'm filtering some commonly used words out of a corpus with the Tokenise processor and it only seems to be partially successful. For example in one month there are 37,325 instances of one word. When I…
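One possible explanation, offered only as an assumption, is that the surviving instances are case or punctuation variants of the filtered word. The sketch below is plain Python and does not use the Tokenise processor; `surviving_variants` is a hypothetical helper that counts tokens matching the stop word only after normalisation.

```python
import re
from collections import Counter


def surviving_variants(tokens, stopword):
    """Count tokens that match `stopword` only after normalisation.

    Hypothetical helper: a token is reported when it differs from the
    stop word as written, but equals it once non-word characters are
    removed and the token is lowercased.
    """
    target = stopword.lower()
    variants = Counter()
    for tok in tokens:
        normalised = re.sub(r"\W", "", tok).lower()
        if normalised == target and tok != stopword:
            variants[tok] += 1
    return variants


tokens = ["the", "The", "the,", "cat", "THE"]
variants = surviving_variants(tokens, "the")
```

If `variants` is non-empty for the real corpus, an exact-match filter would leave those forms behind.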
-
I was playing around with one of the example sentences I found, maybe in the docs or one of the tutorials, trying to find the decliner string associated with a particular word:
```
from cltk.stem.le…
-
I'm having some problems running this on CSC Mahti with the Ampere GPUs. I'm trying to run the STDIN/STDOUT stream entry point.
Here is the image: https://github.com/fshdnc/eb_class/pkgs/container/…
-
Today I was building a model on a Dutch corpus from the 17th-19th century, available at https://ivdnt.org/taalmaterialen/2282-pp-brievenalsbuit-j.
I trained it for building a lemmatiser using the fo…
-
It would be nice to be able to reverse the functionality of JVReplacer, so that it does not only do:
verbum > uerbum
but also
uerbum > verbum
I rewrote the JVReplacer class to do this: htt…
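A minimal sketch of the two directions, for illustration only. It is not the rewritten class linked above and not CLTK's API: `jv_to_iu` and `iu_to_jv` are hypothetical names, and the reverse direction rests on an assumed heuristic (a `u` is treated as consonantal at the start of a word before a vowel, or between two vowels). It handles only u/v, not i/j, and will miss cases such as `seruus`.

```python
import re

_VOWELS = "aeiouAEIOU"

# Assumed heuristic: "u" is consonantal when it starts a word and is
# followed by a vowel, or when it stands between two vowels.
_CONSONANTAL_U = re.compile(rf"(?:\b|(?<=[{_VOWELS}]))[uU](?=[{_VOWELS}])")


def jv_to_iu(text: str) -> str:
    """Forward direction: verbum -> uerbum (and j -> i)."""
    return text.translate(str.maketrans("jvJV", "iuIU"))


def iu_to_jv(text: str) -> str:
    """Reverse direction for u/v only: uerbum -> verbum."""
    return _CONSONANTAL_U.sub(
        lambda m: "v" if m.group() == "u" else "V", text
    )


forward = jv_to_iu("verbum")
backward = iu_to_jv("uerbum")
```

Because the forward mapping discards information, the reverse can only be approximate; a lexicon lookup would be more reliable than a regular expression for ambiguous forms.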
-
Fine-tune a Dutch lemmatisation model, priming it with extra information from e-lex beforehand.
-
I have trained a Flair model for NER with my own corpus, as below:
tag_type = 'ner'
# 3. make the tag dictionary from the corpus
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
…
-
This is more of a suggestion than an actual issue, but I see that much of the processing time is spent running NLP modules from NLTK. I would actually recommend dropping NLTK altogether -- it's now a …