-
Hi @kpu, I ran into a strange issue: training the n-gram model on a relatively small corpus worked, but training on a larger corpus raised a baddiscount error.
1. Training the N-gram model with a corpus of 20 mill…
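For context on this error: as far as we understand, kenlm's `lmplz` estimates modified Kneser-Ney discounts for each order from the count-of-counts n_1..n_4, which are the numbers of n-grams with adjusted count 1, 2, 3 and 4. It uses D_k = k - (k+1) * Y * n_{k+1} / n_k, where Y = n_1 / (n_1 + 2 * n_2), and it rejects any discount outside [0, k]. The minimal sketch below reimplements that estimate in plain Python. It shows how unusual count-of-counts, such as those produced by heavily duplicated input, can push a discount out of range. The function name `kn_discounts` is hypothetical, and this is not kenlm's code.

```python
def kn_discounts(count_of_counts):
    """Sketch of modified Kneser-Ney discount estimation (Chen & Goodman style).

    count_of_counts = (n1, n2, n3, n4): how many n-grams of one order have
    adjusted count 1, 2, 3 and 4. Hypothetical helper, not kenlm's API.
    """
    n1, n2, n3, n4 = count_of_counts
    if n1 + 2 * n2 == 0:
        raise ValueError("no n-grams with count 1 or 2; cannot estimate Y")
    y = n1 / (n1 + 2 * n2)
    discounts = {}
    # D_k uses n_k and n_{k+1}; D_3 is applied to all counts >= 3.
    for k, (nk, nk1) in enumerate(((n1, n2), (n2, n3), (n3, n4)), start=1):
        if nk == 0:
            raise ValueError(f"no n-grams with adjusted count {k}")
        d = k - (k + 1) * y * nk1 / nk
        # Mirror the range check that triggers the "bad discount" error.
        if not 0.0 <= d <= k:
            raise ValueError(f"discount out of range for adjusted count {k}: {d}")
        discounts[k] = d
    return discounts


print(kn_discounts((100, 40, 20, 10)))  # all three discounts fall in range
```

If duplicated input is the cause, deduplicating the corpus is the usual first step. `lmplz` also accepts `--discount_fallback`, which substitutes fixed discounts instead of failing.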
-
Hi,
The Training page of the wiki gives the following command for training models on a new corpus:
```
python train.py --trainFile MY_TRAINING_CORPUS.xml --develFile MY_DEVELOPMENT_CORPUS.xml --testFile MY_TES…
```
-
SignBank and Sign2Mint fail to load in https://github.com/sign-language-processing/datasets/blob/master/examples/load.ipynb; perhaps this is why.
Running pytest as noted in #53, I find…
-
Hi,
Initially I thought this was caused by excessive timeouts, but those have since been fixed. Some test cases are still stuck: all I see is a pending status and a "progression started" message that never finishes.
```
oss-fu…
```
-
Hi! I am trying to load and experiment with the DGS Corpus dataset.
I load it, and it downloads the data locally. Then I try to loop through the examples, but even if I just sleep in the first iteration of the loop or …
-
We now need a reasonably sized corpus to experiment on. In the interest of time, I am inclined not to use the Wikipedia corpus (or at least not all of it), since it is too large.
- [x] Decide on …
-
Hi Kwonmha,
Thanks for open-sourcing the repo. Could you confirm that the general preprocessing steps for the vocab builder, for an uncased BERT model, are as follows:
1. Convert the corpus text file to lowercase
2. Removal…
-
### Describe the bug
Dataset https://huggingface.co/datasets/blog_authorship_corpus has an issue with its hosting platform, since https://drive.google.com/u/0/uc?id=1cGy4RNDV87ZHEXbiozABr9gsSrZpPaPz&…
-
I'm trying to build my own classifier-based POS tagger using `SklearnClassifier` and `ClassifierBasedPOSTagger`. The code I've tried is below.
```
from nltk.corpus import treebank
nltk…
```
-
Command:
```
python /home/ts75080/Documents/IMS-Toucan/TrainingPipelines/finetuning_example_simple_ewe.py finetuning_example_simple_ewe \
--gpu_id 0 \
--finetune \
--resume
```
Traceback (most recent …