-
Saving a previously trained model as shown in the README does not work.
```python
mtr = MosesTruecaser()
mtr.train('big.txt') # should be mtr.train_from_file('big.txt')?
mtr._save_model('big.tr…
pypae updated
5 years ago
-
We used the truecaser for some of our corpora with >8M segments. There are some issues when training a truecaser for larger corpora:
- Using `joblib.Parallel` causes a huge memory footprint even wh…
-
norvig.com is currently down, which causes the tests in `sacremoses/test/test_truecaser.py` to fail if `big.txt` has not already been downloaded. I'm wondering if there is another source of the fil…
-
If a word is not the first word of the sentence, and the word was seen with this exact casing in the training material, the original script does not recase the word.
i.e.
```bash
perl train - tru…
-
Do sentences have to be delimited in some way? I have trained the truecaser with a 3032679-word tokenized text in Spanish (1 sentence per line). It generates a model which has 102972 entries (is it a …
-
I'm trying to run the PBSMT model. As far as I can tell, run.sh doesn't deal with casing properly. The induced phrase-table is all lower-case, but the test text is never lower-cased, which means that …
-
Hi, @alvations.
A student of mine and I are using the truecaser from a Python 3 script on Windows. The script makes sure that all files are opened as UTF-8 by redefining `open` as follows:
```
open =…
-
I am training a transformer model on en-fr data. I have run it several times, but each time it seems to deadlock after finishing a batch. The log is as follows:
[2018-09-19 20:47:48] Training started
[2018-09-19 2…
-
I'm using a Truecaser model, and I am going to use Wikipedia data to train it.
I searched on Google about this, and it seems no one has ever reported successfully training a new …
-
As the title suggests, entities in lower case are not recognized as entities. I also noticed entities in upper case are not recognized either. It seems to only recognize entities with title/proper cas…