-
Do sentences have to be delimited in some way? I have trained the truecaser with a 3032679-word tokenized text in Spanish (1 sentence per line). It generates a model which has 102972 entries (is it a …
-
Hi, I'm using SacreMoses 0.0.7, special characters like `[` `]` `` are escaped when the text is being tokenized (using `MosesTokenizer.tokenize`), and are left as is in the results (the examples in th…
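If the escaped entities are unwanted downstream, one option is to map them back after tokenization. The sketch below uses only the standard library and does not call SacreMoses. The escape table reflects the usual Moses conventions as I understand them and is an assumption to check against your installed version. `unescape_tokens` is a hypothetical helper name, not part of the library.

```python
# Moses-style escape table (assumed; verify against your SacreMoses version).
# "&" is listed first so that, when the list is reversed below, it is undone last.
MOSES_ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def unescape_tokens(tokens):
    """Hypothetical helper: map escaped entities in each token back to raw characters."""
    restored = []
    for token in tokens:
        # Walk the table in reverse so "&amp;" is replaced last;
        # otherwise "&amp;lt;" would wrongly collapse to "<".
        for raw, escaped in reversed(MOSES_ESCAPES):
            token = token.replace(escaped, raw)
        restored.append(token)
    return restored


if __name__ == "__main__":
    print(unescape_tokens(["&#91;", "note", "&#93;", "AT&amp;T"]))
```

Undoing `&amp;` last keeps a literal `&lt;` in the original text from being turned into `<` by accident.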
-
![image](https://user-images.githubusercontent.com/16017418/54796359-ea343a80-4c8a-11e9-928f-86c911964cf2.png)
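The example uses a Chinese-to-English system: src is set to the Chinese corpus path and tgt to the English corpus path.
If I want to train an English-to-Chinese system, should src still be the Chinese corpus and tgt the English corpus?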
-
I'm wondering what the proper data preparation is for training a translation model.
-
Hi guys, I'm trying to run "make" (haha lol), but I'm getting this error:
./mystl.h:28:10: fatal error: 'tr1/unordered_map' file not found
It comes from the include on line 28 of mystl.h.
…
-
Currently `sacremoses.util.is_cjk` treats Japanese kana as CJK characters, which I think should be excluded.
Maybe it is better to use https://en.wikipedia.org/wiki/Unicode_block as the reference…
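To illustrate the block-based approach, here is a minimal sketch that classifies a character by Unicode block range and keeps kana separate from ideographs. It is not the `sacremoses.util.is_cjk` implementation. The function names `is_cjk_ideograph` and `is_kana` are hypothetical, and the range list is deliberately incomplete (later ideograph extensions are omitted).

```python
# Code point ranges taken from the Unicode block list (partial, for illustration).
CJK_IDEOGRAPH_RANGES = [
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]

KANA_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
]


def _in_ranges(char, ranges):
    """Return True if the single character's code point falls in any range."""
    code_point = ord(char)
    return any(start <= code_point <= end for start, end in ranges)


def is_cjk_ideograph(char):
    """Hypothetical check: ideographs only, kana excluded."""
    return _in_ranges(char, CJK_IDEOGRAPH_RANGES)


def is_kana(char):
    """Hypothetical check: Hiragana or Katakana."""
    return _in_ranges(char, KANA_RANGES)


if __name__ == "__main__":
    for ch in "中あカa":
        print(ch, is_cjk_ideograph(ch), is_kana(ch))
```

Keeping the two checks separate lets a caller decide whether kana should count as CJK for its own purposes.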
-
Hi, I'm using the Python port of the Moses tokenizer in my project. I would appreciate it if you added a LICENSE file and copyright information to the repository.
And what's the official name for this py…
-
I followed the instructions and ran "make -j" in the tools/ folder. I got the following error:
...
gcc.compile.c++ bin.v2/libs/wave/build/gcc-5.4.0/release/link-static/threading-multi/cpplexer/re2clex/cpp…
-
In the ELMO tutorial "using ELMO interactively" [section](https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md#using-elmo-interactively), it would be useful to mention what tokeniz…
-
I saw we have to remove empty target sentences for the NUCLE development data.
Do we have to do the same for the NUCLE training data?
Thank you very much.
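
For reference, removing empty target sentences from a parallel corpus could look like the sketch below. It assumes the source and target sides are line-aligned lists of strings. `drop_empty_targets` is a hypothetical helper, not a script from this repository, and it says nothing about whether the training data should be filtered.

```python
def drop_empty_targets(src_lines, tgt_lines):
    """Hypothetical helper: drop sentence pairs whose target side is empty.

    Both inputs are assumed to be line-aligned lists of strings.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")

    # Keep a pair only when the target has non-whitespace content.
    kept = [(src, tgt) for src, tgt in zip(src_lines, tgt_lines) if tgt.strip()]
    kept_src = [src for src, _ in kept]
    kept_tgt = [tgt for _, tgt in kept]
    return kept_src, kept_tgt


if __name__ == "__main__":
    src = ["He go home .", "I like it .", "Fine ."]
    tgt = ["He goes home .", "   ", "Fine ."]
    print(drop_empty_targets(src, tgt))
```

Filtering both sides together keeps the two files aligned line by line.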