-
(Apologies if this is not the right repository to report this issue)
The data prepared for the Malayalam language has an issue. Consistently, there is a space before and after the Virama ് (U+0D4D)…
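A minimal sketch of how the reported pattern could be detected and collapsed, assuming the spaces on both sides of the Virama are spurious. The function names are hypothetical and are not part of any repository mentioned here. Only the exact sequence space, U+0D4D, space is touched, because a Virama followed by a single space can be a legitimate word boundary in Malayalam.

```python
import re

# Exactly: one space, the Malayalam Virama (U+0D4D), one space.
# Assumption: this three-character sequence is always a data-preparation error.
SPACED_VIRAMA = re.compile(" \u0d4d ")


def count_spaced_viramas(text: str) -> int:
    """Count occurrences of a Virama that has a space on both sides."""
    return len(SPACED_VIRAMA.findall(text))


def strip_virama_spaces(text: str) -> str:
    """Rejoin the Virama with its neighbours by dropping both spaces."""
    return SPACED_VIRAMA.sub("\u0d4d", text)
```

Running `count_spaced_viramas` over the corpus first gives an idea of how widespread the problem is before any text is rewritten.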
-
I trained an English-Chinese model. However, during decoding, it does not translate long sentences. Below are my valid.log and my configuration.
Thanks a lot in advance.
* Valid.log:
```
[2019-…
-
Hi,
I'm trying to use SentencePiece in a project, but I need each subword to belong to just one original token, as BPE does.
Is this supported by the library?
Thanks in advance,
Carlos
-
Can this tool be used to do Chinese word segmentation?
Thank you very much!
-
Hi, I'm currently doing research on applying "Subword Regularization" to training an NMT model, where segmentation candidates are sampled at every parameter update.
I am trying to apply this metho…
-
Dear all,
This is my first time learning NMT. I have prepared some data for training a model.
I ran ./build/marian --train-sets corpus.en corpus.ro and got the error below.
I am using …
-
Training dataset size: 25 million
source vocab size: 1.9 million
target vocab size: 2.3 million
Running the training command:
python train.py -data data/demo -save_model demo-model
![image](h…
-
**Problem description and stack trace**
I'm running Semantic Role Labelling, getting the model from this URL: "https://s3-us-west-2.amazonaws.com/allennlp/models/bert-base-srl-2019.06.17.tar.gz".
W…
-
@NirantK I thought I would check before submitting another PR – would the following NLP tool fit the list?
https://github.com/amir-zeldes/RFTokenizer
It is a trainable subword tokenizer for morp…
-
The [documentation about `--guided-alignment`](https://marian-nmt.github.io/docs/#guided-alignment) lacks some details that I would like to confirm:
- The corpus fed to the alignment tool (e.g. `fa…