-
I basically want to pre-train models from scratch, including the tokenizer, for the languages covered by IndicBART and IndicBERT plus some additional languages, so as to build something like IndicBARTExt and Indi…
-
## Abstract
- Train a source-to-target NMT model (student) when no source-target parallel corpus is available, guided by an existing pivot-to-target NMT model (teacher) on a source-pivot parallel corpus
- X : source, Y : target…
-
I tried to extract the aligned sentence pairs from CCMatrix, which I had previously downloaded using `opus_express`. The command I used was:
```
opus_read --source en --target fi --directory CCMatrix --preproc…
-
I think the description in README.md currently does not match the code. Where is script/train_dual_Q_value_shared.sh? Do I need to modify any DIR configuration in the code?
-
**Paper**
Data Augmentation for Low-Resource Neural Machine Translation
**Introduction**
This research focuses on the challenges faced by low-resource languages in the neural machine translation…
-
### Metadata
- Authors: Tobias Domhan and Felix Hieber
- Organization: Amazon
- Conference: EMNLP 2017
- Link: https://goo.gl/eFj9gx
-
### Metadata
Authors: Shamil Chollampatt and Hwee Tou Ng
Organization: National University of Singapore
Release Date: 2018 on arXiv
Link: https://arxiv.org/pdf/1801.08831.pdf
-
I have to build an FR-JP model, and I can't find any large parallel resources.
First of all, do you know of a large FR-JP corpus anywhere?
I just have an idea. I would like to know if someone …
-
I am trying to satisfy your requirements. Kindly let me know where I can find the Boxer corpora mentioned in your README file. Also, I would like to know whether your program generates paraphrases …
-
Extends the random (or %-based random) source-mixing capability addressed in issues #266/309/310.
This enhancement would allow for using a source with less coverage than the target, usually an inc…