-
https://github.com/CLARIAH/usecases/blob/master/cases/hipaco.md
-
-
## In a nutshell
A method for obtaining multilingual distributed representations (LASER). It is built on a bi-directional LSTM encoder/decoder: sentences processed by the encoder are max-pooled over time, and at decode time the language ID is always concatenated to the decoder input. Training divides the work so that the encoder learns a language-independent representation while the decoder handles language-specific reconstruction.
![image](https://us…
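A minimal sketch of the architecture described above, assuming hypothetical class names and dimensions (this is not the official LASER implementation): a BiLSTM encodes the sentence, max-pooling over time produces a fixed-size, language-independent sentence embedding, and the decoder input concatenates a language-ID embedding at every step.

```python
import torch
import torch.nn as nn

class LaserStyleEncoder(nn.Module):
    """BiLSTM encoder; max-pool over time gives a fixed-size sentence embedding."""
    def __init__(self, vocab_size=1000, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)

    def forward(self, tokens):
        # tokens: (batch, seq_len) -> states: (batch, seq_len, 2*hidden)
        states, _ = self.lstm(self.embed(tokens))
        # Max-pool over the time dimension -> (batch, 2*hidden)
        return states.max(dim=1).values

class LaserStyleDecoderInput(nn.Module):
    """Per-step decoder input: [token emb ; sentence emb ; language-ID emb]."""
    def __init__(self, vocab_size=1000, emb_dim=64, n_langs=4, lang_dim=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lang_embed = nn.Embedding(n_langs, lang_dim)

    def forward(self, tokens, sent_emb, lang_id):
        _, seq_len = tokens.shape
        tok = self.embed(tokens)                                      # (B, T, emb)
        sent = sent_emb.unsqueeze(1).expand(-1, seq_len, -1)          # (B, T, 2*hidden)
        lang = self.lang_embed(lang_id).unsqueeze(1).expand(-1, seq_len, -1)
        # The language ID is concatenated at every decoding step
        return torch.cat([tok, sent, lang], dim=-1)

enc = LaserStyleEncoder()
dec_in = LaserStyleDecoderInput()
tokens = torch.randint(0, 1000, (2, 7))
sent_emb = enc(tokens)                   # (2, 256): language-independent embedding
lang_id = torch.tensor([0, 2])           # target-language IDs
x = dec_in(tokens, sent_emb, lang_id)    # (2, 7, 64 + 256 + 8)
```

Because the sentence embedding is a single pooled vector, the decoder can only reconstruct the sentence from information the encoder packed into it, which pushes the encoder toward a language-neutral representation.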
-
I have no idea how to fix this; any help, or at least some guidance, is appreciated.
And here is my current log for a new job.
It seems to be trying to `Collecting translated mono src dataset` before tra…
-
**Paper**
Noising and Denoising Natural Language: Diverse Back Translation for Grammar Correction
**Introduction**
This research proposes a solution for data sparsity (noisy and clean pairs) in g…
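To make the data format concrete, here is a simplified illustration of generating synthetic (noisy, clean) training pairs. Note this is an assumption-laden sketch: the paper derives diversity from noised back-translation, whereas the `noise` function below uses simple rule-based corruptions (drop / duplicate / swap words) purely to show what such pairs look like.

```python
import random

def noise(sentence, rng, p=0.3):
    """Rule-based corruption (illustrative only, not the paper's method):
    with total probability p, drop, duplicate, or swap words."""
    words = sentence.split()
    out, i = [], 0
    while i < len(words):
        r = rng.random()
        if r < p / 3:                       # drop a word
            i += 1
            continue
        if r < 2 * p / 3:                   # duplicate a word
            out.extend([words[i], words[i]])
            i += 1
            continue
        if r < p and i + 1 < len(words):    # swap adjacent words
            out.extend([words[i + 1], words[i]])
            i += 2
            continue
        out.append(words[i])                # keep the word unchanged
        i += 1
    return " ".join(out)

rng = random.Random(0)
clean = "the cat sat on the mat"
# Synthetic (noisy, clean) pairs: noisy source, clean target
pairs = [(noise(clean, rng), clean) for _ in range(3)]
```

Each clean sentence can yield several distinct noisy sources, which is the "diverse" part: the correction model sees many error patterns mapped to the same clean target.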
-
- ML test-bench style
- streaming data? or fixed corpus? (transfer costs excessive on Azure - probably want a fixed corpus)
- two corpora: [DL S2 DLSR, S1] and [L2A, EES1]
- pytorch
- pytorch par…
-
Thanks to the original author for their work.
But: "This repository is over its data quota. Purchase more data packs to restore access." This problem is really frustrating! It's like eating steak with a nail c…
-
Retraining from a checkpoint works perfectly with on-the-fly tokenization, but breaks when using nanoset: training restarts with a different lr, which does not match the one in lr_schedule.pt.
We also have…
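A common cause of a resumed run restarting at the wrong lr is that the scheduler's state is not saved and restored along with the model and optimizer. A minimal PyTorch sketch (hypothetical checkpoint layout, not this project's actual code):

```python
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1, gamma=0.5)

# Train for a few steps; the schedule decays the lr each step.
for _ in range(3):
    opt.step()
    sched.step()

ckpt = {
    "model": model.state_dict(),
    "optimizer": opt.state_dict(),
    "lr_schedule": sched.state_dict(),  # the part that is often forgotten
}

# --- resume in a fresh process ---
model2 = torch.nn.Linear(4, 4)
opt2 = torch.optim.SGD(model2.parameters(), lr=0.1)
sched2 = torch.optim.lr_scheduler.StepLR(opt2, step_size=1, gamma=0.5)
model2.load_state_dict(ckpt["model"])
opt2.load_state_dict(ckpt["optimizer"])
sched2.load_state_dict(ckpt["lr_schedule"])

# lr continues from where the schedule left off instead of restarting
assert sched2.get_last_lr() == sched.get_last_lr()
```

If only the model and optimizer are restored, the scheduler starts from step 0 and the resumed lr silently diverges from the one recorded in lr_schedule.pt.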
-
(usage scenario)
```
korpora parallel \
--corpus_names aihub open_subtitles_2018 \
--output_dir path/to/train/ \
--target_lang en \
--save_each
```
```
korpora parallel \
--corpu…
lovit updated 3 years ago
-
Add multilingual corpus available from https://github.com/danielinux7/Multilingual-Parallel-Corpus