-
http://hdl.handle.net/11372/LRT-560
- Metadata issue:
- [x] Unknown licence
-
- [ ] http://bilinguis.com/book/baskerville/es/en/c14/
-
In #771 I ran an experiment to measure how the size of the distillation corpus changes the students' COMET scores. Adding more data to this step did not affect the COMET score b…
-
First appeared in https://aclanthology.org/L14-1215/,
which references http://paralleltext.info/data/,
but that link is no longer available.
However, recently https://arxiv.org/pd…
-
http://hdl.handle.net/11509/73
- Metadata issue:
- [ ] Unclear alignment/annotation
-
https://hdl.handle.net/11321/308
- Metadata issues:
- [x] Unknown size
- [ ] Unclear alignment/annotation
- [x] Unknown licence
-
I think this strategy is also good for ClusterFuzz.
https://github.com/google/fuzzbench/pull/1197#issuecomment-880810941
https://www.fuzzbench.com/reports/experimental/2021-08-05-parallel/index.htm…
-
**Discussion**
The current parallel corpus has been extracted from various sources (ebooks, websites, etc.).
**Difficulties**
The sentences are aligned automatically. We come across these issues…
-
In #771 I tested the effects of reducing the distillation data to understand that expensive part of our pipeline. However, we should do it again for the `base` student model, as the other one was done…
-
For example, [awesome-align](https://github.com/neulab/awesome-align) supports generating word-level alignments for a parallel corpus, i.e. files in the Pharaoh format.
Or can we even achieve this in the cur…
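
For context, a minimal sketch of reading Pharaoh-format alignments, assuming the usual convention that each line holds space-separated `i-j` pairs with zero-based indices (source word `i` aligned to target word `j`). The helper names `parse_pharaoh` and `aligned_words` are hypothetical and are not part of awesome-align or of this pipeline.

```python
from typing import List, Tuple


def parse_pharaoh(line: str) -> List[Tuple[int, int]]:
    """Parse one Pharaoh-format line such as "0-0 1-2 2-1".

    Assumption: indices are zero-based and each token is "src-tgt".
    """
    pairs = []
    for token in line.split():
        src, tgt = token.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def aligned_words(src_sentence: str, tgt_sentence: str, alignment_line: str) -> List[Tuple[str, str]]:
    """Map index pairs back to the whitespace-tokenized words they refer to."""
    src_tokens = src_sentence.split()
    tgt_tokens = tgt_sentence.split()
    return [(src_tokens[i], tgt_tokens[j]) for i, j in parse_pharaoh(alignment_line)]


if __name__ == "__main__":
    # Hypothetical toy sentence pair, only for illustration.
    for pair in aligned_words("das ist gut", "that is good", "0-0 1-1 2-2"):
        print(pair)
```

The sketch tokenizes on whitespace, so the sentences must be tokenized the same way as when the alignment was produced; otherwise the indices will not line up.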