LibreTranslate / Locomotive

Toolkit for training/converting LibreTranslate compatible language models 🚂
GNU Affero General Public License v3.0

Add weights support, validate on flores #5

Closed by pierotofy 1 year ago

pierotofy commented 1 year ago

Adds support for specifying dataset weights.

Also refactors the code to avoid merging datasets into a single file, which can be slow and memory-intensive. The validation dataset is now taken from flores200 rather than extracted from the corpus data, which removes the need for sampling.
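To illustrate the idea of weighting datasets without first merging them into one file, here is a minimal sketch. It is not the code from this PR: `weighted_interleave`, `_loop_forever`, and the shape of the `corpora` and `weights` arguments are hypothetical names chosen for the example, and the scheme shown (take `weight` consecutive lines from each dataset in turn) is only one possible interpretation of dataset weights.

```python
from typing import Callable, Dict, Iterable, Iterator


def _loop_forever(open_corpus: Callable[[], Iterable[str]]) -> Iterator[str]:
    """Stream a corpus repeatedly, reopening it each time it is exhausted."""
    while True:
        empty = True
        for item in open_corpus():
            empty = False
            yield item
        if empty:
            # Without this guard an empty corpus would spin forever.
            raise ValueError("corpus is empty")


def weighted_interleave(
    corpora: Dict[str, Callable[[], Iterable[str]]],
    weights: Dict[str, int],
    limit: int,
) -> Iterator[str]:
    """Yield `limit` lines, taking weights[name] lines from each corpus in turn.

    Each corpus is read as a stream, so nothing is merged or held in memory.
    Corpora without an explicit weight default to 1 (an assumption of this sketch).
    """
    for name in corpora:
        if weights.get(name, 1) < 1:
            raise ValueError(f"weight for {name!r} must be a positive integer")

    streams = {name: _loop_forever(fn) for name, fn in corpora.items()}
    emitted = 0
    while emitted < limit:
        for name, stream in streams.items():
            for _ in range(weights.get(name, 1)):
                if emitted >= limit:
                    return
                yield next(stream)
                emitted += 1


# Usage: in-memory lists stand in for functions that would open files lazily.
corpora = {
    "a": lambda: ["a1", "a2"],
    "b": lambda: ["b1"],
}
mixed = list(weighted_interleave(corpora, {"a": 2, "b": 1}, limit=7))
print(mixed)
```

In practice each callable would open a dataset file and yield its lines, so the smaller dataset is simply re-read when it runs out, and the ratio between datasets is controlled by the weights alone.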