Closed: pierotofy closed 1 year ago
Adds support for specifying dataset weights.
Also refactors the code to avoid merging datasets into a single file, which can be slow and memory-intensive. The validation dataset is now taken from flores200 instead of being extracted from the corpus data, which removes the need for sampling.
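
For illustration only, here is a minimal sketch of how weighted datasets could be consumed without first merging them into one file. It is not the code in this PR; the function name `weighted_interleave`, the `(lines, weight)` input shape, and the `seed` parameter are all assumptions made for the example. Each source is read lazily and the next line is drawn from a source chosen with probability proportional to its weight.

```python
import random
from typing import Iterable, Iterator, List, Tuple


def weighted_interleave(
    sources: List[Tuple[Iterable[str], float]], seed: int = 0
) -> Iterator[str]:
    """Yield items from several sources, picking the next source at random
    in proportion to its weight. Hypothetical helper, not part of the PR.

    Sources are consumed lazily, so nothing is merged or loaded up front.
    A source is dropped once it is exhausted; iteration ends when all are.
    """
    for _, weight in sources:
        if weight <= 0:
            raise ValueError("weights must be positive")

    rng = random.Random(seed)  # seeded so the order is reproducible
    active = [(iter(items), weight) for items, weight in sources]

    while active:
        weights = [w for _, w in active]
        idx = rng.choices(range(len(active)), weights=weights, k=1)[0]
        try:
            yield next(active[idx][0])
        except StopIteration:
            # This source has no more items; stop drawing from it.
            active.pop(idx)


if __name__ == "__main__":
    # In practice each source could be an open file handle, which is
    # itself a lazy iterator over lines.
    corpus_a = ["a1", "a2", "a3", "a4"]
    corpus_b = ["b1", "b2"]
    for line in weighted_interleave([(corpus_a, 3.0), (corpus_b, 1.0)]):
        print(line)
```

With this approach every line is still emitted exactly once, and the weights only affect how the sources are mixed while more than one is still active. A training framework with its own per-corpus weight option would make a helper like this unnecessary.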