How to enable large corpus training / The hint of the script not clear to me

LibreTranslate / Locomotive

Toolkit for training/converting LibreTranslate compatible language models 🚂

GNU Affero General Public License v3.0

Hi, I like your repo! Thanks for it.

I use this training config (the links to the corpora are omitted):

{
    "from": {
        "name": "German",
        "code": "de"
    },
    "to": {
        "name": "Swedish",
        "code": "sv"
    },
    "version": "0.2",
    "sources": [

    ],
    "batch_size": 4096,
    "input_sentence_size": 130000000,
    "shuffle_input_sentence": true,
    "vocab_size": 200000,
    "train_steps": 2000000,
    "early_stopping": 30
}

The input_sentence_size is 130,000,000! With it I got this error:

Input corpus too large, try with train_extremely_large_corpus=true

How can I set this flag? I could not find any place in the code or the config to set it.

It would be great to get some help. Thanks!
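
For context, the error text matches the one emitted by the SentencePiece trainer, which has a `train_extremely_large_corpus` option. The following is only a minimal sketch of how that option could be passed through the SentencePiece Python API. It assumes the tokenizer is trained via `sentencepiece.SentencePieceTrainer.train`; the helper names `build_spm_args` and `train_tokenizer`, the `extremely_large` parameter, the default values, and the file names are hypothetical and are not taken from Locomotive's code.

```python
import json


def build_spm_args(config, corpus_path, model_prefix, extremely_large=False):
    """Map a Locomotive-style config dict to SentencePiece trainer keyword arguments.

    Hypothetical helper for illustration; the defaults below are assumptions.
    """
    args = {
        "input": corpus_path,
        "model_prefix": model_prefix,
        "vocab_size": config.get("vocab_size", 50000),
        "input_sentence_size": config.get("input_sentence_size", 1000000),
        "shuffle_input_sentence": config.get("shuffle_input_sentence", True),
    }
    if extremely_large:
        # The option named in the error message.
        args["train_extremely_large_corpus"] = True
    return args


def train_tokenizer(args):
    """Run the actual training; requires the third-party sentencepiece package."""
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**args)


# Values taken from the config shown above; the paths are placeholders.
config = json.loads(
    '{"input_sentence_size": 130000000,'
    ' "shuffle_input_sentence": true,'
    ' "vocab_size": 200000}'
)
args = build_spm_args(config, "corpus.txt", "sentencepiece", extremely_large=True)
print(args["train_extremely_large_corpus"])  # prints True
```

Calling `train_tokenizer(args)` would then start the training with the flag enabled. Whether Locomotive exposes a way to forward this option from the config is exactly the open question here.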

How to enable large corpus training / The hint of the script not clear to me #28