LibreTranslate / Locomotive

Toolkit for training/converting LibreTranslate-compatible language models 🚂
GNU Affero General Public License v3.0

How to enable large corpus training / The script's hint is not clear to me #28

Closed: bees4ever closed this 1 week ago

bees4ever commented 1 week ago

Hi, I like your repo! Thanks for it.

I'm using this training config (the links to the corpora are omitted):

{
    "from": {
        "name": "German",
        "code": "de"
    },
    "to": {
        "name": "Swedish",
        "code": "sv"
    },
    "version": "0.2",
    "sources": [

    ],
    "batch_size": 4096,
    "input_sentence_size": 130000000,
    "shuffle_input_sentence": true,
    "vocab_size": 200000,
    "train_steps": 2000000,
    "early_stopping": 30
}

The input_sentence_size is 130,000,000 (130 million)! I got this error:

Input corpus too large, try with train_extremely_large_corpus=true

How can I set this flag? I couldn't find any place in the code or the config where it can be set.

It would be cool to get some help. Thanks!
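The error comes from SentencePiece, whose trainer does accept a `train_extremely_large_corpus` option (documented as enabling unigram training on very large corpora, at the cost of higher memory use). Below is a minimal sketch of how a training script could forward it, assuming the script calls `sentencepiece.SentencePieceTrainer.train()` with keyword arguments built from the JSON config. The config key `train_extremely_large_corpus`, the helper names `spm_trainer_kwargs` and `train_sentencepiece`, and the 10M-sentence auto-enable threshold are hypothetical illustrations, not Locomotive's actual code.

```python
import json

# Hypothetical threshold: above this many sampled sentences, turn the
# SentencePiece flag on automatically unless the config says otherwise.
LARGE_CORPUS_SENTENCES = 10_000_000


def spm_trainer_kwargs(config: dict, model_input: str, model_prefix: str) -> dict:
    """Build SentencePieceTrainer.train() kwargs from a Locomotive-style config.

    NOTE: "train_extremely_large_corpus" as a config key is HYPOTHETICAL;
    the real Locomotive config may not support it.
    """
    input_sentence_size = config["input_sentence_size"]
    kwargs = {
        "input": model_input,
        "model_prefix": model_prefix,
        "vocab_size": config["vocab_size"],
        "input_sentence_size": input_sentence_size,
        "shuffle_input_sentence": config["shuffle_input_sentence"],
    }
    # Prefer an explicit config value; otherwise derive it from the sample size.
    kwargs["train_extremely_large_corpus"] = config.get(
        "train_extremely_large_corpus",
        input_sentence_size > LARGE_CORPUS_SENTENCES,
    )
    return kwargs


def train_sentencepiece(kwargs: dict) -> None:
    # Imported lazily so the kwargs builder works without sentencepiece installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**kwargs)


if __name__ == "__main__":
    config = json.loads(
        '{"input_sentence_size": 130000000, "shuffle_input_sentence": true, '
        '"vocab_size": 200000}'
    )
    kwargs = spm_trainer_kwargs(config, "corpus.txt", "sentencepiece")
    print(kwargs["train_extremely_large_corpus"])  # True
```

Letting an explicit config value override the size-based default keeps the extra memory cost opt-out for users who sample many sentences but would rather lower `input_sentence_size` instead.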

pierotofy commented 1 week ago

Could we move this conversation over to the forum at https://community.libretranslate.com? 🙏 The forum is the right place to ask questions (we try to keep the GitHub issue tracker for feature requests and bugs only). Thank you! 👍