LibreTranslate / Locomotive

Toolkit for training/converting LibreTranslate compatible language models 🚂
GNU Affero General Public License v3.0

Poor quality translation after model training #30

Closed HOAZ2 closed 1 day ago

HOAZ2 commented 2 days ago

I have trained a new en→fa model based on OPUS repositories, but left out some very old corpora like CCMatrix, and added 2 custom repositories of news translations totaling 30k sentences.

"sources": [
        "opus://OpenSubtitles",
        "opus://XLEnt",
        "opus://wikimedia",
        "opus://TED2020",
        "opus://NeuLab-TedTalks",
        "opus://Wikipedia",
        "opus://TED2013",
        "opus://infopankki",
        "opus://QED",
        "opus://GlobalVoices",
        "opus://tico-19",
        "opus://ELRC-3078-wikipedia_health",
        "file://D:\\ArgosTranslate\\parallel-datasets\\PEPCWikipedia-en_fa",
        "file://D:\\ArgosTranslate\\parallel-datasets\\NEWSparallel2024-en_fa"
    ]

I trained the new model for only 10k steps, with `valid_steps` and `save_checkpoint_steps` both set to 1k. The last training log line showed approximately acc: 31, ppl: 150, which indicates poor quality. I checked the new model in LibreTranslate and it produced very poor results, repeating text patterns from the training data. Is this normal for 10k steps, and should I continue training the model, or is something wrong with my training data, especially my custom data? And if I should continue training, how do I do that with the Locomotive script? Will increasing the `train_steps` value in model-config.json and rerunning the script continue the process from step 10k, or do I need to change other variables in other scripts?
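As a side note on reading those metrics: in typical NMT trainers, perplexity is just the exponential of the per-token cross-entropy loss, so ppl ≈ 150 corresponds to a loss of roughly ln(150) ≈ 5.0, which is still very high early in training. A minimal sketch of the relationship (the specific numbers here are illustrative, not from the actual training run):

```python
import math

# Perplexity is exp(cross-entropy loss), so a reported ppl of ~150
# corresponds to a per-token loss of about ln(150) ≈ 5.01.
loss = math.log(150)
ppl = math.exp(loss)

print(f"loss ≈ {loss:.2f}, ppl ≈ {ppl:.0f}")
```

A well-converged translation model usually reports a much lower validation perplexity, so a value around 150 at 10k steps generally means training has not converged yet rather than that the data is necessarily broken.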

pierotofy commented 1 day ago

Could we move this conversation over to the forum at https://community.libretranslate.com? :pray: The forum is the right place to ask questions (we try to keep the GitHub issue tracker for feature requests and bugs only). Thank you! :+1:

HOAZ2 commented 1 day ago

Yes, sorry. Agreed.