AI-Commandos / LLaMa2lang

Convenience scripts to finetune (chat-)LLaMa3 and other models for any language
Apache License 2.0

[Bug] Error with benchmarking: 'NoneType' object is not iterable #64

Closed samolego closed 6 months ago

samolego commented 6 months ago

Branch main

Environment RAM/vRAM

Script with parameters

python benchmark.py en sl "opus, m2m_418m, m2m_1.2b, madlad_3b, madlad_7b, madlad_10b, madlad_7bbt, mbart, nllb_distilled600m, nllb_1.3b, nllb_distilled1.3b, nllb_3.3b, seamless"  # Try to benchmark

Data layout or HF dataset opus-100

Problem description/Question I'm getting an error when trying to benchmark the translator models ... I ran the above command and got the following output:

[---- LLaMa2Lang ----] Starting benchmarking from en to sl for models ['opus'] on 100 records on device cuda:0
[---- LLaMa2Lang ----] No translation possible from en to sl
Traceback (most recent call last):
  File "benchmark.py", line 109, in <module>
    main()
  File "benchmark.py", line 98, in main
    translated += translator.translate([s[source_language]], source_language, model_target_language)
TypeError: 'NoneType' object is not iterable

I'm not sure why it reports "No translation possible from en to sl", as the en-sl pair clearly exists in the dataset: https://huggingface.co/datasets/Helsinki-NLP/opus-100/viewer/en-sl

ErikTromp commented 6 months ago

Yes, Slovenian exists in the OPUS translation dataset, but there is no OPUS translation model for it (at least I can't find one: https://huggingface.co/models?search=opus-mt-en-sl).

That means that while you can benchmark SL, you cannot translate it with OPUS (you can still use the other model architectures).
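For reference, the TypeError in the traceback is what Python raises when a list is extended with None. Below is a minimal sketch of a guard around that call, assuming (as the traceback suggests, but not confirmed against the actual code) that translate returns None when no model exists for the language pair. The names NoModelTranslator and translate_all are hypothetical and not part of the repository.

```python
from typing import List, Optional


class NoModelTranslator:
    """Hypothetical stand-in for a translator that has no model for the pair."""

    def translate(
        self, texts: List[str], source_lang: str, target_lang: str
    ) -> Optional[List[str]]:
        # Assumption: mirrors the behaviour implied by the log and traceback,
        # i.e. None is returned when no translation is possible.
        return None


def translate_all(
    translator, sentences: List[str], source_lang: str, target_lang: str
) -> Optional[List[str]]:
    """Translate sentence by sentence; return None if the pair is unsupported."""
    translated: List[str] = []
    for sentence in sentences:
        result = translator.translate([sentence], source_lang, target_lang)
        if result is None:
            # Without this check, `translated += None` raises
            # TypeError: 'NoneType' object is not iterable
            return None
        translated += result
    return translated
```

With a guard like this, a benchmark loop could skip a model that has no checkpoint for the requested pair and continue with the remaining ones instead of crashing.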

samolego commented 6 months ago

Ah, sorry, thank you!