Woolverine94 / biniou

a self-hosted webui for 30+ generative ai
GNU General Public License v3.0

Nllb translation is broken (due to use of a formerly-deprecated symbol that has been deleted) #31

Closed trolley813 closed 2 months ago

trolley813 commented 2 months ago

The lang_code_to_id attribute of the NLLB tokenizer in the transformers library was deprecated some time ago and has since been removed, so attempting to use the translator fails with:

AttributeError: 'NllbTokenizerFast' object has no attribute 'lang_code_to_id'

As explained in huggingface/transformers#31348, the correct solution is to use convert_tokens_to_ids instead (I've tested and can confirm that it does indeed work).

Here's the patch:

--- a/ressources/nllb.py
+++ b/ressources/nllb.py
@@ -267,7 +267,7 @@ def text_nllb(

     translated_tokens = automodel_nllb.generate(
         **inputs_nllb,
-        forced_bos_token_id=tokenizer_nllb.lang_code_to_id[output_language_nllb],
+        forced_bos_token_id=tokenizer_nllb.convert_tokens_to_ids(output_language_nllb),
         max_new_tokens=max_tokens_nllb, 
     )
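
For anyone applying the same change elsewhere, here is a minimal sketch of the lookup with a guard added. The helper name resolve_forced_bos_id is hypothetical and is not part of biniou or transformers. It assumes the tokenizer returns its unk_token_id for a token it does not know, so a misspelled language code raises an error instead of silently steering generation with the unknown-token id.

```python
def resolve_forced_bos_id(tokenizer, lang_code):
    """Return the token id for an NLLB language code such as "fra_Latn".

    Hypothetical helper: uses convert_tokens_to_ids, which replaces the
    removed lang_code_to_id mapping.
    """
    token_id = tokenizer.convert_tokens_to_ids(lang_code)
    # Assumption: an unknown token maps to unk_token_id (or None).
    unk_id = getattr(tokenizer, "unk_token_id", None)
    if token_id is None or token_id == unk_id:
        raise ValueError(f"Unknown NLLB language code: {lang_code!r}")
    return token_id


# Sketch of how it would slot into the patched call (names from the patch):
# translated_tokens = automodel_nllb.generate(
#     **inputs_nllb,
#     forced_bos_token_id=resolve_forced_bos_id(tokenizer_nllb, output_language_nllb),
#     max_new_tokens=max_tokens_nllb,
# )
```

The one-line patch above is enough to fix the error; the guard only matters if the language code can come from untrusted or free-form input.
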
trolley813 commented 2 months ago

The comments above are probably spam sent from hacked accounts. I don't know whether they can be blocked or at least reported.

Woolverine94 commented 2 months ago

Hello @trolley813,

Double thanks to you, both for :

I will provide a commit fixing the nllb module ASAP, based on your advice.

Thanks again :+1:

Woolverine94 commented 2 months ago

Commit 9ecd647 fixes this issue. Credit goes to @trolley813, both for reporting the issue and for fixing it.