fe1ixxu / ALMA

State-of-the-art LLM-based translation models.
MIT License

Polite form selection #13

Closed · cmp-nct closed this 6 months ago

cmp-nct commented 7 months ago

I find your model very interesting and am currently testing it against the few other available methods.

In German, the polite form differs significantly from the casual form, and depending on whom you address, only one of the two is correct. The right choice is not always obvious from context.

ALMA-13B + LoRA appears to choose the polite form almost at random: one time it is "Du bist", the next time "Sie sind".

I tried adding a request to the system message, but the LoRA is hammered in and the model ignores it. Maybe there are still ways around this, but it appears to be quite a flaw in the translation design, or maybe I just missed the solution?

Update: the best approach I have found so far is to prompt ALMA like a chat fine-tune, and that appears to have an effect:

Translate this from English to German:
English: Explanation for you, a friend:
German:Erklärung für dich, einen Freund: </s>
English: <ENGLISH INPUT>
German:

That appears to work, though I do not know how much damage I am doing to the fine-tune. I also tried to continue chat-style by adding </s>\n before the next English: prompt, and that also appears to give a working continued translation session; a sketch follows below. Was it trained with more than one prompt? Any insights are welcome.
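
For reference, here is a minimal sketch of how I assemble the continued session (pure string handling; the helper name and the example sentences are just illustrative, not from the ALMA repo):

```python
# Illustrative sketch of the chat-style session described above: each
# completed pair is terminated with "</s>\n" before the next sentence is
# appended. Helper name and example sentences are made up for illustration.
def extend_session(session: str, german_output: str, next_english: str) -> str:
    """Fold the model's last translation back into the context and queue the next sentence."""
    return (
        session
        + german_output.strip()
        + " </s>\n"
        + f"English: {next_english}\n"
        + "German:"
    )

# 1-shot seed with an informal ("du") exemplar:
session = (
    "Translate this from English to German:\n"
    "English: Explanation for you, a friend:\n"
    "German:Erklärung für dich, einen Freund: </s>\n"
    "English: This is the first real sentence.\n"
    "German:"
)

# After the model translates the sentence above, continue the session:
session = extend_session(session, "Dies ist der erste richtige Satz.", "And this is the second one.")
print(session)
```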

fe1ixxu commented 7 months ago

Hi, thanks for your interest!

> Was it trained with more than one prompt?

No, ALMA models are fine-tuned with a single prompt.

I think the chat-style approach you tried above should be effective here, and since this is LoRA fine-tuning, it likely won't significantly impact the model. Another option you might consider is few-shot prompting for ALMA-13B-LoRA, incorporating several polite translation pairs in the prefix. That is very likely to induce the model to produce the polite form without any fine-tuning; see the sketch below.
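
For example, something along these lines (the exemplar pairs here are invented polite sentences, not from our training data):

```python
# Sketch of a few-shot prompt with polite ("Sie") exemplars prepended.
# The exemplar pairs are made up for illustration.
POLITE_SHOTS = [
    ("How are you?", "Wie geht es Ihnen?"),
    ("Can you help me, please?", "Können Sie mir bitte helfen?"),
    ("Do you have time tomorrow?", "Haben Sie morgen Zeit?"),
]

def build_polite_prompt(source: str) -> str:
    lines = ["Translate this from English to German:"]
    for en, de in POLITE_SHOTS:
        lines.append(f"English: {en}")
        lines.append(f"German: {de}")
    lines.append(f"English: {source}")
    lines.append("German:")
    return "\n".join(lines)

print(build_polite_prompt("You are very welcome to join us."))
```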

cmp-nct commented 7 months ago

Thanks for your response!

I invested more hours into this, and the chat approach appears to work, though not fully reliably. I paired it with a KV-cache rollback for cases where the polite form slips in, but that doesn't help quality. Once the model starts down the "polite form" path, it doesn't just depend on the one or two sampled tokens (like "haben Sie" / "hast du"): the model really wants to continue politely, and the informal "hast du" continuation doesn't even show up among the available logits.
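
Roughly how I check the logits (hub IDs, the probe sentence, and the candidate words are illustrative; only the first subtoken of each candidate is compared):

```python
# Rough sketch of the logit check mentioned above: compare the probability of
# the informal vs. polite continuation right after "Wie geht es".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain")
model = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto"
)

prefix = (
    "Translate this from English to German:\n"
    "English: How are you?\n"
    "German: Wie geht es"
)
inputs = tokenizer(prefix, return_tensors="pt").to(model.device)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits[0, -1].float(), dim=-1)

# "dir" = informal, "Ihnen" = polite; only the first subtoken is compared.
for candidate in ["dir", "Ihnen"]:
    tok_id = tokenizer(candidate, add_special_tokens=False)["input_ids"][0]
    print(f"{candidate}: p={probs[tok_id].item():.4f}")
```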

How exactly would you prompt the multi-shot approach? I guess my attempt is actually 1-shot? I changed my format based on your Python script; it appears that using an English system message is not correct?

This is what I have at the moment:

Übersetzen Sie dies vom Englischen ins Deutsche:
Englisch: Hi, description for you, my friend:
Deutsch:Hallo, Beschreibung für dich, mein Freund: </s>
Englisch: <first real prompt>\n
Deutsch:

From here on, I feed it sentence by sentence with the same chat-like termination, so the whole text is in context by the end.

Is terminating with </s>\n similar to how it was trained? I suppose even a small deviation might hurt quality.

fe1ixxu commented 6 months ago

Thanks for your response! In fact, you do not need to manually add </s> or the trailing colon ":". The multi-shot (e.g., 2-shot) prompt I would use is:

Translate this from English to German:
English: Explanation for you, a friend.
German: Erklärung für dich, einen Freund.
English: Hi, description for you, my friend.
German: Hallo, Beschreibung für dich, mein Freund.
English: <your source English>
German:
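
If helpful, here is a rough end-to-end sketch of running this 2-shot prompt with the LoRA model (the hub IDs and generation settings below are illustrative, not an official recipe):

```python
# Rough sketch: run the 2-shot prompt above with ALMA-13B-LoRA applied via PEFT.
# Hub IDs and generation settings are illustrative assumptions.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "haoranxu/ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "haoranxu/ALMA-13B-Pretrain-LoRA")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/ALMA-13B-Pretrain")

prompt = (
    "Translate this from English to German:\n"
    "English: Explanation for you, a friend.\n"
    "German: Erklärung für dich, einen Freund.\n"
    "English: Hi, description for you, my friend.\n"
    "German: Hallo, Beschreibung für dich, mein Freund.\n"
    "English: Please write back when you have time.\n"
    "German:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=60, do_sample=False)
# Decode only the newly generated tokens, i.e. the German translation.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
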
cmp-nct commented 6 months ago

Thanks for the help! You prompt it fully in English; is that how I should do it? In your Python training script I found the prompt "Übersetzen Sie dies vom Englischen ins Deutsche:", which is why I switched from an English to a German prompt. I believe the resulting answers improved slightly, but that could have been random chance.

fe1ixxu commented 6 months ago

Yes! A prompt fully in English is enough. Prompts in other languages were only used for the ablation study in Appendix E of the paper.