fe1ixxu / ALMA

State-of-the-art LLM-based translation models.
MIT License

preprocess_cpo_data #46

Closed: martimfasantos closed this issue 3 months ago

martimfasantos commented 4 months ago

utils/utils.py

def get_chosen_reject(example, target_lang):
    # Keys for the metric scores of each system's output
    # (data_args comes from the enclosing scope in utils/utils.py).
    sys1_score_key = f"gpt4_{target_lang}_{data_args.cpo_scorer}"
    sys2_score_key = f"alma_{target_lang}_{data_args.cpo_scorer}"
    ref_score_key = f"ref_{target_lang}_{data_args.cpo_scorer}"

    # Keys for the raw translations: system 1 is GPT-4, system 2 is ALMA,
    # and the reference translation is stored under the language code itself.
    sys1_output_key = f"gpt4_{target_lang}"
    sys2_output_key = f"alma_{target_lang}"
    ref_output_key = target_lang

    [...]

    # Human eval: a non-zero Delta overrides the metric scores.
    if "Delta" in example and example["Delta"] != 0:
        if example["Delta"] > 0:
            # Delta > 0: GPT-4 output is chosen, ALMA output is rejected.
            return example[sys1_output_key], example[sys2_output_key]
        else:
            return example[sys2_output_key], example[sys1_output_key]

    [...]
    return highest_score_sentence, lowest_score_sentence
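
For reference, the second `[...]` is the metric-based fallback; a minimal sketch of what it presumably does (my own guess, not the repo's actual code) follows:

# Illustrative sketch only: when there is no human Delta, rank the three
# outputs (GPT-4, ALMA, reference) by their metric scores and return the
# best one as chosen and the worst one as rejected.
candidates = [
    (example[sys1_score_key], example[sys1_output_key]),
    (example[sys2_score_key], example[sys2_output_key]),
    (example[ref_score_key], example[ref_output_key]),
]
highest_score_sentence = max(candidates, key=lambda c: c[0])[1]
lowest_score_sentence = min(candidates, key=lambda c: c[0])[1]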

When Delta > 0, the code above returns the GPT-4 output (sys1_output_key) as the chosen sentence and the ALMA output (sys2_output_key) as the rejected one. However, the dataset card for haoranxu/ALMA-R-Preference states:

**Others**
- Delta: A value of 0 indicates non-human annotated data or tied evaluations. A positive number suggests that alma_de is better than gpt4_de, and vice versa.

@fe1ixxu

fe1ixxu commented 3 months ago

Hey, thanks for pointing out the discrepancy! It turns out the data description in the README was wrong, and I have corrected it. Thanks again for your careful checking!
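
To make the corrected convention concrete, here is a minimal sketch; it assumes the "de-en" config name and the Delta/gpt4_de/alma_de fields from the dataset card quoted above:

from datasets import load_dataset

# Illustrative only: load one preference example and apply the corrected
# reading of Delta, which matches the branches in get_chosen_reject.
ds = load_dataset("haoranxu/ALMA-R-Preference", "de-en", split="train")

ex = ds[0]
if ex["Delta"] > 0:
    # Positive Delta now means the GPT-4 output is preferred.
    chosen, rejected = ex["gpt4_de"], ex["alma_de"]
elif ex["Delta"] < 0:
    chosen, rejected = ex["alma_de"], ex["gpt4_de"]
else:
    pass  # Delta == 0: fall back to the metric-score comparison.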