EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License
7.09k stars 1.91k forks

The input format for XNLI seems weird? #1822

Open SefaZeng opened 6 months ago

SefaZeng commented 6 months ago

I tried to test the XNLI results using the latest commit, and I found that the inputs have a prefix_token which looks like:

[(('', ' 他说,妈妈,我回来了。, 正确? 是的, 校车把他放下后,他立即给他妈妈打了电话。'), [1], [55363, 2296, 55361, 2049, 55361, 55407, 7523, 55364, 55440, 55363, 2128, 56818, 5927, 55362, 55440, 52081, 55587, 5492, 6731, 55446, 55361, 55425, 1935, 3644, 2049, 7810, 878, 55364])]

This is the content of the requests passed to _loglikelihood_tokens. The context is '' and context_enc is [1]. The prefix token is appended in this function from lm_eval/api/model.py:

    def loglikelihood(
        self, requests, disable_tqdm: bool = False
    ) -> List[Tuple[float, bool]]:
        new_reqs = []
        for context, continuation in [req.args for req in requests]:
            if context == "": 
                # BOS or EOS as context
                context_enc, continuation_enc = ( 
                    [self.prefix_token_id],
                    self.tok_encode(continuation),
                )
            else:
                context_enc, continuation_enc = self._encode_pair(context, continuation)

            new_reqs.append(((context, continuation), context_enc, continuation_enc))

        return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
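To illustrate the branch in question, here is a minimal, self-contained sketch of what happens when the context is empty. The tokenizer and PREFIX_TOKEN_ID here are made up for illustration; the real harness uses the model's own tokenizer and its configured prefix token id.

```python
PREFIX_TOKEN_ID = 1  # stands in for the model's BOS/EOS id (hypothetical value)

def toy_encode(text):
    # hypothetical tokenizer: one id per character, for demonstration only
    return [ord(c) for c in text]

def build_request(context, continuation):
    if context == "":
        # BOS or EOS as context, mirroring the empty-context branch above
        context_enc = [PREFIX_TOKEN_ID]
        continuation_enc = toy_encode(continuation)
    else:
        context_enc = toy_encode(context)
        continuation_enc = toy_encode(continuation)
    return (context, continuation), context_enc, continuation_enc

# An XNLI request arrives with context == "", so context_enc becomes [1]
req = build_request("", "premise, 正确? 是的, hypothesis")
print(req[1])  # -> [1]
```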

As XNLI's config (xnli_zh.yaml) is this:

dataset_name: zh
doc_to_choice: '{{[premise+", 正确? 是的, "+hypothesis, premise+", 正确? 所以, "+hypothesis, premise+", 正确? 不是的, "+hypothesis]}}'
doc_to_text: ''
include: xnli_common_yaml
task: xnli_zh

It merges the premise and hypothesis into a single string and leaves the context empty. So the code will add a prefix_token_id to the input for any model. This input format seems weird for base models, as most LLMs do not put a BOS or EOS token at the start of their inputs (except maybe Gemma).
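Concretely, the config above expands each document into three complete candidate strings (using the premise/hypothesis values visible in the logged request; the label mapping comments are my reading, not taken from the config):

```python
# How xnli_zh.yaml's doc_to_choice template renders one document into
# three candidate continuations; values taken from the request logged above.
premise = "他说,妈妈,我回来了。"
hypothesis = "校车把他放下后,他立即给他妈妈打了电话。"

choices = [
    premise + ", 正确? 是的, " + hypothesis,    # choice 0
    premise + ", 正确? 所以, " + hypothesis,    # choice 1
    premise + ", 正确? 不是的, " + hypothesis,  # choice 2
]

# doc_to_text is '', so every request is ("" , choice) with an empty context
requests = [("", c) for c in choices]
```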

haileyschoelkopf commented 6 months ago

cc @lintangsutawika

lintangsutawika commented 6 months ago

@SefaZeng this is intentional. It's inspired by how XNLI was evaluated in the XGLM paper. [image: XNLI evaluation setup from the XGLM paper] The doc_to_choice has 3 options, and the decoder model simply has to pick whichever is most likely.
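The selection step the comment describes can be sketched as follows; the scores here are made-up numbers, not real model output:

```python
# Given one total loglikelihood per rendered choice, the prediction is
# simply the argmax over the three candidates.
def pick_choice(loglikelihoods):
    # index of the most likely continuation
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])

scores = [-42.7, -45.1, -44.3]  # hypothetical per-choice log-probs
print(pick_choice(scores))  # -> 0, i.e. the first candidate wins
```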