marcofarina84 opened 1 year ago
Great, thanks! Just one last clarification. I might be misunderstanding the code, but it seems like the function is feeding only the last generated token to the amateur, so the amateur is computing $p(x_i \mid x_{i-1})$. Can you confirm this? Section 3.4 of the paper, however, seems to state that the amateur is conditioned on the last token of the prompt plus all the generated tokens.
Hi,
I think the code is doing what section 3.4 states: conditioning on the last token of the prompt plus the generated tokens. You can verify this by printing the past_key_values argument. This works because of the caching implementation in Hugging Face Transformers: once a token is generated, it is encoded into past_key_values, which saves redundant computation, so only the newest token needs to be passed at each step.
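A minimal pure-Python sketch of the idea (this is not the Hugging Face internals, just an illustration; the class and token names are hypothetical). Even though each step receives only one token, the cache carries the full conditioning context:

```python
# Toy illustration of KV-style caching: each generation step receives only the
# newest token, but the cache (standing in for past_key_values) retains the
# full conditioning context, so the model effectively conditions on everything
# appended so far. The class and token strings are made up for illustration.

class ToyCachedModel:
    def __init__(self):
        self.cache = []  # plays the role of past_key_values

    def step(self, new_token):
        # Only `new_token` is passed in, mirroring how a single token is fed
        # once the cache is populated.
        self.cache.append(new_token)
        # Return the effective context length to show it grows each step,
        # even though the input is always a single token.
        return len(self.cache)

model = ToyCachedModel()
# Prime the cache with the last prompt token only (the section 3.4 setup):
model.step("last_prompt_token")
effective_context = [model.step(t) for t in ["gen_1", "gen_2", "gen_3"]]
print(effective_context)  # [2, 3, 4]
```

So the single-token input is an efficiency detail of the caching API, not a statement about what the model is conditioned on.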
Dear @XiangLi1999 and @ari-holtzman, if I understand the paper correctly, section 3.4 mentions that the amateur (student) model is conditioned on a context window that starts from the last token of the prompt. I cannot find any trace of such a choice in the code; for instance, here and here the whole input is passed to the amateur model, exactly as the expert sees it.
I cannot find the corresponding study in the ablation script either.
Am I missing some argument/logic that sets the amateur's context window somewhere else in the code?
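For illustration, this is the kind of input restriction I would have expected somewhere in the amateur's forward pass (token IDs are made up, and this slicing is my reading of section 3.4, not code from the repo):

```python
# Hypothetical sketch of the context restriction described in section 3.4:
# the amateur sees only the last prompt token plus all generated tokens,
# while the expert sees the full sequence. All token IDs are invented.

prompt_ids = [101, 7592, 2088]      # full prompt
generated_ids = [2023, 2003, 1037]  # tokens generated so far

expert_input = prompt_ids + generated_ids        # full context for the expert
amateur_input = prompt_ids[-1:] + generated_ids  # last prompt token + generated

print(expert_input)   # [101, 7592, 2088, 2023, 2003, 1037]
print(amateur_input)  # [2088, 2023, 2003, 1037]
```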
Best, Marco