Closed AADeLucia closed 2 years ago
@VictorSanh ?
Hi @AADeLucia, thanks for your patience, I was heads down wrapping up a sprint.
The model's inference does support multi-token labels. That's why we need label-token masking in the forward pass: https://github.com/bigscience-workshop/t-zero/blob/a961d57704c682a8ef58f56bb9c8a41a8bd8f1a8/t0/model.py#L56, and we then compute the log probability of the sequence from that.
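As a rough sketch of what that masking accomplishes (hypothetical tensors, not the actual t0/model.py code): positions labeled -100 are ignored, and the remaining label-token log probabilities are summed into one sequence score.

```python
import numpy as np

def masked_sequence_log_prob(logits, labels, ignore_index=-100):
    """Sum log probs of the label tokens, skipping masked (-100) positions.

    logits: (batch, seq_len, vocab) decoder scores
    labels: (batch, seq_len) target ids, with padding set to ignore_index
    """
    # numerically stable log-softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    mask = labels != ignore_index
    safe = np.where(mask, labels, 0)  # any valid index; zeroed out below
    tok = np.take_along_axis(log_probs, safe[..., None], axis=-1)[..., 0]
    return (tok * mask).sum(axis=-1)  # (batch,) sequence log probs

# Toy example: batch of 2, seq_len 3, vocab 4, with padded positions masked
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 3, 4))
labels = np.array([[1, 2, -100],
                   [3, -100, -100]])
print(masked_sequence_log_prob(logits, labels))
```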
Ah, thank you. I know this is outside of T0 support, but could you point me to any resources that explain how the multi-token inference works? I'm familiar with auto-regressive language models (GPT-2) but not how this works with text-infilling.
In the T5 paper:
The decoder in an encoder-decoder Transformer is used to autoregressively produce an output sequence. That is, at each output timestep, a token is sampled from the model’s predicted distribution and the sample is fed back into the model to produce a prediction for the next output timestep, and so on
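That passage can be sketched as a loop. Everything here is a made-up stand-in (toy_decoder is not a real model) just to show the feed-back structure the T5 paper describes:

```python
import numpy as np

# Hypothetical stand-in for a decoder's next-token distribution; a real
# model would condition on the encoder output and the tokens so far.
def toy_decoder(prefix, vocab=5):
    rng = np.random.default_rng(len(prefix) + 1)  # deterministic per step
    logits = rng.normal(size=vocab)
    return np.exp(logits) / np.exp(logits).sum()

def greedy_decode(max_steps=8, eos=0):
    out = []
    for _ in range(max_steps):
        probs = toy_decoder(out)
        token = int(np.argmax(probs))  # "sample" = take the most likely token
        if token == eos:               # stop at end-of-sequence
            break
        out.append(token)              # feed the sample back into the model
    return out

print(greedy_decode())
```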
This makes me think the model is decoding greedily (i.e., feed in token1, then token2), but I only see a single call to forward to produce the multi-token output.
Is the probability calculated independently? i.e., "token1" following the input and "token2" following the input? Or does the order matter?
I would recommend this blog post if you want to better understand generation methods!
For T0, we are not generating output sequences; rather, we take the multiple choices (the few classification options) and compute their probability (log probability, to be exact) under the model: the log probability of the option through the decoder, conditioned on the encoder, which has been fed the input.
So we are literally feeding the input to the encoder, and feeding the answer option to the decoder.
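Put together, a toy version of that scoring scheme (every name below is hypothetical, not the repo's actual API) scores each option with the chain rule and picks the highest total log probability:

```python
import numpy as np

def option_token_log_prob(input_text, option_tokens, position, vocab=50):
    """Hypothetical stand-in for one teacher-forced decoder step.

    A real model would return log P(token_position | input, earlier option
    tokens); here we fake a deterministic distribution for illustration.
    """
    seed = (sum(map(ord, input_text)) + 31 * position
            + sum(option_tokens[:position])) % (2 ** 32)
    logits = np.random.default_rng(seed).normal(size=vocab)
    m = logits.max()
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))
    return log_probs[option_tokens[position]]

def score_option(input_text, option_tokens):
    # chain rule: log P(option | input) = sum_t log P(token_t | input, tokens < t)
    return sum(option_token_log_prob(input_text, option_tokens, t)
               for t in range(len(option_tokens)))

def predict(input_text, options):
    # the predicted label is the option with the highest sequence log prob
    scores = [score_option(input_text, opt) for opt in options]
    return int(np.argmax(scores))

options = [[7, 3], [12], [5, 9, 1]]  # tokenized answer choices (multi-token OK)
print(predict("Is this review positive or negative?", options))
```

Note that summing raw log probs favors shorter options; dividing each score by the option's token count (length normalization) is a common variant.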
Got it, I was confusing the log prob with what you would get going through generate(). Thank you!
There is a mix of single-token and multi-token multiple choice options in the prompt dataset. The run_eval.py code appears to only be written for single-token multiple choice; I only see a single call to forward: https://github.com/bigscience-workshop/t-zero/blob/master/evaluation/run_eval.py#L348
How do you calculate the probability of each multi-token/phrase option? Is that code in this repo?
Thanks.