apoorvumang / prompt-lookup-decoding


Comparison with LookAhead #2

Open RonanKMcGovern opened 8 months ago

RonanKMcGovern commented 8 months ago

This is a cool project.

I guess you're using the prompt for the lookup, but you could also pull some future guess tokens into the ngram lookup table, maybe as LookaheadDecoding does?
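For context, the core lookup the project performs can be sketched roughly like this (a minimal list-based illustration, not the repo's actual implementation; the function name and defaults are made up for this sketch): match the most recent ngram against earlier text and, on a hit, copy the tokens that followed it as draft tokens.

```python
def find_candidate_tokens(input_ids, ngram_size=3, num_pred_tokens=10):
    """Match the trailing ngram against earlier tokens in the sequence
    and propose the tokens that followed the match as draft tokens."""
    if len(input_ids) <= ngram_size:
        return []
    ngram = input_ids[-ngram_size:]
    # Scan earlier positions (most recent first) for the same ngram.
    for i in range(len(input_ids) - ngram_size - 1, -1, -1):
        if input_ids[i:i + ngram_size] == ngram:
            start = i + ngram_size
            return input_ids[start:start + num_pred_tokens]
    return []  # no match: fall back to normal decoding
```

Extending this table with guessed future tokens, as suggested above, would mean adding speculative ngrams that aren't yet in the prompt/history.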

I was also thinking that it should be possible to use an LLM to predict forward tokens just by passing blanks (zero embedding vectors) for a few positions ahead. See more here: https://github.com/hao-ai-lab/LookaheadDecoding/issues/37

apoorvumang commented 8 months ago

Thank you for the compliment!

I guess you're using the prompt for the lookup, but you could also pull some future guess tokens into the ngram lookup table, maybe as LookaheadDecoding does?

TBH I don't yet understand lookahead decoding completely, so I can't comment here.

I was also thinking that it should be possible to use an LLM to predict forward tokens just by passing blank (zero embedding vectors) for a few positions ahead. https://github.com/hao-ai-lab/LookaheadDecoding/issues/37

Are you suggesting this for the draft model or the main model? This might help in making draft tokens faster, but I feel this won't give good results, since the previous token is probably very important when predicting the next token. Medusa requires some training to be able to do this: https://github.com/FasterDecoding/Medusa

RonanKMcGovern commented 8 months ago

This might help in making draft tokens faster, but I feel this won't give good results, since the previous token is probably very important when predicting the next token.

Yeah, I'm not sure it would work, but it may be worth a try. I think guessing the previous token randomly is pretty bad, because token prediction depends so much on the previous one. However, if a null embedding (and/or attention mask) is placed at token i, there may be some way of getting a reasonable estimate of token i+1. But yeah, the prediction may still be too bad.

Medusa is a cool concept, but it's really annoying to have to train the in-built draft model.

apoorvumang commented 8 months ago

If someone can figure out a 'training free Medusa', that's probably a million dollar idea 😸

riyaj8888 commented 7 months ago

AttributeError: 'MistralForCausalLM' object has no attribute '_extend_attention_mask'