apoorvumang / prompt-lookup-decoding

Plans for paper or technical report #3

Open shermansiu opened 10 months ago

shermansiu commented 10 months ago

Apoorv, do you have plans for a paper or a technical report for prompt lookup decoding?

I know you've indicated that people should cite your GitHub repo, but it would be nice to have something out there with more extensive experiments across a variety of datasets, models, model sizes, and hardware types (e.g. CPU/GPU, various types of GPUs). Moreover, it would be nice to have a side-by-side comparison between prompt lookup decoding and other similar methods.

apoorvumang commented 10 months ago

Yes, we have something in the works :)

Are there certain experiments/comparisons that you would be most interested in?

shermansiu commented 10 months ago

In terms of methods to compare:

Metrics to compare:

Hardware:

apoorvumang commented 10 months ago

Thanks! We will definitely look into some of these

shermansiu commented 10 months ago

And FYI, LADE actually causes a slowdown under the default settings on an RTX 3090; the LADE parameters need to be made less aggressive to get a mild speedup.

shermansiu commented 10 months ago

(The LADE authors know about this: it was raised by Joao Gante of the Hugging Face staff and, independently, by another user on their GitHub repo.)

iofu728 commented 3 weeks ago

Thank you for your excellent idea. However, I'd like to kindly point out that this concept may be very similar to 'Aggressive Decoding': https://arxiv.org/pdf/2106.04970