Closed — cckao closed this issue 7 months ago
The FAISS retrieval takes a lot of time, since it is performed at every head and every layer.
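To see why per-head, per-layer retrieval adds up, here is a back-of-envelope count of k-NN lookups; the layer/head/token numbers below are illustrative assumptions, not measurements from Unlimiformer itself:

```python
# Rough count of index lookups when retrieval runs at every attention
# head of every decoder layer, once per generated token.
# All three numbers are assumed for illustration.
num_layers = 40         # hypothetical decoder depth
num_heads = 16          # hypothetical heads per layer
generated_tokens = 512  # hypothetical output length

lookups_per_token = num_layers * num_heads
total_lookups = lookups_per_token * generated_tokens

print(lookups_per_token)  # 640 lookups per generated token
print(total_lookups)      # 327680 lookups for the whole generation
```

Restricting retrieval to the top half of the layers (see `--layer_begin` below) cuts `lookups_per_token` in half.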
Hi @cckao and @AshwinRamachandran2002 , Thank you for your interest in our work!
Yes, running Unlimiformer is indeed slower.
We found that using `--layer_begin X` with a value of X that is at least half the number of layers (that is, if the model has 40 layers, X should be at least 20) helps both speed and the quality of the output.
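The "at least half the layers" rule of thumb can be sketched as a tiny helper; the `--layer_begin` flag name comes from the thread, but the helper function itself is hypothetical:

```python
def suggested_layer_begin(num_layers: int) -> int:
    """Hypothetical helper: a starting value for --layer_begin,
    following the rule of thumb that retrieval should only run in
    (at most) the top half of the model's layers."""
    return num_layers // 2

print(suggested_layer_begin(40))  # 20, matching the example above
print(suggested_layer_begin(24))  # 12
```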
Additionally, if your input is not too long (<10k tokens), using `--use_datastore False` may speed things up a bit.
Let us know if you have any questions! Best, Uri
Hi, @urialon and @AshwinRamachandran2002 ,
Thanks for your comments. `--use_datastore False` speeds things up a lot.
Hi,
Unlimiformer is amazing and could really help me. However, inference is so slow that I believe I might be doing something wrong. Please help me, thank you.
The task was pretty simple: I asked the LM to optimize the following Python code:
I ran vanilla text generation with the following command, and `model.generate(...)` took 3 seconds to complete:

With Unlimiformer enabled, `model.generate(...)` took 1 minute and 20 seconds to complete:
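For anyone reproducing timings like these, a small wall-clock helper keeps the comparison consistent between the vanilla and Unlimiformer runs. The helper below is a generic sketch: it is demonstrated on a stand-in callable, and the `model.generate` usage in the comment assumes a Hugging Face-style model object:

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Demonstrated on a stand-in workload; in a real comparison you would
# wrap the generation call instead, e.g.:
#   out, secs = timed(model.generate, **inputs)   # model/inputs assumed
result, secs = timed(sum, range(1_000_000))
print(result)        # 499999500000
print(secs >= 0.0)   # True
```

Timing the same `generate` call with and without Unlimiformer (and with different `--layer_begin` values) makes the overhead easy to attribute.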