alipay / PainlessInferenceAcceleration

Creative Commons Attribution 4.0 International

Clarification on edls/dls/ft in perf_check #18

Closed nrmer closed 7 months ago

nrmer commented 7 months ago

I am not quite sure what dls and ft stand for in your perf_check function when benchmarking PIA. I assume that edls stands for effective decoding length. Is dls the number of tokens that are getting proposed and ft the time this takes? Some clarification on that would be much appreciated.

For size 0, no lookahead, the output is edls=0. How does this compare to an edls of 1 when using lookahead? Shouldn't no lookahead have edls of 1 since we always add 1 token?

zheyishine commented 7 months ago

The debug info (i.e., edls, dls, etc.) is indeed a bit unclear.

edls: short for effective decoding lengths, i.e., the number of tokens generated in a forward pass. It is therefore always >= 1 (even without lookahead, one token is generated per forward pass, so edls=1).

dls: short for decoding lengths, i.e., the token count in a forward pass, always >= 1. Note that it is set to 1 instead of the prompt length in the prefill stage.

fts: short for forward time(s). The first entry is the prefill time and the others are decoding times.

qts: short for query time(s), i.e., the time for retrieving a sub-trie.

We will add the explanation to the README later.
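To make the definitions above concrete, here is a minimal sketch of how these per-forward-pass lists could be summarized. The function `summarize_debug_stats`, its argument layout, and the sample numbers are hypothetical and not part of PIA. It assumes each of edls, dls, fts and qts is a plain list with one entry per forward pass or per trie query, and that fts[0] is the prefill time as described above.

```python
def summarize_debug_stats(edls, dls, fts, qts):
    """Summarize lookahead debug lists (hypothetical helper, not PIA's API).

    edls: tokens generated per forward pass (each >= 1)
    dls:  token count per forward pass (each >= 1)
    fts:  forward times in seconds; fts[0] is assumed to be the prefill time
    qts:  trie query times in seconds
    """
    if not edls or not dls or not fts:
        raise ValueError("edls, dls and fts must be non-empty")
    if min(edls) < 1 or min(dls) < 1:
        raise ValueError("edls and dls entries are expected to be >= 1")

    return {
        "forward_passes": len(edls),
        "generated_tokens": sum(edls),
        # Without lookahead every pass yields one token, so the baseline is 1.0.
        "mean_edl": sum(edls) / len(edls),
        "mean_dl": sum(dls) / len(dls),
        "prefill_time": fts[0],
        "decode_time": sum(fts[1:]),
        "query_time": sum(qts),
    }


# Made-up numbers, for illustration only.
stats = summarize_debug_stats(
    edls=[1, 3, 2, 4],
    dls=[1, 8, 8, 8],
    fts=[0.5, 0.02, 0.02, 0.03],
    qts=[0.001, 0.001, 0.002],
)
print(stats["generated_tokens"], stats["mean_edl"])
```

Under these assumptions, a mean_edl of 1.0 corresponds to plain decoding without lookahead, and larger values mean more tokens are accepted per forward pass.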