nrmer closed 7 months ago
The debug infos (i.e., edls, dls, etc.) are indeed a bit unclear.
edls: short for effective decoding lengths, i.e., the number of tokens generated in a forward pass. edls is therefore always >= 1: even without lookahead, one token is generated per forward pass, so edls=1.
dls: short for decoding lengths, i.e., the number of tokens in a forward pass, always >= 1. Note that it is set to 1 instead of the prompt length in the prefill stage.
fts: short for forward time(s); the first value is the prefill time and the others are decoding times.
qts: short for query time(s), i.e., the time for retrieving a sub trie tree.
We will add the explanation to the README later.
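To make the definitions above concrete, here is a minimal sketch of how the four lists could be summarized. It is not part of the repository: the function name `summarize_debug_info`, the returned keys, and the sample values are all hypothetical, and it assumes that edls, dls and fts each hold one entry per forward pass with the prefill pass first.

```python
from statistics import mean


def summarize_debug_info(edls, dls, fts, qts):
    """Summarize the debug lists described above (hypothetical helper).

    Assumption: edls, dls and fts have one entry per forward pass,
    and the first entry belongs to the prefill pass.
    """
    if not edls or not (len(edls) == len(dls) == len(fts)):
        raise ValueError("expected non-empty edls, dls and fts of equal length")

    generated = sum(edls)  # tokens actually produced over all forward passes
    return {
        "generated_tokens": generated,
        # Average number of tokens produced per forward pass.
        "mean_edl": mean(edls),
        # Share of the tokens fed through the model that ended up as output.
        "effective_fraction": generated / sum(dls),
        # fts[0] is the prefill time, the rest are decoding times.
        "prefill_time": fts[0],
        "decode_time": sum(fts[1:]),
        # Total time spent retrieving sub trie trees.
        "query_time": sum(qts),
    }


# Made-up values for illustration only, not measured results.
summary = summarize_debug_info(
    edls=[1, 3, 2, 1],
    dls=[1, 8, 8, 8],
    fts=[0.05, 0.02, 0.02, 0.02],
    qts=[0.001, 0.001, 0.001],
)
print(summary)
```

Under these assumptions, a run without lookahead would show edls and dls equal to 1 on every pass, so `mean_edl` would be 1 and any value above 1 reflects extra tokens produced per forward pass.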
I am not quite sure what dls and ft stand for in your perf_check function when benchmarking PIA. I assume that edls stands for effective decoding length. Is dls the number of tokens that are getting proposed and ft the time this takes? Some clarification on that would be much appreciated.
For size 0, no lookahead, the output is edls=0. How does this compare to an edls of 1 when using lookahead? Shouldn't no lookahead have edls of 1 since we always add 1 token?