Clarification on edls/dls/ft in perf_check

alipay / PainlessInferenceAcceleration

Creative Commons Attribution 4.0 International

The debug info (i.e., edls, dls, etc.) is indeed a bit unclear.

edls: short for effective decoding lengths, i.e., the number of tokens generated in a forward pass. edls is therefore always >= 1: even without lookahead, each forward pass generates one token, so edls = 1.

dls: short for decoding lengths, i.e., the token count in a forward pass, always >= 1. Note that it is set to 1 instead of the prompt length in the prefill stage.

fts: short for forward time(s). The first entry is the prefill time and the others are decoding times.

qts: short for query time(s), i.e., the time spent retrieving a sub-trie.

We will add this explanation to the README later.
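To make the definitions concrete, here is a minimal sketch of how these lists could be summarized after a run. The helper `summarize_debug_info` and the sample values are hypothetical and not part of the library. The sketch assumes each list holds one entry per forward pass, with the prefill pass first in `edls`, `dls` and `fts`.

```python
from statistics import mean


def summarize_debug_info(edls, dls, fts, qts):
    """Summarize per-forward debug lists (hypothetical helper, not library code)."""
    if not (edls and dls and fts):
        raise ValueError("edls, dls and fts must be non-empty")
    if min(edls) < 1 or min(dls) < 1:
        raise ValueError("edls and dls entries are expected to be >= 1")

    total_tokens = sum(edls)   # tokens generated across all forward passes
    decode_times = fts[1:]     # fts[0] is the prefill time
    return {
        "forward_count": len(edls),
        "total_tokens": total_tokens,
        "mean_edl": total_tokens / len(edls),
        "mean_dl": sum(dls) / len(dls),
        "prefill_time": fts[0],
        "mean_decode_time": mean(decode_times) if decode_times else 0.0,
        "total_query_time": sum(qts),
    }


# Made-up values for illustration only, not measurements.
example = summarize_debug_info(
    edls=[1, 3, 2, 4],
    dls=[1, 8, 8, 8],
    fts=[0.20, 0.03, 0.03, 0.04],
    qts=[0.001, 0.001, 0.002],
)
print(example)
```

A mean edl well above 1 would suggest that lookahead is accepting several tokens per forward pass, while a mean edl of exactly 1 matches plain one-token-per-forward decoding.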

Clarification on edls/dls/ft in perf_check #18