alipay / PainlessInferenceAcceleration

Creative Commons Attribution 4.0 International


TODO in PainlessInferenceAcceleration/pia/lookahead/common/lookahead_cache.py #19

Closed: nrmer closed this issue 7 months ago

nrmer commented 7 months ago

Is the TODO in _squeeze already implemented? If not, what do you intend to implement there?

    def _squeeze(self, nodes):
        for t, p in list(nodes.items()):
            fo = p.freqs.get(-1, 0.0)
            # TODO
            if fo > 1.0:
                p.freqs[-1] *= 0.5
                if len(p.children) > 0:
                    self._squeeze(p.children)
            else:
                nodes.pop(t)
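For readers unfamiliar with the cache, here is a minimal, self-contained sketch of what the excerpt above does on each call. It halves the frequency of every node whose frequency exceeds 1.0, recurses into that node's children, and evicts nodes at or below the threshold together with their subtrees. The `Node` dataclass and the free function `squeeze` are hypothetical stand-ins for the repository's classes. The assumption that key -1 in `freqs` holds the node's own frequency is read off the excerpt and not confirmed by the thread.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    # Hypothetical stand-in for the cache's trie node: `freqs` maps a key to a
    # count (key -1 is assumed to hold the node's own decayed frequency) and
    # `children` maps token ids to child nodes.
    freqs: Dict[int, float] = field(default_factory=dict)
    children: Dict[int, "Node"] = field(default_factory=dict)


def squeeze(nodes: Dict[int, Node]) -> None:
    """Halve frequencies in place and prune nodes whose frequency is <= 1.0."""
    # Iterate over a snapshot so entries can be popped during the loop.
    for t, p in list(nodes.items()):
        fo = p.freqs.get(-1, 0.0)
        if fo > 1.0:
            p.freqs[-1] *= 0.5       # decay the frequency of a surviving node
            if p.children:
                squeeze(p.children)  # recurse into the subtree
        else:
            nodes.pop(t)             # drop the node and its whole subtree


root = {
    1: Node(freqs={-1: 4.0}, children={2: Node(freqs={-1: 1.0}), 3: Node(freqs={-1: 3.0})}),
    4: Node(freqs={-1: 0.5}),
}
squeeze(root)
print(sorted(root))                   # [1]
print(root[1].freqs[-1])              # 2.0
print(sorted(root[1].children))       # [3]
print(root[1].children[3].freqs[-1])  # 1.5
```

Because surviving counts are halved on every call, a branch that is not reinforced by new insertions decays until its frequency reaches 1.0 or below and is pruned on a later pass.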

zheyishine commented 7 months ago

We intend to cache and reuse the deleted nodes to save the instantiation time for new tokens. However, the performance gain may be marginal, so we would like to implement cache management in C++ to accelerate it in the future.
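The reply states only the intent. One common way to realize it is an object pool (free list): evicted nodes are cleared and kept, and new insertions take a recycled node before allocating a fresh one. The sketch below is not the project's implementation. It assumes a hypothetical `Node` with a `freqs` dict and a `children` dict, and `NodePool`, `acquire`, and `release` are illustrative names, not part of PainlessInferenceAcceleration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    # Hypothetical trie node: `freqs` holds counts, `children` maps token ids to nodes.
    freqs: Dict[int, float] = field(default_factory=dict)
    children: Dict[int, "Node"] = field(default_factory=dict)


class NodePool:
    """Hypothetical free list that recycles evicted nodes instead of discarding them."""

    def __init__(self) -> None:
        self._free: List[Node] = []

    def release(self, node: Node) -> None:
        # Recycle the whole subtree: children are released recursively, then
        # the node is cleared so no stale counts leak into its next use.
        for child in node.children.values():
            self.release(child)
        node.children.clear()
        node.freqs.clear()
        self._free.append(node)

    def acquire(self) -> Node:
        # Reuse a cleared node when one is available; otherwise allocate a new one.
        return self._free.pop() if self._free else Node()

    def __len__(self) -> int:
        return len(self._free)


def squeeze(nodes: Dict[int, Node], pool: NodePool) -> None:
    """Halve frequencies, prune weak nodes, and hand evicted nodes to `pool`."""
    for t, p in list(nodes.items()):
        fo = p.freqs.get(-1, 0.0)
        if fo > 1.0:
            p.freqs[-1] *= 0.5
            if p.children:
                squeeze(p.children, pool)
        else:
            pool.release(nodes.pop(t))  # evict, but keep the objects for reuse


pool = NodePool()
root = {7: Node(freqs={-1: 1.0}, children={8: Node(freqs={-1: 5.0})})}
squeeze(root, pool)
print(root)       # {}
print(len(pool))  # 2
fresh = pool.acquire()
print(fresh.freqs, fresh.children)  # {} {}
```

Clearing a node on release keeps stale frequencies out of its next use. In CPython the saving from reusing small objects is likely to be modest, which fits the maintainer's expectation of a marginal gain and their preference for moving cache management to C++.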