alipay / PainlessInferenceAcceleration


size of memory footprint #21

Closed nrmer closed 7 months ago

nrmer commented 7 months ago

If I understand your code correctly, `_squeeze` only deletes a node that was not created from the prompt, and only when its freq[-1] <= 1.0. There are two ways this can happen (sketched below):

  1. The node was just created and has not been reused yet -> freq[-1] = 1.0
  2. A parent was reused while the node itself was not, at a point when its freq[-1] was <= 1.0

In the worst case this means the memory footprint could grow very large. Do you have an idea of how large the memory can grow, and how well the implicit memory management works in practice (typical steady-state size / theoretical bounds)?
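For concreteness, here is a minimal sketch of the deletion condition as I read it. The `Node` fields (`from_prompt`, `freqs`, `children`) and the function name `squeeze` are hypothetical stand-ins, not the actual layout in `lookahead_cache.py`:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical fields for illustration; the actual LookaheadCache
    # node layout may differ.
    from_prompt: bool = False                            # created from a prompt?
    freqs: list = field(default_factory=lambda: [1.0])   # frequency history
    children: dict = field(default_factory=dict)         # token id -> Node

def squeeze(node: Node) -> None:
    """Recursively delete non-prompt children whose latest freq is <= 1.0,
    i.e. the two cases above: a fresh node with no reuse yet, or a node
    whose parent was reused while it itself was not."""
    for token, child in list(node.children.items()):
        squeeze(child)
        if not child.from_prompt and child.freqs[-1] <= 1.0:
            del node.children[token]
```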

zheyishine commented 7 months ago

The memory would not grow very large, as we introduce a freq decay strategy ([here](https://github.com/alipay/PainlessInferenceAcceleration/blob/6280cb2f097ba0bc6bc423ab910b9de7ddbe3bf2/pia/lookahead/common/lookahead_cache.py#L296)): once the tree grows too large, we halve the freqs of all reserved nodes, so its size is (approximately) bounded by max_node. The decay strategy also helps when the token distribution shifts, as the freqs of tokens that no longer occur frequently gradually decay to zero.
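A minimal sketch of how such a decay-plus-squeeze cycle could bound the tree size, reusing the hypothetical `Node` and `squeeze` from the sketch above; `count_nodes`, `halve_freqs`, `decay_and_squeeze`, and the halve-everything policy are illustrative assumptions, not the repo's exact logic:

```python
def count_nodes(node: Node) -> int:
    """Count the node and all of its descendants."""
    return 1 + sum(count_nodes(c) for c in node.children.values())

def halve_freqs(node: Node) -> None:
    """Decay step: halve every stored freq in the subtree."""
    node.freqs = [f / 2.0 for f in node.freqs]
    for child in node.children.values():
        halve_freqs(child)

def decay_and_squeeze(root: Node, max_node: int) -> None:
    """If the tree exceeds its node budget, decay all freqs and squeeze.
    Branches not reused since the last decay fall to freq <= 1.0 and are
    deleted, keeping the tree size near max_node."""
    if count_nodes(root) > max_node:
        halve_freqs(root)
        squeeze(root)
```

Under this kind of policy, any branch that is not reused between two decay events has its freq driven to 1.0 or below and is squeezed away, so the tree size oscillates around the budget rather than growing without bound.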