Closed: nrmer closed this issue 7 months ago
The memory would not grow very large, as we introduce a freq decay strategy ([here](https://github.com/alipay/PainlessInferenceAcceleration/blob/6280cb2f097ba0bc6bc423ab910b9de7ddbe3bf2/pia/lookahead/common/lookahead_cache.py#L296)): once the tree grows too fat, we halve the freqs of all reserved nodes, so its size is (approximately) bounded by `max_node`. The decay strategy also helps when the token distribution shifts, as the freqs of tokens that no longer occur frequently will gradually decay to zero.
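As a rough illustration of the decay idea (a minimal sketch, not the actual `lookahead_cache.py` code; `Node`, `decay_and_prune`, and `count_nodes` are hypothetical names): when the tree exceeds `max_node` nodes, every node's freq is halved, and leaves whose freq has decayed below 1 are pruned, which keeps the tree size roughly bounded.

```python
class Node:
    """Hypothetical trie node: a freq counter plus children keyed by token id."""
    def __init__(self, freq=1.0):
        self.freq = freq
        self.children = {}  # token id -> Node

def count_nodes(node):
    """Total number of nodes in the subtree rooted at `node` (including it)."""
    return 1 + sum(count_nodes(c) for c in node.children.values())

def decay_and_prune(root, max_node):
    """If the tree is over budget, halve all freqs and drop decayed leaves."""
    if count_nodes(root) <= max_node:
        return  # still within budget, nothing to do
    def visit(node):
        for tok in list(node.children):
            child = node.children[tok]
            child.freq /= 2.0  # halve the freq of every reserved node
            visit(child)
            # prune leaves whose freq has decayed below 1 after halving
            if child.freq < 1.0 and not child.children:
                del node.children[tok]
    visit(root)
```

Tokens that stop occurring never get their freq refreshed, so repeated decay passes drive them to zero and they are eventually pruned, while frequently-hit branches survive.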
If I understood your code correctly, you only delete a node that was not created from the prompt in `_squeeze`, and you only do this when `freq[-1] <= 1.0`. There are two ways this can happen: `freq[-1] = 1.0`
In the worst case this means the memory could grow very large. Do you have any idea how large the memory could grow, and how well your implicit memory management works in practice (typical steady-state size / theoretical bounds)?