biocore / improved-octo-waddle

Balanced parentheses succinct data structure in Python
BSD 3-Clause "New" or "Revised" License

remove select caches #20

Open wasade opened 8 years ago

wasade commented 8 years ago

This is dependent on being able to properly traverse the binary tree to identify the correct segment for a given value of k.
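A minimal sketch of the kind of traversal meant here, assuming the bit vector is split into fixed-size blocks and a complete binary tree of per-block 1-counts is stored in an implicit heap array (root at index 1, children at `2i` and `2i + 1`). `build_count_tree` and `select1` are hypothetical names, not this repo's API. Select descends from the root: if the left child holds at least `k` ones, go left; otherwise subtract its count and go right. At the leaf, it scans the block.

```python
def build_count_tree(bits, block_size):
    """Hypothetical helper: heap-ordered tree of 1-counts; leaves start at n_leaves."""
    n_blocks = max(1, -(-len(bits) // block_size))  # ceil division, at least one block
    n_leaves = 1
    while n_leaves < n_blocks:  # round up to a power of two for a complete tree
        n_leaves *= 2
    tree = [0] * (2 * n_leaves)
    for b in range(n_blocks):
        tree[n_leaves + b] = sum(bits[b * block_size:(b + 1) * block_size])
    for i in range(n_leaves - 1, 0, -1):  # internal node = sum of its children
        tree[i] = tree[2 * i] + tree[2 * i + 1]
    return tree, n_leaves


def select1(bits, tree, n_leaves, block_size, k):
    """Position of the k-th 1 (k is 1-based), or -1 if there are fewer than k ones."""
    if k < 1 or k > tree[1]:
        return -1
    node = 1
    while node < n_leaves:  # descend until we reach a leaf (i.e. a block)
        left = 2 * node
        if tree[left] >= k:
            node = left
        else:
            k -= tree[left]  # skip the ones counted in the left subtree
            node = left + 1
    start = (node - n_leaves) * block_size
    for pos in range(start, min(start + block_size, len(bits))):  # scan within the block
        k -= bits[pos]
        if k == 0:
            return pos
    return -1
```

For example, with `bits = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]` and `block_size = 4`, `select1(bits, tree, n_leaves, 4, 4)` returns `6`. The tree only narrows the search to one block, so the final linear scan is bounded by `block_size`.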

wasade commented 8 years ago

Have what appears to be an implementation as of b52eacc; however, its performance on .collapse suggests it is not worth keeping (probably due to preorderselect). There are likely better ways to navigate the binary tree. The weirdest thing: during testing, I saw essentially zero performance change when typing the code. As a result, my guess is that it's getting hosed by either log2 or pow, which the binary tree ops rely on a lot, especially for select. pow can be resolved with lookup tables, as we're only doing integers and the values really shouldn't get too large. We could probably also use a lookup for log2, as we should only ever be calculating it over a small range, and it is integer as well.
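A minimal sketch of the lookup-table idea for integer `pow` and `log2`, assuming exponents and tree heights stay small. `MAX_EXP`, `LOG_BITS`, `POW2`, `FLOOR_LOG2`, `floor_log2` and `ceil_log2` are hypothetical names and bounds chosen for illustration. Outside the table range it falls back on `int.bit_length()`, which is also exact integer arithmetic with no floating point involved.

```python
MAX_EXP = 64  # hypothetical upper bound on exponents / tree heights we expect
LOG_BITS = 16  # hypothetical: tabulate floor(log2(n)) for 1 <= n < 2**LOG_BITS

# POW2[h] == 2 ** h, computed once with shifts instead of calling pow()
POW2 = tuple(1 << h for h in range(MAX_EXP + 1))

# FLOOR_LOG2[n] == floor(log2(n)) for n >= 1; index 0 is unused
FLOOR_LOG2 = [0] * (1 << LOG_BITS)
for _n in range(2, 1 << LOG_BITS):
    FLOOR_LOG2[_n] = FLOOR_LOG2[_n >> 1] + 1


def floor_log2(n):
    """Integer floor(log2(n)) for n >= 1, using the table when n is in range."""
    if n < len(FLOOR_LOG2):
        return FLOOR_LOG2[n]
    return n.bit_length() - 1


def ceil_log2(n):
    """Smallest h with 2**h >= n for n >= 1 (e.g. height of a tree over n leaves)."""
    return (n - 1).bit_length()
```

For example, `POW2[10]` is `1024`, `floor_log2(1023)` is `9`, and `ceil_log2(5)` is `3`. In pure Python the shift and `bit_length()` forms may already be as fast as a table lookup; the table is most interesting in compiled code where it replaces calls into the C math library.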

On the plus side, I observed about a 20% reduction in runtime when parsing and about a 20% reduction in space when testing on EMP OR. But it is a net loss due to the performance of critical ops.

wasade commented 2 years ago

It would be nice to remove the caches. It should be possible to calculate rank, select, and excess from the rmm structure. However, getting this correct will probably be tedious.
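A minimal sketch of why the caches are derivable, assuming a simplified rmM-style tree where each node stores only the total excess (opens minus closes) and the number of bits it covers. A real rmM tree also keeps min/max excess per node for the search operations, and `ExcessTree` is a hypothetical class, not this repo's API. Because a node over `size` bits with excess `e` contains `(size + e) / 2` ones, rank and select both fall out of the stored excess. Prefix excess is the sum of the left siblings on the leaf-to-root path plus a short in-block scan.

```python
class ExcessTree:
    """Hypothetical, simplified rmM-style tree storing only per-node excess and size."""

    def __init__(self, bits, block_size=4):
        self.bits = list(bits)  # 1 = open paren, 0 = close paren
        self.b = block_size
        n_blocks = max(1, -(-len(self.bits) // block_size))
        self.n_leaves = 1
        while self.n_leaves < n_blocks:
            self.n_leaves *= 2
        self.e = [0] * (2 * self.n_leaves)     # excess of each node's range
        self.size = [0] * (2 * self.n_leaves)  # number of bits each node covers
        for blk in range(n_blocks):
            chunk = self.bits[blk * block_size:(blk + 1) * block_size]
            leaf = self.n_leaves + blk
            self.e[leaf] = sum(1 if x else -1 for x in chunk)
            self.size[leaf] = len(chunk)
        for i in range(self.n_leaves - 1, 0, -1):
            self.e[i] = self.e[2 * i] + self.e[2 * i + 1]
            self.size[i] = self.size[2 * i] + self.size[2 * i + 1]

    def _ones(self, node):
        # opens + closes = size and opens - closes = excess, so opens = (size + e) / 2
        return (self.size[node] + self.e[node]) // 2

    def excess(self, i):
        """Excess of bits[0..i] inclusive."""
        blk = i // self.b
        total = sum(1 if x else -1 for x in self.bits[blk * self.b:i + 1])
        node = self.n_leaves + blk
        while node > 1:
            if node & 1:  # right child: its left sibling lies entirely before i
                total += self.e[node - 1]
            node >>= 1
        return total

    def rank1(self, i):
        """Number of 1s in bits[0..i], derived from the excess."""
        return (i + 1 + self.excess(i)) // 2

    def select1(self, k):
        """Position of the k-th 1 (1-based), or -1 if there are fewer than k ones."""
        if k < 1 or k > self._ones(1):
            return -1
        node = 1
        while node < self.n_leaves:
            left = 2 * node
            if self._ones(left) >= k:
                node = left
            else:
                k -= self._ones(left)
                node = left + 1
        start = (node - self.n_leaves) * self.b
        for pos in range(start, start + self.size[node]):
            k -= self.bits[pos]
            if k == 0:
                return pos
        return -1
```

For the parenthesis string `(()())`, i.e. `ExcessTree([1, 1, 0, 1, 0, 0])`, `excess(3)` is `2`, `rank1(3)` is `3`, and `select1(3)` is `3`. Nothing beyond the per-node excess and size is stored, which is the trade-off the comment above points at: no separate rank/select caches, at the cost of a tree walk per query.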