SparseList Follow Protocol Speedup

kylebd99 commented 1 month ago

The existing implementation always binary searches over the entire index vector. However, we should always be doing lookups in increasing order, so we can actually "consume" the index vector instead by moving up the lower search bound each time. This also lets us use the expsearch function instead because we now have reason to believe that the lower bound is close to the true value.

Before Change:

┌ Info: experiment
└   sp = 0.01
[ Info: merge
  78.459 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  290.835 μs (2 allocations: 96 bytes)
[ Info: gallop
  89.306 μs (2 allocations: 96 bytes)
[ Info: hash construction
  787.695 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  170.373 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.005
[ Info: merge
  51.509 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  161.717 μs (2 allocations: 96 bytes)
[ Info: gallop
  62.533 μs (2 allocations: 96 bytes)
[ Info: hash construction
  782.748 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  82.390 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.001
[ Info: merge
  15.879 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  37.581 μs (2 allocations: 96 bytes)
[ Info: gallop
  13.650 μs (2 allocations: 96 bytes)
[ Info: hash construction
  785.824 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  9.868 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.0005
[ Info: merge
  13.307 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  19.855 μs (2 allocations: 96 bytes)
[ Info: gallop
  6.050 μs (2 allocations: 96 bytes)
[ Info: hash construction
  785.667 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  4.053 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.0001
[ Info: merge
  9.198 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  3.912 μs (2 allocations: 96 bytes)
[ Info: gallop
  1.063 μs (2 allocations: 96 bytes)
[ Info: hash construction
  783.611 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  816.976 ns (2 allocations: 128 bytes)

After change:

┌ Info: experiment
└   sp = 0.01
[ Info: merge
  93.398 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  88.104 μs (2 allocations: 96 bytes)
[ Info: gallop
  92.542 μs (2 allocations: 96 bytes)
[ Info: hash construction
  777.537 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  174.599 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.005
[ Info: merge
  58.945 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  54.867 μs (2 allocations: 96 bytes)
[ Info: gallop
  60.835 μs (2 allocations: 96 bytes)
[ Info: hash construction
  787.040 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  86.802 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.001
[ Info: merge
  21.538 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  12.863 μs (2 allocations: 96 bytes)
[ Info: gallop
  14.054 μs (2 allocations: 96 bytes)
[ Info: hash construction
  757.483 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  8.999 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.0005
[ Info: merge
  18.049 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  6.152 μs (2 allocations: 96 bytes)
[ Info: gallop
  6.423 μs (2 allocations: 96 bytes)
[ Info: hash construction
  790.806 μs (63 allocations: 1.61 MiB)
[ Info: hash (hashtable)
  3.876 μs (2 allocations: 128 bytes)
┌ Info: experiment
└   sp = 0.0001
[ Info: merge
  14.527 μs (2 allocations: 96 bytes)
[ Info: hash (search)
  929.694 ns (2 allocations: 96 bytes)
[ Info: gallop
  1.065 μs (2 allocations: 96 bytes)
[ Info: hash construction
  792.627 μs (63 allocations: 1.62 MiB)
[ Info: hash (hashtable)
  810.836 ns (2 allocations: 128 bytes)

Could use a close look though, I'm not 100% sure how the lowering works with 'thunk'

kylebd99 commented 3 weeks ago

@willow-ahrens Just wanted to bump this because I'd forgotten about it. It doesn't seem like the python test failure is related to the PR, but I'm not sure.

willow-ahrens commented 3 days ago

This took a while to merge as I needed to fix python ci

finch-tensor / Finch.jl

SparseList Follow Protocol Speedup #613