blevesearch / zapx

Zap file format compatible with a future version of Bleve
Apache License 2.0

MB-61029: Caching Vec To DocID Map #231

Closed Likith101 closed 5 months ago

Likith101 commented 6 months ago
abhinavdangeti commented 6 months ago

@Likith101 Let's also rebase your changes over the base branch, some conflicts have cropped up.

Likith101 commented 6 months ago

> @Likith101 Let's also rebase your changes over the base branch, some conflicts have cropped up.

I'll rebase once the base branch is merged.

abhinavdangeti commented 6 months ago

We've merged the base branch, so I'll review after the rebase. You can hit edit above to change the base branch to master; make sure only your diff appears in this PR.

abhinavdangeti commented 5 months ago

@Likith101 I feel the base branch you're using here has deviated quite a lot, and since you have a commit that does something and a following commit that goes back on it, resolving the conflicts is not quite straightforward.

My recommendation for you here is ..

Likith101 commented 5 months ago

Here are some of the performance improvements I noticed when testing these changes on my local machine.

| Time Between Queries | Average Latency with No Cache | Only Vector Index Cached | Both Index and Map Cached |
|----------------------|-------------------------------|--------------------------|---------------------------|
| 10ms                 | 559.47ms                      | 69.74ms                  | 29.85ms                   |
| 25ms                 | 524.53ms                      | 66.21ms                  | 32.15ms                   |
| 50ms                 | 525.32ms                      | 65.93ms                  | 37.58ms                   |
| 75ms                 | 522.78ms                      | 66.86ms                  | 40.72ms                   |
| 100ms                | 523.66ms                      | 68.32ms                  | 45.74ms                   |
| 200ms                | 524.63ms                      | 84.30ms                  | 45.82ms                   |
| 250ms                | 537.42ms                      | 95.01ms                  | 47.12ms                   |
| 300ms                | 523.27ms                      | 97.04ms                  | 43.64ms                   |
| 500ms                | 523.09ms                      | 98.45ms                  | 48.17ms                   |
| 750ms                | 528.64ms                      | 97.59ms                  | 46.83ms                   |
| 1000ms               | 549.17ms                      | 97.93ms                  | 48.17ms                   |
| 1500ms               | 564.24ms                      | 286.55ms                 | 270.62ms                  |
| 2000ms               | 562.52ms                      | 315.56ms                 | 289.28ms                  |
| 2500ms               | 564.03ms                      | 624.43ms                 | 527.14ms                  |
| 3000ms               | 558.75ms                      | 556.64ms                 | 550.42ms                  |
| 5000ms               | 563.99ms                      | 561.57ms                 | 566.26ms                  |

Based on these results, the cache is cleared roughly 2.5 seconds after a single query: once queries are 2.5 seconds or more apart, the cached latencies are back at the no-cache level. A burst of queries means the cache takes longer to clear, but since the decay is exponential, it will not stay loaded for very long (roughly 100 queries within 1 second keep the cache loaded for 6-7 seconds). The tests used random queries on a SIFT 1M index.