AmenRa / retriv

A Python Search Engine for Humans 🥸
MIT License
174 stars 20 forks source link

[Feature Request] Use WAND Top-K Retrieval #1

Closed hockyy closed 1 year ago

hockyy commented 1 year ago
@inproceedings{petri2013exploring,
  title={Exploring the magic of WAND},
  author={Petri, Matthias and Culpepper, J Shane and Moffat, Alistair},
  booktitle={Proceedings of the 18th Australasian Document Computing Symposium},
  pages={58--65},
  year={2013}
}

I believe if you're using inverted index and token - docs list, using the WAND Top-K Retrieval Algorithm can speedup retrieval for small K in large documents. I'm not sure whether it's relevant to this project. I've once implemented this https://raw.githubusercontent.com/hockyy/ir-pa-2/main/bsbi.py

AmenRa commented 1 year ago

Hi, thanks for the suggestion and code!

I have an implementation of another optimization algorithm for top-k retrieval on my local branch. Unfortunately, it slows down the retrieval because (I suspect) it requires more instructions to be executed (even if they are applied to less data). Current implementation heavily relies on vector computations, which are fairly optimized on modern CPUs.

I will let you know if WAND improves efficiency over the current implementation.

AmenRa commented 1 year ago

Hi, I have a working WAND implementation, but it is slower than brute force vector operations. I am now considering more advanced WAND-based approaches. I hope to add one soon.

AmenRa commented 1 year ago

Unfortunately, I don't think this will happen anytime soon. The lexical retriever is already reasonably efficient, and there are other things I prefer to prioritize.

I will close for now.