beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

adding a simple implementation of ColBERT #144

Open jjmachan opened 1 year ago

jjmachan commented 1 year ago

Firstly, thank you for putting together this awesome repo 🙌🏽. I think I speak for every user here: you guys have made benchmarking of IR so much easier that even folks new to the field can get started fast.

I was playing with a bunch of benchmarks, wanted to run a ColBERT benchmark, and found https://github.com/thakur-nandan/beir-ColBERT extremely useful. But it is a bit harder to set up and get running than the other models available via beir (I'm spoiled at this point...).

I was wondering whether a simpler implementation, like the one I found here https://github.com/sebastian-hofstaetter/neural-ranking-kd/blob/main/minimal_colbert_usage_example.ipynb, would be much more beginner friendly and would let people run experiments faster.
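For readers new to ColBERT, the scoring step that such a simplified implementation revolves around is small. The following is only a minimal sketch of late-interaction (MaxSim) scoring, assuming per-token embeddings have already been computed by some encoder. The function names and the toy vectors are made up for illustration and are not taken from either linked repository.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_embs, doc_embs):
    """Late-interaction score: for each query token, take the best-matching
    document token (max cosine similarity), then sum over query tokens.

    Assumes both inputs are non-empty lists of equal-length vectors
    (hypothetical toy inputs, not real model output).
    """
    q = [l2_normalize(v) for v in query_embs]
    d = [l2_normalize(v) for v in doc_embs]
    score = 0.0
    for qv in q:
        score += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return score


# Toy 2-dimensional "token embeddings", purely illustrative.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 2.0]],   # has a match for both query tokens
    "doc_b": [[1.0, 0.0], [-1.0, 0.0]],  # matches only the first query token
}

# Rank documents by descending MaxSim score.
ranked = sorted(
    ((name, maxsim_score(query, embs)) for name, embs in docs.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(name, round(score, 3))
```

A real implementation would batch this on the GPU and use an index for candidate retrieval. The sketch only shows why the scoring itself is easy to follow.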

I'd love to contribute this myself since I'm playing with both implementations, but before that I wanted to know whether this would be useful.

thanks again 🍻

zt991211 commented 1 year ago

Traceback (most recent call last):
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 58, in <module>
    main()
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 25, in main
    args = parser.parse()
  File "/home/zhangtong/beir-ColBERT/colbert/utils/parser.py", line 110, in parse
    Run.init(args.rank, args.root, args.experiment, args.run)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/runs.py", line 51, in init
    distributed.barrier(rank)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/distributed.py", line 25, in barrier
    torch.distributed.barrier()
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1710, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1595629403081/work/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8

Traceback (most recent call last):
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 58, in <module>
    main()
  File "/home/zhangtong/beir-ColBERT/colbert/index.py", line 25, in main
    args = parser.parse()
  File "/home/zhangtong/beir-ColBERT/colbert/utils/parser.py", line 110, in parse
    Run.init(args.rank, args.root, args.experiment, args.run)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/runs.py", line 51, in init
    distributed.barrier(rank)
  File "/home/zhangtong/beir-ColBERT/colbert/utils/distributed.py", line 25, in barrier
    torch.distributed.barrier()
  File "/home/zhangtong/anaconda3/envs/colbert-v0.2/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1710, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1595629403081/work/torch/lib/c10d/ProcessGroupNCCL.cpp:32, unhandled cuda error, NCCL version 2.4.8

Did you encounter problems like this when reproducing the project https://github.com/thakur-nandan/beir-ColBERT?

jjmachan commented 1 year ago

Hey @zt991211, yeah, I couldn't get it working either because of some issues.

zhiyuanpeng commented 1 year ago

Hi @jjmachan

I also find that the original ColBERT takes work to run. I would appreciate it if you could contribute an easy-to-run version of ColBERT. Thanks.

thakur-nandan commented 1 year ago

My patch of ColBERT here (https://github.com/thakur-nandan/beir-ColBERT) was an unofficial copy that I used to reproduce my experiments with the ColBERT v1 model. Running ColBERT v1 requires a faiss GPU installation, which is different from the faiss CPU installation. Make sure you use the conda faiss-gpu build (https://anaconda.org/conda-forge/faiss-gpu) and not the PyPI build of faiss.
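One quick way to tell which faiss build an environment actually has is a check along these lines. This is only a rough sketch: the helper name `describe_faiss_install` is hypothetical, and it assumes that GPU-enabled builds expose `faiss.StandardGpuResources` and `faiss.get_num_gpus()` while CPU-only builds lack the former. Both attributes are looked up defensively in case a given build differs.

```python
import importlib


def describe_faiss_install():
    """Return a short description of the faiss build in this environment.

    Hypothetical helper for illustration; it never raises if faiss is
    missing and reports that fact instead.
    """
    try:
        faiss = importlib.import_module("faiss")
    except ImportError:
        return "faiss: not installed"

    # Assumption: GPU builds expose StandardGpuResources, CPU-only builds do not.
    has_gpu_api = hasattr(faiss, "StandardGpuResources")

    # get_num_gpus is looked up defensively in case the build does not provide it.
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    num_gpus = get_num_gpus() if callable(get_num_gpus) else 0

    if not has_gpu_api:
        return "faiss: CPU-only build"
    if num_gpus == 0:
        return "faiss: GPU-capable build, but no GPU is visible"
    return f"faiss: GPU build with {num_gpus} visible GPU(s)"


print(describe_faiss_install())
```

If this reports a CPU-only build inside the ColBERT environment, reinstalling faiss from the conda faiss-gpu package mentioned above is the first thing to try.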

I would be happy if anyone above can take the initiative to provide an easy-to-run ColBERT example. This will be useful for others to quickly play with ColBERT.

The original ColBERT authors have switched to the V2 version and provide some Jupyter notebooks as a quickstart. Maybe you can look into working with the V2 version if ColBERT V1 looks hard to debug and play around with.

Thanks, Nandan

Hannibal046 commented 6 months ago

Hi, I wrote a simple version of ColBERT: https://github.com/Hannibal046/nanoColBERT, including training, indexing, and end-to-end retrieval.

thakur-nandan commented 4 months ago

@Hannibal046 the nanoColBERT repo looks amazing, and I'm sure it will be very useful for others who want to evaluate ColBERT easily via BEIR. Could we add a PR for this?

Thanks, Nandan