beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

evaluate_sbert_multi_gpu - metrics.compute() unable to read cache file #134

Open ashokrajab opened 1 year ago

ashokrajab commented 1 year ago

I'm trying to run beir/examples/retrieval/evaluation/dense/evaluate_sbert_multi_gpu.py. Doing so, I end up with the error below.

Traceback (most recent call last):
  File "evaluate_sbert_multi_gpu.py", line 62, in <module>
    results = retriever.retrieve(corpus, queries)
  File "/data/user/beir/beir/retrieval/evaluation.py", line 23, in retrieve
    return self.retriever.search(corpus, queries, self.top_k, self.score_function, **kwargs)
  File "/data/user/beir/beir/retrieval/search/dense/exact_search_multi_gpu.py", line 150, in search
    cos_scores_top_k_values, cos_scores_top_k_idx, chunk_ids = metric.compute()
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/evaluate/module.py", line 433, in compute
    self._finalize()
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/evaluate/module.py", line 390, in _finalize
    self.data = Dataset(reader.read_files([{"filename": f} for f in file_paths]))
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/arrow_reader.py", line 260, in read_files
    pa_table = self._read_files(files, in_memory=in_memory)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/arrow_reader.py", line 195, in _read_files
    pa_table: Table = self._get_table_from_filename(f_dict, in_memory=in_memory)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/arrow_reader.py", line 331, in _get_table_from_filename
    table = ArrowReader.read_table(filename, in_memory=in_memory)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/arrow_reader.py", line 352, in read_table
    return table_cls.from_file(filename)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/table.py", line 1065, in from_file
    table = _memory_mapped_arrow_table_from_file(filename)
  File "/home/user/miniconda3/envs/beir/lib/python3.7/site-packages/datasets/table.py", line 52, in _memory_mapped_arrow_table_from_file
    pa_table = opened_stream.read_all()
  File "pyarrow/ipc.pxi", line 750, in pyarrow.lib.RecordBatchReader.read_all
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: Expected to be able to read 80088040 bytes for message body, got 80088032

Command used: CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python evaluate_sbert_multi_gpu.py

@thakur-nandan Any idea how to proceed?

thakur-nandan commented 1 year ago

This error is caused by insufficient host memory (CPU RAM). I would suggest evaluating on a larger GPU cluster or reducing the batch size.
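
To see whether host memory is the bottleneck before launching the script, a quick check along these lines may help. This is only a minimal sketch using the Python standard library. The helper names `available_host_memory_bytes` and `has_enough_memory` and the `safety_factor` default are hypothetical and not part of BEIR, `evaluate`, or `datasets`. It relies on `os.sysconf`, which is only expected to report these values on Linux-like systems.

```python
import os


def available_host_memory_bytes():
    """Return currently available physical RAM in bytes, or None if unknown.

    Hypothetical helper: uses os.sysconf, which exposes these keys on
    Linux-like systems; on other platforms this returns None.
    """
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        available_pages = os.sysconf("SC_AVPHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    if page_size <= 0 or available_pages < 0:
        return None
    return page_size * available_pages


def has_enough_memory(required_bytes, safety_factor=2.0):
    """Return True/False if the check is possible, None if RAM is unknown.

    safety_factor is an arbitrary headroom multiplier, not a measured value.
    """
    available = available_host_memory_bytes()
    if available is None:
        return None
    return available >= required_bytes * safety_factor


if __name__ == "__main__":
    available = available_host_memory_bytes()
    if available is None:
        print("Could not determine available host memory on this platform.")
    else:
        print(f"Available host memory: {available / 1024 ** 3:.2f} GiB")
```

If the reported value is small compared with what the run needs, reducing the batch size as suggested above is the first thing to try.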