Closed: MXueguang closed this issue 3 years ago.
FYI, changing the OMP_NUM_THREADS variable affects search speed:
Ubuntu 20.04.1 LTS (Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz, 56 cores):
6980/6980 [11:52<00:00, 9.79it/s] (run with 56 cores)
6980/6980 [06:35<00:00, 17.66it/s] (run with 12 cores)
6980/6980 [06:14<00:00, 18.65it/s] (run with 8 cores)
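Presumably each timing above was produced by setting the variable before invoking the searcher, along these lines (command reconstructed from the one later in this thread, minus --batch):

export OMP_NUM_THREADS=12
python -m pyserini.dsearch --topics msmarco_passage_dev_subset \
    --index msmarco-passage-tct_colbert-hnsw \
    --encoded-queries msmarco-passage-dev-subset-tct_colbert \
    --output runs/run.msmarco-passage.tct_colbert.hnsw.tsv \
    --msmarco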
Maybe we want to replicate HNSW by searching in batches, i.e.:
python -m pyserini.dsearch --topics msmarco_passage_dev_subset \
--index msmarco-passage-tct_colbert-hnsw \
--encoded-queries msmarco-passage-dev-subset-tct_colbert \
--batch 12 \
--output runs/run.msmarco-passage.tct_colbert.hnsw.tsv \
--msmarco
582/582 [01:01<00:00, 9.45it/s]
MRR @10: 0.33395142584254184
It will make things faster, as it did for the brute-force index.
Yes, we should carefully analyze the effects of intra-query parallelism vs. inter-query parallelism.
The former is splitting up a single query across multiple threads. The latter is batching.
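To make the distinction concrete, here is a minimal FAISS sketch (toy data and index, not the actual Pyserini code; assumes faiss-cpu is installed):

import numpy as np
import faiss

d = 768                                  # TCT-ColBERT embedding dimension
index = faiss.IndexHNSWFlat(d, 32)       # toy HNSW index
index.add(np.random.rand(10000, d).astype('float32'))
queries = np.random.rand(56, d).astype('float32')

# Intra-query parallelism: OpenMP threads work within each search call.
# For HNSW this buys little, since a single query's graph traversal is
# largely sequential.
faiss.omp_set_num_threads(8)
for q in queries:
    D, I = index.search(q.reshape(1, -1), 10)   # one query per call

# Inter-query parallelism: pass the whole batch in one call, and FAISS
# spreads the independent queries across the available threads.
D, I = index.search(queries, 10)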
Way better to set a large batch size
cmd:
export OMP_NUM_THREADS=56
python -m pyserini.dsearch --topics msmarco_passage_dev_subset \
--index msmarco-passage-tct_colbert-hnsw \
--encoded-queries msmarco-passage-dev-subset-tct_colbert \
--batch 56 \
--output runs/run.msmarco-passage.tct_colbert.hnsw.tsv \
--msmarco
Ubuntu 20.04.1 LTS (Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz, 56 cores):
125/125 [01:12<00:00, 1.73it/s]
Which means we should have separate --threads and --batch options?
I think so. I'll do that.
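A minimal sketch of how the two options could be wired up (hypothetical code; only the option names come from the discussion above, everything else is illustrative):

import argparse

parser = argparse.ArgumentParser(description='dense search')
parser.add_argument('--threads', type=int, default=1,
                    help='OpenMP threads per search call (intra-query parallelism)')
parser.add_argument('--batch', type=int, default=1,
                    help='queries per search call (inter-query parallelism)')
args = parser.parse_args()

# hypothetical wiring: cap FAISS threads, then search the topics in chunks
# faiss.omp_set_num_threads(args.threads)
# for i in range(0, len(queries), args.batch):
#     hits = index.search(queries[i:i + args.batch], k)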
Brute force replication for the record:
cmd:
python -m pyserini.dsearch --topics msmarco_passage_dev_subset \
--index msmarco-passage-tct_colbert-bf \
--encoded-queries msmarco-passage-dev-subset-tct_colbert \
--batch 56 \
--output runs/run.msmarco-passage.tct_colbert.bf.tsv \
--msmarco
Ubuntu 20.04.1 LTS (Intel(R) Xeon(R) Gold 5120 CPU @ 2.20GHz, 56 cores):
125/125 [45:00<00:00, 21.61s/it]
MRR @10: 0.33444603629417247
I ran different batch sizes on my iMac Pro:
batch = 24: 291/291 [07:58<00:00, 1.65s/it]
batch = 36: 194/194 [06:20<00:00, 1.96s/it]
batch = 48: 146/146 [05:13<00:00, 2.15s/it]
batch = 60: 117/117 [05:00<00:00, 2.57s/it]
batch = 72: 97/97 [04:20<00:00, 2.69s/it]
batch = 84: 84/84 [04:14<00:00, 3.03s/it]
batch = 96: 73/73 [04:16<00:00, 3.51s/it]
batch = 108: 65/65 [03:55<00:00, 3.62s/it]
batch = 120: 59/59 [03:52<00:00, 3.94s/it]
Seems to like big batches...
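For reference, a sweep like that could be scripted along these lines (hypothetical script; the thread doesn't say which index was used, so the bf index and output names here are only illustrative):

for b in 24 36 48 60 72 84 96 108 120; do
    python -m pyserini.dsearch --topics msmarco_passage_dev_subset \
        --index msmarco-passage-tct_colbert-bf \
        --encoded-queries msmarco-passage-dev-subset-tct_colbert \
        --batch $b \
        --output runs/run.msmarco-passage.tct_colbert.bf.batch$b.tsv \
        --msmarco
done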
Was the batch-size sweep run with a fixed number of threads?
Yes, I didn't specify --threads, so whatever the default is. I ran this on the original PR.
HNSW index (single query):
As discussed in https://github.com/castorini/pyserini/pull/292:
Replication from @lintool
Replication from @justram
Replication from @MXueguang
It seems that multithreading a single query on the HNSW index doesn't improve efficiency.
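That observation is easy to sanity-check on a toy index (sketch, assuming faiss-cpu; exact numbers will vary by machine):

import time
import numpy as np
import faiss

d = 768
index = faiss.IndexHNSWFlat(d, 32)
index.add(np.random.rand(100000, d).astype('float32'))
query = np.random.rand(1, d).astype('float32')

# Time repeated single-query searches at different thread counts; for HNSW,
# extra threads should make little difference when there is only one query.
for n in (1, 8, 56):
    faiss.omp_set_num_threads(n)
    t0 = time.time()
    for _ in range(1000):
        index.search(query, 10)
    print(f'{n} threads: {time.time() - t0:.2f}s')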