Describe the bug
I am running step 3 ("Build index for similarity search") on a single 80 GB A100 GPU. My "DATA_BLEND" is the first 10000 scraped text items from openwebtext, created with the steps. I only want to build an index from these 10000 text items, but when I run bash tools/retro/examples/preprocess_data.sh index-train, I get the following error:
Traceback (most recent call last):
  File "tools/retro/main.py", line 224, in <module>
    train_index() # train only
  File "/n/home06/zhentingqi/LLM_safety/Megatron-LM-retro/./tools/retro/index/build.py", line 137, in train_index
    train_on_embeddings()
  File "/n/home06/zhentingqi/LLM_safety/Megatron-LM-retro/./tools/retro/index/build.py", line 112, in train_on_embeddings
    index.train()
  File "/n/home06/zhentingqi/LLM_safety/Megatron-LM-retro/./tools/retro/index/indexes/faiss_base.py", line 81, in train
    self._train()
  File "/n/home06/zhentingqi/LLM_safety/Megatron-LM-retro/./tools/retro/index/indexes/faiss_base.py", line 71, in _train
    index.train(inp)
  File "/n/home06/zhentingqi/.local/lib/python3.8/site-packages/faiss/__init__.py", line 280, in replacement_train
    self.train_c(n, swig_ptr(x))
  File "/n/home06/zhentingqi/.local/lib/python3.8/site-packages/faiss/swigfaiss.py", line 3605, in train
    return _swigfaiss.IndexPreTransform_train(self, n, x)
RuntimeError: Error in void faiss::Clustering::train_encoded(faiss::Clustering::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /project/faiss/faiss/Clustering.cpp:283: Error: 'nx >= k' failed: Number of training points (4850) should be at least as large as number of clusters (65536)
Why is the number of training points 4850 when my dataset has 10000 text items? And how can I lower the number of clusters from 65536 to fit my number of training points? Thanks!
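For context, the failing check is the k-means requirement that the number of training points be at least the number of clusters (nx >= k). Below is a minimal sketch, in plain Python with no FAISS dependency, of how a cluster count that fits a small training set might be chosen. The helper name pick_nlist is hypothetical, and the default of 39 points per centroid is taken from FAISS's default min_points_per_centroid clustering parameter, used here only as a rule of thumb. My guess is that the 65536 comes from the IVF part of the index string used by the script, but I have not confirmed this.

```python
def pick_nlist(n_train: int, min_points_per_centroid: int = 39) -> int:
    """Return the largest power of two k with n_train >= k * min_points_per_centroid.

    Hypothetical helper: a rule-of-thumb sizing for the number of k-means
    clusters, not something provided by FAISS or Megatron-LM.
    """
    if n_train < min_points_per_centroid:
        raise ValueError("too few training points for even a single cluster")
    k = 1
    # Keep doubling while the doubled cluster count still has enough points.
    while 2 * k * min_points_per_centroid <= n_train:
        k *= 2
    return k


n_train = 4850          # number of training points reported in the error
requested_k = 65536     # number of clusters reported in the error

# The check that failed in Clustering.cpp is 'nx >= k'.
print(n_train >= requested_k)   # False, which is why training aborts
print(pick_nlist(n_train))      # 64
```

With 4850 training points this suggests something on the order of 64 clusters rather than 65536, if the cluster count in the index configuration can be lowered.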
Environment (please complete the following information):
Script: preprocess_data.sh: