Open Lbaiall opened 3 weeks ago
Hello, could you execute this for your conda environment and try again:
pip install setuptools==69.5.1
@yhshu No, it is still stuck right here. I guess that is caused by the venv? I am not using conda, just a local Python .venv. Or could it be something in my CUDA version or my GPU hardware?
WARNING clustering 196 points to 128 centroids: please provide at least 4992 training points
Clustering 196 points in 128D to 128 clusters, redo 1 times, 20 iterations
Preprocessing in 0.00 s
Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext; cudaStream_t = CUstream_st] at /project/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (196, 128) x (128, 128)' = (196, 128) gemm params m 128 n 196 k 128 trA T trB N lda 128 ldb 128 ldc 128
Did you install all packages with the versions specified in requirements.txt? Any version difference could cause such an error.
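One way to check the installed versions against requirements.txt is sketched below. It is only a minimal sketch: it assumes the file uses plain `name==version` pins, and the helper names `parse_pins` and `find_mismatches` are made up for illustration. Run it with the interpreter of the environment you want to inspect (for example the one inside .venv).

```python
from importlib import metadata
from pathlib import Path


def parse_pins(text: str) -> dict[str, str]:
    """Collect `name==version` pins; other requirement forms are ignored."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip()  # drop extras such as pkg[gpu]
        pins[name] = version.split(";", 1)[0].strip()  # drop env markers
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple]:
    """Map package name -> (pinned, installed) for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")  # assumed to be in the current directory
    if req.exists():
        for name, (wanted, installed) in find_mismatches(parse_pins(req.read_text())).items():
            print(f"{name}: pinned {wanted}, installed {installed}")
```

Any line it prints is a package whose installed version differs from the pin, or which is missing from the environment.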
It works! Thanks, but there is a new error: File "/home/ai/HippoRAG/src/colbertv2_indexing.py", line 41, in
Please always post all related commands you executed, so we can help you.
root@DESKTOP-9O20ND7:/home/ai/HippoRAG# bash 11.sh
src/setup_hipporag_colbert.sh: line 9: python: command not found
src/setup_hipporag_colbert.sh: line 10: python: command not found
src/setup_hipporag_colbert.sh: line 13: python: command not found
src/setup_hipporag_colbert.sh: line 16: python: command not found
src/setup_hipporag_colbert.sh: line 17: python: command not found
src/setup_hipporag_colbert.sh: line 19: python: command not found
[Jun 15, 20:49:51] #> Note: Output directory data/lm_vectors/colbert/sample/corpus/indexes/nbits_2 already exists
[Jun 15, 20:49:51] #> Will delete 10 files already at data/lm_vectors/colbert/sample/corpus/indexes/nbits_2 in 20 seconds...
nranks = 1 num_gpus = 1 device=0 { "query_token_id": "[unused0]", "doc_token_id": "[unused1]", "query_token": "[Q]", "doc_token": "[D]", "ncells": null, "centroid_score_threshold": null, "ndocs": null, "load_index_with_mmap": false, "index_path": null, "index_bsize": 64, "nbits": 2, "kmeans_niters": 20, "resume": false, "similarity": "cosine", "bsize": 64, "accumsteps": 1, "lr": 1e-5, "maxsteps": 400000, "save_every": null, "warmup": 20000, "warmup_bert": null, "relu": false, "nway": 64, "use_ib_negatives": true, "reranker": false, "distillation_alpha": 1.0, "ignore_scores": false, "model_name": null, "query_maxlen": 32, "attend_to_mask_tokens": false, "interaction": "colbert", "dim": 128, "doc_maxlen": 180, "mask_punctuation": true, "checkpoint": "exp\/colbertv2.0", "triples": "\/future\/u\/okhattab\/root\/unit\/experiments\/2021.10\/downstream.distillation.round2.2_score\/round2.nway6.cosine.ib\/examples.64.json", "collection": "data\/lm_vectors\/colbert\/sample_corpus_3.tsv", "queries": "\/future\/u\/okhattab\/data\/MSMARCO\/queries.train.tsv", "index_name": "nbits_2", "overwrite": false, "root": "data\/lm_vectors\/colbert\/sample", "experiment": "corpus", "index_root": null, "name": "2024-06\/15\/20.49.49", "rank": 0, "nranks": 1, "amp": true, "gpus": 1, "avoid_fork_if_possible": false } [Jun 15, 20:50:14] #> Loading collection... 0M [Jun 15, 20:50:17] [0] # of sampled PIDs = 3 sampled_pids[:3] = [1, 0, 2] [Jun 15, 20:50:17] [0] #> Encoding 3 passages.. [Jun 15, 20:50:19] [0] avg_doclen_est = 90.33333587646484 len(local_sample) = 3 [Jun 15, 20:50:19] [0] Creating 256 partitions. [Jun 15, 20:50:19] [0] Estimated 271 embeddings. [Jun 15, 20:50:19] [0] #> Saving the indexing plan to data/lm_vectors/colbert/sample/corpus/indexes/nbits_2/plan.json .. 
WARNING clustering 258 points to 256 centroids: please provide at least 9984 training points Clustering 258 points in 128D to 256 clusters, redo 1 times, 20 iterations Preprocessing in 0.00 s Iteration 19 (0.06 s, search 0.03 s): objective=0.0608023 imbalance=1.008 nsplit=0 [Jun 15, 20:50:20] Loading decompress_residuals_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)... [Jun 15, 20:50:20] Loading packbits_cpp extension (set COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True for more info)... [0.024, 0.04, 0.04, 0.033, 0.039, 0.047, 0.022, 0.026, 0.037, 0.056, 0.035, 0.04, 0.041, 0.038, 0.017, 0.035, 0.036, 0.025, 0.044, 0.03, 0.038, 0.039, 0.025, 0.04, 0.03, 0.051, 0.022, 0.043, 0.057, 0.052, 0.036, 0.038, 0.039, 0.042, 0.036, 0.054, 0.017, 0.041, 0.036, 0.02, 0.018, 0.048, 0.046, 0.048, 0.026, 0.042, 0.043, 0.044, 0.031, 0.041, 0.038, 0.039, 0.034, 0.019, 0.028, 0.049, 0.044, 0.024, 0.046, 0.027, 0.019, 0.039, 0.026, 0.033, 0.032, 0.03, 0.05, 0.024, 0.021, 0.023, 0.044, 0.039, 0.037, 0.036, 0.041, 0.026, 0.048, 0.033, 0.034, 0.038, 0.034, 0.033, 0.039, 0.034, 0.044, 0.054, 0.038, 0.028, 0.051, 0.035, 0.037, 0.019, 0.029, 0.034, 0.033, 0.038, 0.024, 0.045, 0.033, 0.049, 0.059, 0.045, 0.023, 0.047, 0.047, 0.03, 0.042, 0.036, 0.023, 0.02, 0.015, 0.025, 0.042, 0.034, 0.029, 0.024, 0.033, 0.027, 0.041, 0.022, 0.02, 0.046, 0.044, 0.047, 0.025, 0.036, 0.025, 0.038] [Jun 15, 20:50:20] #> Got bucket_cutoffs_quantiles = tensor([0.2500, 0.5000, 0.7500], device='cuda:0') and bucket_weights_quantiles = tensor([0.1250, 0.3750, 0.6250, 0.8750], device='cuda:0') [Jun 15, 20:50:20] #> Got bucket_cutoffs = tensor([-2.2236e-02, -7.6294e-06, 2.2995e-02], device='cuda:0') and bucket_weights = tensor([-0.0479, -0.0089, 0.0083, 0.0513], device='cuda:0') [Jun 15, 20:50:20] avg_residual = 0.0355224609375 0it [00:00, ?it/s][Jun 15, 20:50:20] [0] #> Encoding 3 passages.. [Jun 15, 20:50:20] [0] #> Saving chunk 0: 3 passages and 271 embeddings. From #0 onward. 
1it [00:00, 27.65it/s] [Jun 15, 20:50:20] [0] #> Checking all files were saved... [Jun 15, 20:50:20] [0] Found all files! [Jun 15, 20:50:20] [0] #> Building IVF... [Jun 15, 20:50:20] [0] #> Loading codes... 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1287.39it/s] [Jun 15, 20:50:20] [0] Sorting codes... [Jun 15, 20:50:20] [0] Getting unique codes... [Jun 15, 20:50:20] #> Optimizing IVF to store map from centroids to list of pids.. [Jun 15, 20:50:20] #> Building the emb2pid mapping.. [Jun 15, 20:50:20] len(emb2pid) = 271 100%|█████████████████████████████████████████████████████████████████████████████| 256/256 [00:00<00:00, 338079.92it/s] [Jun 15, 20:50:20] #> Saved optimized IVF to data/lm_vectors/colbert/sample/corpus/indexes/nbits_2/ivf.pid.pt [Jun 15, 20:50:20] [0] #> Saving the indexing metadata to data/lm_vectors/colbert/sample/corpus/indexes/nbits_2/metadata.json ..
Traceback (most recent call last):
File "/home/ai/HippoRAG/src/colbertv2_indexing.py", line 41, in
You definitely need to take care of the basic configuration first, as I see "python: command not found" in your output.
You may close this issue if there is no other problem in this thread, thanks.
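For the "python: command not found" lines, a quick way to see which interpreter names the shell can resolve is sketched below. This is only a diagnostic sketch; the function name `interpreter_report` is made up for illustration. Start it with whichever interpreter does work (for example `python3`, or the one inside .venv).

```python
import shutil
import sys


def interpreter_report(names=("python", "python3")) -> dict:
    """Report where each interpreter name resolves on PATH (None if not found)."""
    report = {name: shutil.which(name) for name in names}
    report["running"] = sys.executable  # the interpreter executing this script
    # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
    report["in_venv"] = sys.prefix != sys.base_prefix
    return report


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

If `python` comes back as `None` while `python3` resolves, the setup script's bare `python` calls fail until an environment that provides a `python` executable is activated in that shell.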
@yhshu Right after the step "#> Saving the indexing plan to colbert/indexes/nbits_2/plan.json ..", it reports that the number of training points is less than the number of clusters. I added more data to my sample data file, but it is still always 12 points short.
[Jun 20, 13:19:54] #> Loading collection...
0M
[Jun 20, 13:19:57] [0] # of sampled PIDs = 12 sampled_pids[:3] = [6, 0, 4]
[Jun 20, 13:19:57] [0] #> Encoding 12 passages..
[Jun 20, 13:19:58] [0] avg_doclen_est = 4.5 len(local_sample) = 12
[Jun 20, 13:19:58] [0] Creating 64 partitions.
[Jun 20, 13:19:58] [0] Estimated 54 embeddings.
[Jun 20, 13:19:58] [0] #> Saving the indexing plan to colbert/indexes/nbits_2/plan.json ..
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ai/HippoRAG/.venv/lib/python3.10/site-packages/colbert/infra/launcher.py", line 134, in setup_new_process
    return_val = callee(config, *args)
  File "/home/ai/HippoRAG/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 33, in encode
    encoder.run(shared_lists)
  File "/home/ai/HippoRAG/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 68, in run
    self.train(shared_lists) # Trains centroids from selected passages
  File "/home/ai/HippoRAG/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 232, in train
    centroids = self._train_kmeans(sample, shared_lists)
  File "/home/ai/HippoRAG/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 304, in _train_kmeans
    centroids = compute_faiss_kmeans(*args_)
  File "/home/ai/HippoRAG/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 507, in compute_faiss_kmeans
    kmeans.train(sample)
  File "/home/ai/HippoRAG/.venv/lib/python3.10/site-packages/faiss/__init__.py", line 1560, in train
    clus.train(x, self.index, weights)
  File "/home/ai/HippoRAG/.venv/lib/python3.10/site-packages/faiss/__init__.py", line 68, in replacement_train
    self.train_c(n, swig_ptr(x), index)
  File "/home/ai/HippoRAG/.venv/lib/python3.10/site-packages/faiss/swigfaiss.py", line 2328, in train
    return _swigfaiss.Clustering_train(self, n, x, index, x_weights)
RuntimeError: Error in void faiss::Clustering::train_encoded(faiss::Clustering::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /project/faiss/faiss/Clustering.cpp:283: Error: 'nx >= k' failed: Number of training points (52) should be at least as large as number of clusters (64)
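The numbers in the logs of this thread can be reproduced with a little arithmetic. The sketch below rests on two assumptions about how the indexer plans k-means, neither of which is stated in the thread: that the number of partitions is the largest power of two not exceeding 16 * sqrt(estimated embeddings), and that about 5% of the sampled embeddings are held out before training. The function name `indexing_plan` is made up for illustration. It does not rule out an environment problem; it only shows where the 64 and the 52 could come from.

```python
import math


def indexing_plan(num_embeddings: int, heldout_fraction: float = 0.05) -> tuple[int, int]:
    """Return (num_clusters, num_training_points) under the assumed rules."""
    # Assumed rule: largest power of two not exceeding 16 * sqrt(num_embeddings).
    num_clusters = 2 ** math.floor(math.log2(16 * math.sqrt(num_embeddings)))
    # Assumed rule: a small fraction of the sample is held out before k-means.
    num_training = num_embeddings - int(heldout_fraction * num_embeddings)
    return num_clusters, num_training


if __name__ == "__main__":
    # "Estimated N embeddings" values taken from the three logs in this thread.
    for n in (54, 206, 271):
        k, nx = indexing_plan(n)
        print(f"{n} embeddings -> {k} clusters, {nx} training points, nx >= k: {nx >= k}")
```

Under these assumptions, 54 estimated embeddings give 64 clusters and 52 training points, which matches the 'nx >= k' message above, while 206 and 271 give 128/196 and 256/258, matching the other two logs.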
I think this is an environment issue rather than a data-size issue. Could you first check whether your conda environment meets requirements.txt?
Everything is set up, but when I run the index function, it gets stuck at the last command line. My Linux CUDA version is 12.1 and my PyTorch CUDA version is also 12.1. Does anyone have the same error? It seems to be a faiss-gpu error.
ner_gpt-3.5-turbo-1106_3 0it [00:00, ?it/s] 0it [00:00, ?it/s] 0it [00:00, ?it/s] 100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 17050.02it/s] 0it [00:00, ?it/s] | 0/1 [00:00<?, ?it/s] 100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 14122.24it/s] 100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 16256.99it/s] 0it [00:00, ?it/s] 0it [00:00, ?it/s] 0it [00:00, ?it/s] /home/ai/HippoRAG/.venv/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice. return _methods._mean(a, axis=axis, dtype=dtype, /home/ai/HippoRAG/.venv/lib/python3.10/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide ret = ret.dtype.type(ret / rcount) OpenIE saved to output/openie_sample_results_ner_gpt-3.5-turbo-1106_3.json Passage NER already saved to output/sample_queries.named_entity_output.tsv 100%|██████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 10856.70it/s] Correct Wiki Format: 0 out of 3 100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 13797.05it/s]
[Jun 14, 13:46:21] #> Note: Output directory colbert/indexes/nbits_2 already exists
[Jun 14, 13:46:21] #> Will delete 1 files already at colbert/indexes/nbits_2 in 20 seconds...
> Starting...
nranks = 1 num_gpus = 1 device=0 { "query_token_id": "[unused0]", "doc_token_id": "[unused1]", "query_token": "[Q]", "doc_token": "[D]", "ncells": null, "centroid_score_threshold": null, "ndocs": null, "load_index_with_mmap": false, "index_path": null, "index_bsize": 64, "nbits": 2, "kmeans_niters": 20, "resume": false, "similarity": "cosine", "bsize": 64, "accumsteps": 1, "lr": 1e-5, "maxsteps": 400000, "save_every": null, "warmup": 20000, "warmup_bert": null, "relu": false, "nway": 64, "use_ib_negatives": true, "reranker": false, "distillation_alpha": 1.0, "ignore_scores": false, "model_name": null, "query_maxlen": 32, "attend_to_mask_tokens": false, "interaction": "colbert", "dim": 128, "doc_maxlen": 180, "mask_punctuation": true, "checkpoint": "exp\/colbertv2.0", "triples": "\/future\/u\/okhattab\/root\/unit\/experiments\/2021.10\/downstream.distillation.round2.2_score\/round2.nway6.cosine.ib\/examples.64.json", "collection": "data\/lm_vectors\/colbert\/corpus.tsv", "queries": "\/future\/u\/okhattab\/data\/MSMARCO\/queries.train.tsv", "index_name": "nbits_2", "overwrite": false, "root": "", "experiment": "colbert", "index_root": null, "name": "2024-06\/14\/13.46.17", "rank": 0, "nranks": 1, "amp": true, "gpus": 1, "avoid_fork_if_possible": false } [Jun 14, 13:46:47] #> Loading collection... 0M [Jun 14, 13:46:50] [0] # of sampled PIDs = 29 sampled_pids[:3] = [13, 23, 0] [Jun 14, 13:46:50] [0] #> Encoding 29 passages.. [Jun 14, 13:46:51] [0] avg_doclen_est = 7.103448390960693 len(local_sample) = 29 [Jun 14, 13:46:51] [0] Creating 128 partitions. [Jun 14, 13:46:51] [0] Estimated 206 embeddings. [Jun 14, 13:46:51] [0] #> Saving the indexing plan to colbert/indexes/nbits_2/plan.json .. 
WARNING clustering 196 points to 128 centroids: please provide at least 4992 training points
Clustering 196 points in 128D to 128 clusters, redo 1 times, 20 iterations
Preprocessing in 0.00 s
Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext; cudaStream_t = CUstream_st] at /project/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (196, 128) x (128, 128)' = (196, 128) gemm params m 128 n 196 k 128 trA T trB N lda 128 ldb 128 ldc 128
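The assertion above fails inside a GPU matrix multiplication; status 13 in the cuBLAS status enum is CUBLAS_STATUS_EXECUTION_FAILED. One way to narrow this down is to run a clustering of the same size on the CPU. The sketch below does that with random data; it assumes `numpy` and `faiss` are importable in the same environment, and the function name `cpu_kmeans_check` is made up for illustration.

```python
def cpu_kmeans_check(n_points: int = 196, dim: int = 128,
                     n_clusters: int = 128, n_iter: int = 20) -> str:
    """Train faiss k-means on the CPU with the shapes from the log above."""
    try:
        import numpy as np
        import faiss
    except ImportError as exc:
        return f"skipped: {exc.name} is not installed"

    # Random float32 data stands in for the real embeddings.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((n_points, dim)).astype("float32")

    # gpu=False keeps the whole training on the CPU, so cuBLAS is not involved.
    kmeans = faiss.Kmeans(dim, n_clusters, niter=n_iter, gpu=False)
    kmeans.train(x)
    return f"CPU k-means OK: centroids shape {kmeans.centroids.shape}"


if __name__ == "__main__":
    print(cpu_kmeans_check())
```

If the CPU run succeeds while the GPU run keeps failing at the same point, that suggests looking at the faiss-gpu build, the CUDA runtime, and the driver rather than at the data.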