facebookresearch / DPR

Dense Passage Retriever is a set of tools and models for the open-domain Q&A task.

IndexError: Dimension out of range #213

Open · mary-octavia opened 2 years ago

mary-octavia commented 2 years ago

I'm getting an IndexError when trying to obtain validation scores by running the dense_retriever.py script:

File "dense_retriever.py", line 545, in main
    questions_tensor = retriever.generate_question_vectors(questions, query_token=qa_src.special_query_token)
  File "dense_retriever.py", line 128, in generate_question_vectors
    selector=self.selector,
  File "dense_retriever.py", line 75, in generate_question_vectors
    max_vector_len = max(q_t.size(1) for q_t in batch_tensors)
  File "dense_retriever.py", line 75, in <genexpr>
    max_vector_len = max(q_t.size(1) for q_t in batch_tensors)
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

The command I use:

    python dense_retriever.py model_file=/data-ssd/osulea/DPR/dpr/downloads/checkpoint/retriever/single/nq/bert-base-encoder.cp qa_dataset=nq_test ctx_datatsets=[dpr_wiki] encoded_ctx_files=[\"/data-ssd/osulea/DPR/downloads/data/retriever_results/nq/single/wikipedia_passages_*\"] out_file=retr_inf.out
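
For context on the traceback: the text tensorizer appears to return each question as a 1-D tensor of token ids (shape (seq_len,)), which only has dimension 0, so q_t.size(1) raises. A minimal repro sketch (the token ids here are made up):

    import torch

    # a tokenized question as a 1-D tensor of token ids, shape (seq_len,)
    q_t = torch.tensor([101, 2054, 2003, 1029, 102])

    print(q_t.size(0))  # 5 -- the sequence length lives in dimension 0
    print(q_t.size(1))  # IndexError: Dimension out of range
                        # (expected to be in range of [-1, 0], but got 1)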

Hcnaeg commented 2 years ago

I got the same error. I suspect it has something to do with this annotation, and I worked around it with "git reset --hard adbf1d9"; the code that causes this error does not exist in that commit.

StalVars commented 2 years ago

Use size(0) instead:

    max_vector_len = max(q_t.size(0) for q_t in batch_tensors)
    min_vector_len = min(q_t.size(0) for q_t in batch_tensors)

juyoung228 commented 2 years ago

@StalVars This solved the same problem for me. Thanks. Could you explain why it works?
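
A likely explanation, though not confirmed in the thread: the padding block assumed 2-D question tensors (as in the Wav2vec pipeline that the TODO comment further down refers to), while the text tensorizer returns 1-D tensors of shape (seq_len,), so the sequence length sits in dimension 0 and dimension 1 does not exist. A minimal sketch:

    import torch

    # text pipeline: each question tensor is 1-D, shape (seq_len,)
    batch_tensors = [torch.tensor([101, 2054, 102]),
                     torch.tensor([101, 2054, 2003, 2023, 102])]

    max_vector_len = max(q_t.size(0) for q_t in batch_tensors)  # 5
    min_vector_len = min(q_t.size(0) for q_t in batch_tensors)  # 3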

mary-octavia commented 2 years ago

git reset --hard adbf1d9

That does not work for me, but at least I get a new error:

Traceback (most recent call last):
  File "dense_retriever.py", line 596, in main
    top_results_and_scores = retriever.get_top_docs(questions_tensor.numpy(), cfg.n_docs)
  File "dense_retriever.py", line 176, in get_top_docs
    results = self.index.search_knn(query_vectors, top_docs)
  File "/data-ssd/osulea/DPR/dpr/indexer/faiss_indexers.py", line 110, in search_knn
    db_ids = [[self.index_id_to_db_id[i] for i in query_top_idxs] for query_top_idxs in indexes]
  File "/data-ssd/osulea/DPR/dpr/indexer/faiss_indexers.py", line 110, in <listcomp>
    db_ids = [[self.index_id_to_db_id[i] for i in query_top_idxs] for query_top_idxs in indexes]
  File "/data-ssd/osulea/DPR/dpr/indexer/faiss_indexers.py", line 110, in <listcomp>
    db_ids = [[self.index_id_to_db_id[i] for i in query_top_idxs] for query_top_idxs in indexes]
IndexError: list index out of range

PlusRoss commented 2 years ago

@mary-octavia I met the same error. It turns out I had specified the wrong index files, so the index was not loaded correctly.
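
For anyone hitting this, a sanity check after the index loads can catch the mismatch early. This is only a sketch: indexer stands for the DenseIndexer instance from dpr/indexer/faiss_indexers.py, index_id_to_db_id is the attribute from the traceback above, and ntotal is FAISS's standard count of stored vectors:

    # verify the id mapping covers every vector FAISS can return;
    # an empty or too-short mapping is exactly what produces
    # "IndexError: list index out of range" in search_knn
    n_vectors = indexer.index.ntotal           # vectors stored in FAISS
    n_ids = len(indexer.index_id_to_db_id)     # DPR's id -> passage-id map
    assert n_ids >= n_vectors, (
        f"{n_ids} mapped ids vs {n_vectors} indexed vectors -- "
        "check the encoded_ctx_files / index paths"
    )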

gaishun commented 2 years ago

I met the same error. Has this bug been fixed? (Or how can it be fixed?)

ZiluLii commented 2 years ago

Hi, I'm wondering if you solved this problem? Thank you :)

I tried using size(0) but it didn't work for me.

ZiluLii commented 2 years ago

@mary-octavia I met the same error. It turns out I had specified the wrong index files, so the index was not loaded correctly.

@PlusRoss Hi there, could you clarify a bit what you did to solve this? Thank you! :)

hyeonseokk commented 2 years ago

Hi, I found that "max_vector_len" and "min_vector_len" are temporary variables that were added to work around a tensor size mismatch error.

Supporting this, the recent code added the following comment: "# TODO: this only works for Wav2vec pipeline but will crash the regular text pipeline"

So I removed the following code, and then it works:

            # TODO: this only works for Wav2vec pipeline but will crash the regular text pipeline
            # max_vector_len = max(q_t.size(1) for q_t in batch_tensors)
            # min_vector_len = min(q_t.size(1) for q_t in batch_tensors)
            #
            # if max_vector_len != min_vector_len:
            #     # TODO: _pad_to_len move to utils
            #     from dpr.models.reader import _pad_to_len
            #     batch_tensors = [_pad_to_len(q.squeeze(0), 0, max_vector_len) for q in batch_tensors]
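
For reference, a minimal sketch of what the encoding loop does once that block is removed. The names (tensorize, encoder) are illustrative, not the exact DPR code; the point is that the text tensorizer already pads every question to the same maximum length, so the tensors stack cleanly without re-padding:

    import torch

    def encode_questions(questions, tensorize, encoder, batch_size=32):
        # tensorize() is assumed to pad each question to a fixed max_len,
        # which is why the Wav2vec-only re-padding block is unnecessary here
        outputs = []
        for start in range(0, len(questions), batch_size):
            batch = [tensorize(q) for q in questions[start:start + batch_size]]
            q_ids = torch.stack(batch, dim=0)  # shape: (batch, max_len)
            with torch.no_grad():
                outputs.append(encoder(q_ids))
        return torch.cat(outputs, dim=0)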
AvivaTang commented 1 year ago

@mary-octavia Hi, I got the same error. Have you found a solution? I tried all the above solutions but nothing worked for me.

shyyyds commented 1 year ago

I found that "max_vector_len" and "min_vector_len" are temporary variables that were added to work around a tensor size mismatch error. [...] So I removed the code, and then it works.

it works!!!!

CodingPeasantzgl commented 1 year ago

git reset --hard adbf1d9

That does not work for me, but at least I get a new error: [...] IndexError: list index out of range

Has this bug been fixed?