facebookresearch / DPR

Dense Passage Retriever is a set of tools and models for the open-domain Q&A task.

omegaconf.errors.ConfigKeyError: Key 'p' is not in struct #167

Closed yolsever closed 3 years ago

yolsever commented 3 years ago

Dear Vlad,

When I try to validate the retriever against the entire set of documents, I get the following error. For context, for ctx_datatsets I am using the same input that I used as ctx_src for the generate_dense_embeddings.py script: a CSV file with columns in the order ['id','text','title']. Thank you!

[2021-06-23 11:16:01,725][root][INFO] - Encoded queries 3200
[2021-06-23 11:16:05,060][root][INFO] - Total encoded queries tensor torch.Size([3610, 768])
Error executing job with overrides: ['model_file=/shared_folder/shared_notebooks/dala/experiments/DPR/outputs/2021-06-18/15-43-23/outputs/dpr_biencoder.29', 'qa_dataset=nq_test', 'ctx_datatsets=pm_dev', 'encoded_ctx_files=/shared_folder/shared_notebooks/dala/experiments/DPR/outputs/_0', 'out_file=/shared_folder/shared_notebooks/dala/experiments/DPR/outputs/']
Traceback (most recent call last):
  File "dense_retriever.py", line 366, in main
    ctx_src = hydra.utils.instantiate(cfg.ctx_sources[ctx_src])
omegaconf.errors.ConfigKeyError: Key 'p' is not in struct
    full_key: ctx_sources.p
    object_type=dict

Also, if I pass ctx_datatsets=[pm_dev] instead of ctx_datatsets=pm_dev, I get the following error:

[2021-06-23 11:35:22,542][root][INFO] - Total encoded queries tensor torch.Size([3610, 768])
[2021-06-23 11:35:22,547][root][INFO] - id_prefixes per dataset: ['pm:']
[2021-06-23 11:35:22,547][root][INFO] - ctx_files_patterns: /shared_folder/shared_notebooks/dala/experiments/DPR/outputs/_0
Error executing job with overrides: ['model_file=/shared_folder/shared_notebooks/dala/experiments/DPR/outputs/2021-06-18/15-43-23/outputs/dpr_biencoder.29', 'qa_dataset=nq_test', 'ctx_datatsets=[pm_dev]', 'encoded_ctx_files=/shared_folder/shared_notebooks/dala/experiments/DPR/outputs/_0', 'out_file=/shared_folder/shared_notebooks/dala/experiments/DPR/outputs/']
Traceback (most recent call last):
  File "dense_retriever.py", line 380, in main
    ), "ctx len={} pref leb={}".format(len(ctx_files_patterns), len(id_prefixes))
AssertionError: ctx len=63 pref leb=1

Best regards, Kaan

juyoung228 commented 2 years ago

Hi there. Did you solve this problem? I'm running into the same thing.

ZiluLii commented 2 years ago

Hi, I'm wondering whether you solved the problem. I'm also getting the same error :(

shahad2099 commented 1 year ago

Hello, did you solve this problem? I'm getting the same issue when I try to validate the retriever with my own dataset :(

zekun-li commented 1 year ago

I encountered the same issue. It seems the reason is that ctx_datatsets was not configured correctly. If you're running nq with dpr_wiki, here is the command that worked for me. The trick is that ctx_datatsets needs to be a list, [dpr_wiki] (with brackets), and encoded_ctx_files is passed as a list as well:

python dense_retriever.py \
    model_file="/home/zekun/mapqa/DPR/downloads/checkpoint/retriever/single/nq/bert-base-encoder.cp" \
    qa_dataset=nq_test \
    ctx_datatsets=[dpr_wiki] \
    encoded_ctx_files=["/home/zekun/mapqa/DPR/downloads/data/retriever_results/nq/single/wikipedia_passages_*.pkl"] \
    out_file="out_json.json"
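
Both error messages in the original report are consistent with a list-valued option receiving a bare string. The following is a minimal sketch in plain Python, not DPR's actual code: it assumes, as the tracebacks suggest, that the script loops over ctx_datatsets and takes len() of the encoded_ctx_files value. The ctx_sources dict and the lookup_sources helper are hypothetical stand-ins for the Hydra config.

```python
# Plain-Python stand-ins for the Hydra config values (hypothetical, not DPR code).
ctx_sources = {"pm_dev": "placeholder source config"}


def lookup_sources(ctx_datatsets):
    """Look up each requested source, the way a loop over ctx_datatsets would."""
    return [ctx_sources[name] for name in ctx_datatsets]


# A bare string is iterated character by character, so the first lookup key is 'p'.
try:
    lookup_sources("pm_dev")
except KeyError as err:
    first_bad_key = err.args[0]
    print("missing key:", first_bad_key)

# A list is iterated item by item, so the lookup succeeds.
print(lookup_sources(["pm_dev"]))

# The same applies to encoded_ctx_files: len() of a bare string counts characters.
path = "/shared_folder/shared_notebooks/dala/experiments/DPR/outputs/_0"
print(len(path))    # 63, matching "ctx len=63" in the assertion message
print(len([path]))  # 1, matching the single id prefix
```

Under that assumption, the first error comes from ctx_datatsets=pm_dev being read as the characters 'p', 'm', and so on, and the second from encoded_ctx_files being a 63-character string rather than a one-element list, which is why the working command above brackets both options.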