DevSinghSachan / emdr2

Code and Models for the paper "End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering" (NeurIPS 2021)

question about evidence embedding file #11

Open ISSCA-ZED opened 1 year ago

ISSCA-ZED commented 1 year ago

The precomputed evidence embedding file is only 19 GB when I download it from the Google link, and loading it then fails with this error message:

Unpickling BlockData: /disk2/qby/Desktop/emdr2-main/embedding-path/emdr2-finetuning-embedding/psgs_w100-retriever-nq-emdr2-finetuning-base-topk50-epochs10-bsize64-async-indexer.pkl
Traceback (most recent call last):
  File "tasks/run.py", line 67, in <module>
    main()
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 72, in main
    open_retrieval_generative_qa(dataset_cls)
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 60, in open_retrieval_generative_qa
    end_of_training_callback_provider=distributed_metrics_func_provider)
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/train_e2eqa.py", line 583, in train
    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
  File "/disk2/qby/Desktop/emdr2-main/megatron/training.py", line 134, in setup_model_and_optimizer
    model = get_model(model_provider_func)
  File "/disk2/qby/Desktop/emdr2-main/megatron/training.py", line 43, in get_model
    model = model_provider_func()
  File "/disk2/qby/Desktop/emdr2-main/tasks/openqa/e2eqa/run.py", line 36, in model_provider
    evidence_retriever = PreComputedEvidenceDocsRetriever()
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 387, in __init__
    self.precomputed_index_wrapper()
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 417, in precomputed_index_wrapper
    self.get_evidence_embedding(args.embedding_path)
  File "/disk2/qby/Desktop/emdr2-main/megatron/model/emdr2_model.py", line 412, in get_evidence_embedding
    load_from_path=True)
  File "/disk2/qby/Desktop/emdr2-main/megatron/data/emdr2_index.py", line 28, in __init__
    self.load_from_file()
  File "/disk2/qby/Desktop/emdr2-main/megatron/data/emdr2_index.py", line 50, in load_from_file
    state_dict = pickle.load(open(self.embedding_path, 'rb'))
_pickle.UnpicklingError: pickle data was truncated

DevSinghSachan commented 1 year ago

Can you try downloading the file from the Dropbox link instead? The actual size should be ~32 GB, so a 19 GB file is an incomplete download.
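
For anyone hitting the same error, a size check before unpickling can catch a cut-off download early. The following is only a minimal sketch and not part of the emdr2 code: the helper names `looks_complete` and `describe_download` are hypothetical, and the 25 GB threshold is an assumption chosen to sit between the 19 GB partial file and the ~32 GB figure quoted above, since the exact byte count is not given in this thread.

```python
import os

GB = 10 ** 9  # decimal gigabyte; the thread does not say whether "~32 GB" means GB or GiB

# Assumed lower bound (not an official figure): well above the 19 GB partial
# download and well below the ~32 GB size mentioned in the thread.
MIN_EXPECTED_BYTES = 25 * GB


def looks_complete(path, min_expected_bytes=MIN_EXPECTED_BYTES):
    """Return True if the file at `path` is at least `min_expected_bytes` long."""
    return os.path.getsize(path) >= min_expected_bytes


def describe_download(path, min_expected_bytes=MIN_EXPECTED_BYTES):
    """Return a one-line, human-readable summary of the size check."""
    size = os.path.getsize(path)
    if size >= min_expected_bytes:
        status = "looks complete"
    else:
        status = "is probably truncated, re-download it"
    return f"{path}: {size / GB:.1f} GB, {status}"
```

Calling `describe_download(...)` on the downloaded `.pkl` path before starting training would report the truncated file right away, rather than failing inside `pickle.load` after the model setup has already begun. A size check only detects short files; it cannot confirm that the contents are uncorrupted.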