Georgetown-IR-Lab / cedr

Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
MIT License

About documents.tsv #34

Closed (FDrAe86 closed this issue 3 years ago)

FDrAe86 commented 3 years ago

About running this instruction:

Indri index

awk '{print $3}' data/robust/*.run | python extract_docs_from_index.py indri PATH_TO_INDRI_INDEX > data/robust/documents.tsv

I ran into a problem. What is PATH_TO_INDRI_INDEX? Should I modify any code in extract_docs_from_index.py? When I run this .py file, the error is:

the following arguments are required: index_type, index_path

Thanks for your great work!!
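
For readers hitting the same message: its wording matches what Python's argparse prints when required positional arguments are missing, which would happen if the script were run without `indri PATH_TO_INDRI_INDEX` after its name. The sketch below reproduces that behavior under the assumption that the script declares two positional arguments named `index_type` and `index_path`; the parser here is a hypothetical stand-in, not the repository's actual code.

```python
import argparse


def build_parser():
    # Hypothetical parser that mirrors the two argument names in the error
    # message; the real script may define more options than this.
    parser = argparse.ArgumentParser(prog="extract_docs_from_index.py")
    parser.add_argument("index_type")  # e.g. "indri"
    parser.add_argument("index_path")  # filesystem path to an index that is already built
    return parser


def try_parse(argv):
    """Return the parsed arguments, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # On a missing positional argument, argparse writes a usage line and
        # an error message to stderr and then exits with status 2.
        return None


if __name__ == "__main__":
    print(try_parse([]))                            # no arguments: rejected
    print(try_parse(["indri", "/path/to/index"]))   # both arguments supplied
```

Under this assumption, supplying both arguments on the command line is enough to get past the error; no change to the script is implied.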

seanmacavaney commented 3 years ago

PATH_TO_INDRI_INDEX is the path to an Indri index for the target collection. Since the document content is stored when indexing, you can use the index to get a copy of the text for use by the models. You shouldn't need to modify extract_docs_from_index.py; you just need an index built.
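
As a rough illustration of the first half of the pipeline, the `awk '{print $3}'` step prints the third whitespace-separated field of every line in the run files, which in the standard TREC run format is the document ID. A minimal Python sketch of that step follows; the function name `doc_ids_from_run` and the sample lines are made up for illustration, and the de-duplication is an addition that the awk command itself does not perform.

```python
def doc_ids_from_run(lines):
    """Yield the third whitespace-separated field of each line, once per ID.

    Assumes TREC run format: query_id Q0 doc_id rank score run_tag.
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        doc_id = fields[2]
        if doc_id not in seen:
            seen.add(doc_id)
            yield doc_id


if __name__ == "__main__":
    # Illustrative run lines, not taken from the repository's data files.
    sample = [
        "301 Q0 FBIS3-10082 1 12.5 example_run",
        "301 Q0 LA010189-0001 2 11.0 example_run",
        "302 Q0 FBIS3-10082 1 9.0 example_run",
    ]
    for doc_id in doc_ids_from_run(sample):
        print(doc_id)
```

These IDs are what gets piped into extract_docs_from_index.py, which then looks up the stored text for each one in the index.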

FDrAe86 commented 3 years ago

OK, I'll try it. I appreciate your response. Thank you!!

wangxinzhe123 commented 2 years ago

Has your problem been solved? I have the same problem. Could you please reply to me at your convenience?