Open ParishadBehnam opened 1 year ago
Hello again @AkariAsai ,
Since I didn't get any responses, I tried to run cross-task retrieval from scratch. However, I don't get the same results as Table 4 (last row) of the paper! Could you please correct me if I am using an incorrect argument for the following steps?
Thank you :)
Embedding passages (do this for every corpus in corss_task_cross_domain_final):
python generate_passage_embeddings.py \
  --model_name_or_path facebook/tart-full-flan-t5-xl \
  --output_dir embeddings/linkso \
  --passages data/corss_task_cross_domain_final/linkso_py/corpus.jsonl \
  --shard_id 0 --num_shards 1
Running cross-task:
python eval_cross_task.py \
  --passages data/corss_task_cross_domain_final/nq/corpus.jsonl \
    data/corss_task_cross_domain_final/scifact/corpus.jsonl \
    data/corss_task_cross_domain_final/gooaq_med/corpus.jsonl \
    data/corss_task_cross_domain_final/linkso_py/corpus.jsonl \
    data/corss_task_cross_domain_final/ambig/corpus.jsonl \
    data/corss_task_cross_domain_final/wikiqa/corpus.jsonl \
    data/corss_task_cross_domain_final/gooaq_technical/corpus.jsonl \
    data/corss_task_cross_domain_final/codesearch_py/corpus_new.jsonl \
  --passages_embeddings "embeddings/linkso/passages_*" \
    "embeddings/ambig/passages_*" \
    "embeddings/scifact/passages_*" \
    "embeddings/nq/passages_*" \
    "embeddings/gooaq/passages_*" \
    "embeddings/codesearch/passages_*" \
    "embeddings/wikiqa/passages_*" \
  --qrels data/corss_task_cross_domain_final/linkso/qrels/test_new.tsv \
  --output_dir logs/linkso_results \
  --model_name_or_path facebook/tart-full-flan-t5-xl \
  --projection_size 1024
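Because the embedding paths are passed as quoted glob patterns, the shell does not expand them, so a pattern that matches no files would not be flagged on the command line. The following is a minimal sketch of a pre-flight check, not part of the TART repository: `count_matches` is a hypothetical helper, and the default patterns are only illustrative. Whether a missing embedding file explains the discrepancy reported here is not established in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print how many files a quoted glob pattern matches (0 if none).
count_matches() {
  local pattern="$1"
  # compgen -G expands the pattern itself, so the caller can keep it quoted.
  # It returns non-zero when nothing matches, hence the "|| true".
  { compgen -G "$pattern" || true; } | wc -l | tr -d ' '
}

# Patterns to check: taken from the command-line arguments if given,
# otherwise two illustrative defaults (assumed paths, adjust to your layout).
if [ "$#" -gt 0 ]; then
  PATTERNS=("$@")
else
  PATTERNS=("embeddings/linkso/passages_*" "embeddings/nq/passages_*")
fi

for pattern in "${PATTERNS[@]}"; do
  echo "$pattern -> $(count_matches "$pattern") file(s)"
done
```

Any pattern reported with 0 files points at a corpus whose embeddings were written somewhere else or not generated at all.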
Hello ParishadBehnam, I was able to reproduce the last row of Table 4 (the X2 setup) with the following arguments, although some of the numbers differ. As I understand it, the setup has two stages: first retrieve candidates with the retriever (Contriever-MSMARCO), which is also the model used to generate the passage embeddings, and then rerank them with 'tart-full-flan-t5-xl' in the second stage.
CKPT=./ckpt/tart-dual-contriever-msmarco
CE_CKPT=facebook/tart-full-flan-t5-xl
python generate_passage_embeddings.py --model_name_or_path $CKPT --output_dir ${OUTPUT_DIR_NAME}/embeddings/${DATA} \
--passages ../../../data/corss_task_cross_domain_final/${DATA}/corpus.jsonl --shard_id ${i} --num_shards 8
python eval_cross_task.py \
--passages ../../../data/corss_task_cross_domain_final/nq/corpus.jsonl ../../../data/corss_task_cross_domain_final/scifact/corpus.jsonl ../../../data/corss_task_cross_domain_final/linkso_py/corpus.jsonl ../../../data/corss_task_cross_domain_final/ambig/corpus.jsonl ../../../data/corss_task_cross_domain_final/wikiqa/corpus.jsonl ../../../data/corss_task_cross_domain_final/gooaq_technical/corpus.jsonl ../../../data/corss_task_cross_domain_final/codesearch_py/corpus.jsonl \
--passages_embeddings "${OUTPUT_DIR_NAME}/embeddings/nq/passages_*" "${OUTPUT_DIR_NAME}/embeddings/scifact/passages_*" "${OUTPUT_DIR_NAME}/embeddings/linkso_py/passages_*" "${OUTPUT_DIR_NAME}/embeddings/ambig/passages_*" "${OUTPUT_DIR_NAME}/embeddings/wikiqa/passages_*" "${OUTPUT_DIR_NAME}/embeddings/gooaq_technical/passages_*" "${OUTPUT_DIR_NAME}/embeddings/codesearch_py/passages_*" \
--qrels ../../../data/corss_task_cross_domain_final/${DATA}/qrels/test.tsv \
--output_dir ${OUTPUT_DIR_NAME}/pooled-${DATA} \
--model_name_or_path $CKPT \
--data ../../../data/corss_task_cross_domain_final/${DATA}/queries.jsonl \
--prompt "${PROMPT}" \
--ce_model $CE_CKPT \
--ce_prompt "${PROMPT}"
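The commands above leave `${DATA}`, `${i}`, and `${OUTPUT_DIR_NAME}` to be filled in by the caller. A minimal sketch of the surrounding loop might look like the following. It assumes zero-based shard ids (consistent with `--shard_id 0 --num_shards 1` earlier in the thread), an `OUTPUT_DIR_NAME` default of `outputs` that is purely a placeholder, and a hypothetical `DRY_RUN` switch that prints each command instead of running it.

```shell
#!/usr/bin/env bash
set -euo pipefail

CKPT="${CKPT:-./ckpt/tart-dual-contriever-msmarco}"   # retriever checkpoint from the reply
OUTPUT_DIR_NAME="${OUTPUT_DIR_NAME:-outputs}"          # assumption: not given in the thread
DATA_ROOT="${DATA_ROOT:-../../../data/corss_task_cross_domain_final}"
NUM_SHARDS=8
DRY_RUN="${DRY_RUN:-1}"                                # hypothetical: 1 prints, 0 executes

# Corpora listed in the --passages argument of the reply's eval command.
CORPORA=(nq scifact linkso_py ambig wikiqa gooaq_technical codesearch_py)

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

for DATA in "${CORPORA[@]}"; do
  for ((i = 0; i < NUM_SHARDS; i++)); do
    run python generate_passage_embeddings.py \
      --model_name_or_path "$CKPT" \
      --output_dir "${OUTPUT_DIR_NAME}/embeddings/${DATA}" \
      --passages "${DATA_ROOT}/${DATA}/corpus.jsonl" \
      --shard_id "$i" --num_shards "$NUM_SHARDS"
  done
done
```

Running it with `DRY_RUN=1` first makes it easy to confirm the paths before committing to the full embedding job; setting `DRY_RUN=0` runs the shards sequentially.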
Dear Akari, Thank you for the great work and the detailed documentation on TART. I want to reproduce the cross-task cross-domain results. You said you have uploaded all the passage embeddings to Google Drive. However, I can only find the embeddings for ArguAna, Climate-FEVER, DBPedia, NQ, SciDocs (this directory is empty), Touché, and TREC-COVID. I am looking for the passage embeddings of AmbigQA, WikiQA, SciFact, GooAQ-Technical, LinkSO-Python, and CodeSearchNet-Python for cross-task retrieval. Could you please provide them?
Thank you :)