Closed: hi-i-m-GTooth closed this issue 5 months ago.
Hi @hi-i-m-GTooth, your scripts look correct to me. Can you check whether the generated queries look OK (`data/xorqa_data/100k/xorqa_corpus.tsv.q10.docTquery`)?
Hi, Dr. Zhuang. Sorry for the late reply.
Below, I tried to observe what the QG model generated with `--num_return_sequences 1`. I take DOC 40693 as an example:
> Khalid bin Abdulaziz Al Saud ( ""; 13 February 1913 – 13 June 1982) was King of Saudi Arabia from 1975 to 1982. His reign saw both huge developments in the country due to increase in oil revenues and significant events in the Middle East. Khalid of Saudi Arabia
Similarly, I generated the following queries with `models/xor-tydi-docTquery-mt5-large`:
```
{"text_id": 40693, "text": "من هو ملك السعودية ؟"}
{"text_id": 40693, "text": "কাতিফের বর্তমান রাষ্ট্রপতি কে?"}
{"text_id": 40693, "text": "Milloin Saud-Arabian kuningattaret olivat vallassa?"}
{"text_id": 40693, "text": "サウジ国王の初代王は誰?"}
{"text_id": 40693, "text": "칼리드 5세의 생일은 언젠가요?"}
{"text_id": 40693, "text": "Когда родился шейх Клуда бен Аблязия́з ал Сауд?"}
{"text_id": 40693, "text": "షేక్ బూదిద్దీన్ ఒబేర్ లాసా ఎప్పుడు మరణించాడు?"}
```
Which could be translated to:
```
{"text_id": 40693, "text": "Who is the king of Saudi Arabia?"}
{"text_id": 40693, "text": "Who is the current president of Katif?"}
{"text_id": 40693, "text": "When were the queens of Saudi Arabia in power?"}
{"text_id": 40693, "text": "Who was the first Saudi king?"}
{"text_id": 40693, "text": "When is Khalid V's birthday?"}
{"text_id": 40693, "text": "When was Sheikh Kludah bin Ablyaziaz al Saud born?"}
{"text_id": 40693, "text": "When did Sheikh Budiddeen Obair Lhasa die?"}
```
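As a side note, since the `.docTquery` file is JSON lines keyed by `text_id`, a quick per-document sanity check is easy to script. A minimal sketch using a few of the translated queries above (file handling omitted; the counts are illustrative):

```python
import json
from collections import defaultdict

# A few of the generated queries shown above, in the .docTquery JSON-lines format
lines = [
    '{"text_id": 40693, "text": "Who is the king of Saudi Arabia?"}',
    '{"text_id": 40693, "text": "Who is the current president of Katif?"}',
    '{"text_id": 40693, "text": "Who was the first Saudi king?"}',
]

# Group generated queries by document id to check per-document query counts
queries_by_doc = defaultdict(list)
for line in lines:
    record = json.loads(line)
    queries_by_doc[record["text_id"]].append(record["text"])

print(len(queries_by_doc[40693]))  # 3
```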
I noticed that some words do not appear in DOC 40693, e.g. "Katif", "queens", "Khalid V", "Sheikh Kludah", and "Sheikh Budiddeen Obair Lhasa". (Since I am not familiar with those languages, I translated them with Google Translate.)
Are they quite different from the queries you would expect to be generated? Thank you!
New words are expected, as the QG model can introduce new relevant words or simply hallucinate.
So your QG step seems correct; the issue might then be in the training step. I suspect the batch size is just too small in your case. Could you try setting `--gradient_accumulation_steps` to 4?
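For intuition on why accumulation helps here: summing gradients over 4 micro-batches before each optimizer update yields, for a sum-reduced loss, exactly the gradient of a 4x larger batch. A minimal pure-Python sketch with a 1-D least-squares loss (illustrative only, not the repo's training loop):

```python
# Gradient of the sum-of-squares loss for a 1-D linear model y_hat = w * x.
def grad(w, xs, ys):
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 13.9, 16.2]
w = 0.5

# Full-batch gradient (batch size 8)
full = grad(w, xs, ys)

# Accumulated gradient: 4 micro-batches of size 2, summed before the update
accum = sum(grad(w, xs[i:i + 2], ys[i:i + 2]) for i in range(0, 8, 2))

print(abs(full - accum) < 1e-9)  # True: the two updates are identical
```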
Hi, Dr. Zhuang.
Thanks for your valuable advice. I'll try training with `--gradient_accumulation_steps` set to 4!
By the way, during the discussion, I trained DSI-QG on the MSMARCO-100K dataset with the same process. The result (as the following image shows) is normal, unlike the issues described above. According to #10, since gcalabria could reproduce the results, I don't think the ckpt is the problem.
I hope this information will help us to address this issue :)
Hi, Dr. Zhuang.
I've tried training with `--gradient_accumulation_steps` set to 4.
Unfortunately, I still can't reproduce the experiment for XOR dataset.
If it is acceptable, may I request a Dockerfile containing the environment and scripts for reproduction? I think this could be the most practical way to fix this problem.
Thank you very much!
Hi @hi-i-m-GTooth, unfortunately I do not have an env container, but I don't think the problem comes from the env; there is no tricky env installation. Could you share the training loss as well?
Here are the training curves I got before:
So in my case, it needs around 100k steps to start learning something. I am not quite sure how wandb logs steps with `gradient_accumulation_steps > 1`, so maybe just wait a bit longer? Also, maybe try xorqa 10k to debug (a smaller dataset, thus faster).
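If the step axis counts forward passes rather than optimizer updates (an assumption about the logging, worth checking for this repo), the curves should be compared at matching update counts. A trivial conversion:

```python
def optimizer_updates(logged_steps, grad_accum_steps):
    """Forward-pass steps -> optimizer updates under gradient accumulation."""
    return logged_steps // grad_accum_steps

# If learning starts around 100k optimizer updates, a run logged per forward
# pass with accumulation 4 would need ~400k logged steps to reach that point.
print(optimizer_updates(400_000, 4))  # 100000
```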
Hi, Dr. Zhuang.
Here is my training loss (`--gradient_accumulation_steps = 4`):
And this is the training loss when `--gradient_accumulation_steps = 1`:
They are both stuck at about 4. I'll wait a little longer to see the result with `--gradient_accumulation_steps = 4`.
The 10k dataset will be nice for reducing the computation cost. Thanks for the advice.
Hi, Dr. Zhuang.
I would like to inform you that, thanks to your suggestion, the `--gradient_accumulation_steps = 4` setting works.
However, you may notice that the convergence speed is much slower than yours.
Actually, I've trained the model for about 2 weeks with the 2-GPU + `--gradient_accumulation_steps = 4` setting.
This is unexpected, since 2 GPUs with 4 accumulation steps should physically match the original settings for the XOR dataset.
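The reasoning here can be made explicit. Assuming the original setting used 8 GPUs with the same per-device batch size (both numbers are my assumptions; the per-device size of 32 is purely illustrative), 2 GPUs with `--gradient_accumulation_steps = 4` give the same effective batch size, and thus the same number of updates per epoch, but each update takes roughly 4x the wall clock:

```python
def effective_batch(per_device, n_gpus, accum_steps):
    """Samples consumed per optimizer update."""
    return per_device * n_gpus * accum_steps

# Hypothetical per-device batch size of 32 (substitute your script's value).
paper = effective_batch(32, 8, 1)  # assumed original setting: 8 GPUs, no accumulation
mine = effective_batch(32, 2, 4)   # 2 x A6000 with --gradient_accumulation_steps 4

# Same samples per update, so the same updates per epoch, but each update
# processes 4 sequential micro-batches on each of only 2 GPUs.
print(paper, mine)  # 256 256
```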
Good to see it worked! The convergence speed might be impacted by the sampled docs and generated queries, but I hope it can converge to a similar level.
Hi, Dr. Zhuang. Sorry to bother you again in a short period.
Here is my question: I tried to reproduce DSI-QG's HIT@1 on XOR QA 100k, but the results differ from the paper. Supposedly, the HIT@1 curve should look like Fig. 2 in the paper.
Limited by computation power, I trained models with **A6000 GPU (48G) × 2. Since the CUDA version is 12.2, I installed PyTorch 2.1.0. The server OS is Ubuntu 22.04.3 LTS with a 64-core CPU.** Below are the scripts I executed:

1. Generate Queries with Given QG Model Checkpoints
With the QG model ckpt provided by Dr. Zhuang, I downloaded it (xor-tydi-docTquery-mt5-large) and placed it in the `models` dir. Then I ran the script below, just like step 2 from the README:

2. Train DSI-QG with Query-represented Corpus

After executing the above script, I tried to train the DSI-QG model. Here is the script, following step 3 from the README:
Questions: The Performance is Significantly Different from the Paper
Though I haven't gone through the whole training procedure, the HIT@1 and HIT@10 scores both look strange so far.
I checked my hyperparameters to ensure they follow the README. I didn't edit or modify any code either. Also, I applied the given ckpt to avoid training the QG model myself, in order to obtain more stable queries for the DSI-QG model.
Below are the HIT@1 and HIT@10 curves logged by wandb.
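For reference, Hits@k as reported here can be computed from ranked id lists as follows (a minimal sketch, not the repository's evaluation code; the doc ids are illustrative):

```python
def hits_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold document id appears in the top-k ranked ids, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0

# One ranking per query: predicted doc ids (most confident first) plus the gold id
rankings = [
    ([40693, 11, 7], 40693),  # gold ranked 1st -> counts for Hits@1 and Hits@10
    ([11, 40693, 7], 40693),  # gold ranked 2nd -> counts for Hits@10 only
]

hit1 = sum(hits_at_k(r, g, 1) for r, g in rankings) / len(rankings)
hit10 = sum(hits_at_k(r, g, 10) for r, g in rankings) / len(rankings)
print(hit1, hit10)  # 0.5 1.0
```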
Here is the HIT@1 curve in Fig. 2 of the paper:
I hope you can give me some comments! I appreciate your contribution!