Closed — cylinbao closed this 1 month ago
We are preparing the preprocessed dataset (around 30+ GB, which will take a long time to upload to HF).
Currently, we have no plan to upload the index files, as they are too large and each retriever has its own index.
We will notify you as soon as the data files have been uploaded.
Best, Yutao
@cylinbao hi, the wiki dataset we used for the experiments has been uploaded (https://huggingface.co/datasets/ignore/FlashRAG_datasets/tree/main/retrieval-corpus).
@cylinbao I just ran some test experiments and found that, due to some settings modified during subsequent development, the llama3 model may not provide accurate answers (affecting exact match).
I have reverted these settings to the original ones (see 1e94f2633bceb74119e12ca9cea40896086b899f) and obtained normal test results.
Sorry for the mistake. I hope this helps.
So what's the major difference between now and then? I tried replug and only got a 0.13 EM score.
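For context, "EM" here is the exact-match metric. A minimal sketch of how it is commonly computed (SQuAD-style answer normalization; FlashRAG's exact normalization rules may differ):

```python
import re
import string


def normalize(text: str) -> str:
    # Lowercase, strip punctuation and the articles a/an/the,
    # and collapse whitespace (SQuAD-style normalization).
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, golds: list[str]) -> float:
    # 1.0 if the normalized prediction equals any normalized gold answer.
    return float(any(normalize(prediction) == normalize(g) for g in golds))
```

Small changes in generation settings (e.g. the model failing to stop and emitting trailing text) can easily drive this strict metric toward zero, which is why the stop-token fix below matters so much for the reported numbers.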
@BUAADreamer There are two differences:
1. `<|eot_id|>` was added into `eos_token_id`;
2. a stop condition was added in generation to let llama3 stop normally (see eda0c6a916d23405e2666b24477fcf26819996e6).

Thanks for the reply. I understand this is an actively developed project, so modifications are normal. But I wonder: do you plan to release the accuracy numbers with the new changes?
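The two llama3 stopping fixes might look roughly like this (a sketch, not the actual FlashRAG code; the helper name `build_generation_kwargs` and the exact config keys are assumptions — see the commit above for the real change):

```python
def build_generation_kwargs(tokenizer):
    """Sketch: extend the stop criteria for Llama-3 chat models.

    Llama-3 ends each chat turn with <|eot_id|>, which differs from the
    tokenizer's default eos token, so generation may run on past the
    answer unless both are treated as stopping points.
    """
    eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")
    return {
        # 1. add <|eot_id|> into eos_token_id (HF generate accepts a list)
        "eos_token_id": [tokenizer.eos_token_id, eot_id],
        # 2. backends that take literal stop strings (e.g. vLLM) use this
        "stop": ["<|eot_id|>"],
    }
```

Without these, the model keeps generating past its answer, and strict metrics like exact match collapse even when the answer itself is correct.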
@cylinbao Thanks for your suggestion! For subsequent changes that may affect the results, we will re-run the methods to evaluate how much the results are affected.
The results we currently report are actually based on the above improvements. These improvements were present during our experiments but were later removed for code-related reasons, so this is just a rollback.
I wonder whether the preprocessed wiki dump and index used in your experiments are available?
I tried testing FlashRAG with my own wiki dump from wiki_dpr and an index built with contriever. However, the accuracy I got on some of your baselines (zero-shot, naive, iter-gen) is lower than the reported numbers. I suspect it might be a misalignment between the data source and the index, so it would be really helpful if you could provide the preprocessed wiki dump and index. I know we can replicate them by following this, but it seems to take a long time to run.