castorini / anserini

Anserini is a Lucene toolkit for reproducible information retrieval research
http://anserini.io/
Apache License 2.0

Add reproduction log #2544

Closed. MehrnazSadeghieh closed this 3 months ago

MehrnazSadeghieh commented 3 months ago

I reproduced the results on macOS using the Terminal on my MacBook. Everything worked, except that I had to use curl instead of wget to download the dataset.

Setup details:

Here is my device's information:

MehrnazSadeghieh commented 3 months ago

Dear Dr. Lin,

Thank you very much for your comprehensive explanation in your guide.

To run the commands, I again used the Terminal on my MacBook. As I mentioned previously, my operating system is macOS.

To be honest, until now, I had only worked with Pandas and Elasticsearch for my academic projects in data mining and information retrieval. It is really amazing for me to become familiar with a new toolkit. By following your guidelines, everything was great so far, and I hope to complete the whole onboarding path without any problem.

While working through the BM25 Baselines for MS MARCO Passage Ranking in Anserini, I realized I had not installed Anserini yet, so I followed your installation guide. However, I ran into a small problem while executing this command:

java -cp anserini-0.36.1-fatjar.jar io.anserini.search.SearchCollection -index msmarco-v1-passage.splade-pp-ed -topics msmarco-v1-passage.dev -encoder SpladePlusPlusEnsembleDistil -output run.msmarco-v1-passage-dev.splade-pp-ed-onnx.txt -impact -pretokenized

My connection was interrupted and the index download stalled, so I pressed Control+C to stop it. When I ran the command again, I got a checksum error. I knew the cause was the incomplete download, but I could not find the index file in order to delete it and download it again. None of the commands and scripts I tried helped me locate that file.

After a lot of unsuccessful searching on the web and with ChatGPT, I finally noticed that the installation guide links to detailed instructions. I don't know why I did not see that sooner, and I am sorry for that. Once I checked the detailed instructions page, I figured out what to do. I just wanted to thank you for your guides and share my challenge, as you asked.

I would also like to suggest adding a note to the main guide about handling incomplete downloads and checksum errors, including where to find the index files so they can be deleted and downloaded again if needed.
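As a rough illustration of the cleanup described above, here is a minimal Python sketch that lists the files in an index cache directory and deletes one whose MD5 checksum does not match an expected value. The cache location `~/.cache/pyserini/indexes` is an assumption on my part (please confirm it against the detailed installation instructions), and `md5_of` and `remove_if_corrupt` are hypothetical helper names, not part of Anserini.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def remove_if_corrupt(path: Path, expected_md5: str) -> bool:
    """Delete the file if its MD5 differs from the expected value.

    Returns True if the file was deleted, False if it was missing or intact.
    """
    if not path.is_file():
        return False
    if md5_of(path) == expected_md5.lower():
        return False
    path.unlink()
    return True


if __name__ == "__main__":
    # Assumed cache location; verify it before deleting anything.
    cache_dir = Path.home() / ".cache" / "pyserini" / "indexes"
    if cache_dir.is_dir():
        for entry in sorted(cache_dir.iterdir()):
            print(entry.name, entry.stat().st_size, "bytes")
```

Running the script only lists what is in the assumed cache directory. Calling `remove_if_corrupt` with the path of the partial download and the checksum reported in the error message would delete the file, so that the next run of the search command can download it again.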

Thank you very much for checking our process. I really appreciate your time and consideration.

Best regards,

lintool commented 3 months ago

hi @MehrnazSadeghieh please fix conflicts here.

lintool commented 3 months ago

https://github.com/castorini/anserini/discussions/2428

MehrnazSadeghieh commented 3 months ago


Hi Dr. Lin,

I have tried very hard to fix this issue, but none of the approaches I tried worked. At first, I thought the problem was that I had pushed my changes to a new branch, so I tried pushing to the master branch of another repository to check whether that would fix it. That worked, so I then pushed my changes to the master branch in this repository (on this pull request: https://github.com/castorini/anserini/pull/2549), but I still get the same error that you mentioned. Can you please guide me on what else I can do to resolve this problem?

I use the following command to get my commit ID: git log -1 --format="%H"

MehrnazSadeghieh commented 3 months ago

Hi Dr. Lin, I think I have fixed this issue in the new pull request mentioned above (#2549). Please let me know if there is still a problem. I guess the problem was that I was updating my commit ID every time I pushed to a branch, so the ID pointed to my own commits rather than to the main branch of the repository, and that is why the error happened.
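If that guess is right, one way to check which commit ID is being recorded is sketched below in Python. It compares the current commit (`git rev-parse HEAD`, which prints the same hash as `git log -1 --format="%H"`) with the most recent commit shared with the main repository (`git merge-base`). The remote name `upstream` and branch `master` are assumptions, and `run_git` and `describe_commits` are hypothetical helper names.

```python
import subprocess
from typing import Callable, Sequence

GitRunner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    """Run a git command in the current directory and return trimmed stdout."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def describe_commits(
    upstream_ref: str = "upstream/master",  # assumed remote and branch names
    git: GitRunner = run_git,
) -> dict:
    """Report the current commit and the latest commit shared with upstream."""
    head = git(["rev-parse", "HEAD"])
    base = git(["merge-base", "HEAD", upstream_ref])
    return {
        "head": head,
        "upstream_base": base,
        # True only when HEAD itself is already part of the upstream history.
        "head_in_upstream": head == base,
    }
```

When `head_in_upstream` is False, the hash printed by `git log -1` belongs to a local commit that the main repository does not contain, which matches the situation described above; `upstream_base` is then the latest commit that does exist upstream.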

MehrnazSadeghieh commented 3 months ago


I have also committed my changes to this branch for easier access, since I realized the problem was not what I had originally thought.

Thanks a lot again, and I apologize for the confusion.