Open DelaramRajaei opened 1 year ago
Thank you Delaram. I've read through the docs you've linked and have received your Teams message regarding next steps.
I've cloned the repair repo and am testing the commands listed there on some preexisting datasets. The plan is to then download the robust dataset, put it in the repo, and change the path to index that one.
I am having quite a few issues installing Pyserini on my M1 Mac. I remember running into similar issues during my last project with the Fani Lab. Notably, I seem to be unable to install nmslib, which is a dependency of Pyserini. I will continue to debug this, but do you know anybody at the lab with an M1 Mac who has had similar issues and resolved them?
I've gotten it installed. That was annoying and I regret buying an M1 Mac lol... but on to the next
Perfect. Also, I remember having a similar issue, which I mentioned in this link. If you encounter other problems, that issue may help you.
Thanks! I've abandoned attempts at running it locally but have been able to push forward using Colab. However, when I try to download the robust dataset, I am faced with a licensing dilemma. Do you have any idea about this?
I know part of the error is omitted due to the scrolling, so I've pasted it here. If you want, I can try a different dataset or maybe you can spot something I did incorrectly.
[INFO] Please confirm you agree to the TREC data usage agreement found at https://trec.nist.gov/data/cd45/index.html
[INFO] The TREC Robust document collection is from TREC disks 4 and 5. Due to the copyrighted nature of the documents, this collection is for research use only, which requires agreements to be filed with NIST. See details here: https://trec.nist.gov/data/cd45/index.html. More details about the procedure can be found here: https://ir-datasets.com/trec-robust04.html#DataAccess. Once completed, place the uncompressed source here: /root/.ir_datasets/disks45/corpus This should contain directories like NEWS_data/FBIS, NEWS_data/FR94, etc.
It seems that I can obtain a license myself. However, the documentation also mentions that an organization may already hold a license. Do we have one?
I'm getting this information from the following page: https://ir-datasets.com/trec-robust04.html#DataAccess
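For anyone hitting the same message: once a licensed copy of the collection is available, a quick layout check may save a failed run. The sketch below is only an illustration, not part of the repo or of ir_datasets. The function name `missing_corpus_dirs` is hypothetical, and it checks only the two subdirectories named in the message above; the full collection contains others that are not listed here.

```python
from pathlib import Path

# Subdirectories named in the ir_datasets message; the real collection has
# more, which this sketch deliberately does not try to enumerate.
EXPECTED_SUBDIRS = ("NEWS_data/FBIS", "NEWS_data/FR94")


def missing_corpus_dirs(corpus_root, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent under corpus_root.

    Hypothetical helper for illustration; it only tests that the directories
    exist and does not validate their contents.
    """
    root = Path(corpus_root).expanduser()
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Path taken from the message above (the Colab default for the root user).
    corpus = "/root/.ir_datasets/disks45/corpus"
    missing = missing_corpus_dirs(corpus)
    if missing:
        print(f"Missing under {corpus}: {', '.join(missing)}")
    else:
        print("Corpus layout looks plausible; the download step can be retried.")
```

An empty list means both directories were found; it does not guarantee the collection is complete.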
I have been given copies of the datasets and am working on running the commands now.
@hosseinfani Hi Professor, I am running the encoding command in a Google Colab session and it's working, which is great. However, it's moving really slowly, and I believe I still have an indexing command to run afterwards, which will also take a long time. I was told you could provide me with access to Compute Canada. Is there any chance you could help me out with that? Unfortunately, I have been unable to run these commands locally, as I have run into ARM processor dependency hell.
@michelecatani Hi Mike, sure. Please follow the steps here: https://github.com/fani-lab/Library/blob/main/ComputeCanada.md
@hosseinfani Thank you!
I have provided Delaram with the encoded file and faiss file for robust04. I await her confirmation to see if everything's correct, then I will work on indexing as many of the other datasets as I can over the weekend.
@michelecatani
This is an issue page to log your progress.
Please read the IR and backtranslation document by Monday. The metrics part of the document is not yet complete. You can find some helpful information in this link.
Please let us know if you have any concerns or questions.