Open Alaa-Ebshihy opened 1 year ago
[x] Indexing is implemented on the LongEval dataset
[x] Experimented with querying to generate both the averaged results and the per-query results
Next:
[x] wrap up the code for querying and try to generalize it
[x] Also test on Robust, which includes a simulation for EvEE
Next:
[ ] Repeat the previous Robust experiments on actual RD prediction
[ ] make a list of experiments required: absolute vs relative RD ...
[ ] Use the LongEval data from Gabriela to simulate evolving test collections, or predict the results using transfer learning
[ ] Create a mapping file between the URLs and the docids for the 2023 collection as well (or for any other collection we use) with this script: https://github.com/galuscakova/longeval/blob/be237848d01b8f4ba85a96a99465b68ba9d47bfd/convert-qwant-collection.py To get the time, use the mine-time script, which takes the mapping file and the directory of the original collection as input: https://github.com/galuscakova/longeval/blob/be237848d01b8f4ba85a96a99465b68ba9d47bfd/mine-time.py
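
For the absolute vs relative RD comparison in the list above, a minimal sketch of the two variants could look like the following. It assumes RD means the change in an effectiveness score (for example MAP or nDCG) of the same system between two collection snapshots; the function names and the sign convention (later minus earlier) are hypothetical and not taken from the project code.

```python
def absolute_rd(score_earlier, score_later):
    """Absolute change in the metric between two snapshots (assumed sign: later - earlier)."""
    return score_later - score_earlier


def relative_rd(score_earlier, score_later):
    """Change in the metric as a fraction of the earlier score."""
    if score_earlier == 0:
        # The relative variant is undefined when the baseline score is zero.
        raise ValueError("relative RD is undefined for an earlier score of 0")
    return (score_later - score_earlier) / score_earlier


# Hypothetical scores, for illustration only (not experimental results).
earlier, later = 0.40, 0.30
print(absolute_rd(earlier, later))
print(relative_rd(earlier, later))
```

The relative variant normalizes by the earlier score, so it behaves differently for systems with very low baselines; that may be worth checking when the two are compared.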
Implement code that uses PyTerrier to index documents, run queries, and generate evaluation results
Use the tutorial: https://github.com/terrier-org/ecir2021tutorial/tree/main
Other sources: https://cs.usm.maine.edu/~behrooz.mansouri/courses/Slides_IR_22/Introduction%20to%20Information%20Retrieval%20--%20Session%2013%20-%20Pyterrier%20Tutorial.pdf
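
A rough sketch of the index, query, and evaluate steps with PyTerrier is given below. The toy documents, topics, qrels, the helper names `toy_collection` and `index_query_evaluate`, and the choice of BM25 with MAP and nDCG@10 are all assumptions for illustration; the real LongEval loading code and metrics may differ. PyTerrier and pandas are imported inside the function, so they need to be installed (along with Java) before it is called.

```python
import os


def toy_collection():
    """Return a tiny hypothetical collection: documents, topics, and qrels."""
    docs = [
        {"docno": "d1", "text": "terrier is an information retrieval platform"},
        {"docno": "d2", "text": "evolving test collections change over time"},
        {"docno": "d3", "text": "search engines rank web pages for a query"},
        {"docno": "d4", "text": "transfer learning reuses a trained model"},
    ]
    topics = [
        {"qid": "q1", "query": "evolving collections"},
        {"qid": "q2", "query": "retrieval platform"},
    ]
    qrels = [
        {"qid": "q1", "docno": "d2", "label": 1},
        {"qid": "q2", "docno": "d1", "label": 1},
    ]
    return docs, topics, qrels


def index_query_evaluate(index_dir, docs, topics, qrels, perquery=False):
    """Index docs, retrieve with BM25, and evaluate against the qrels."""
    import pandas as pd
    import pyterrier as pt

    # Older PyTerrier releases need an explicit init; newer ones start Java on demand.
    if not pt.started():
        pt.init()

    # Build the index from an iterator of {"docno", "text"} dicts.
    indexer = pt.IterDictIndexer(os.path.abspath(index_dir), overwrite=True)
    index_ref = indexer.index(iter(docs))

    # BM25 is an assumed choice of weighting model.
    bm25 = pt.BatchRetrieve(index_ref, wmodel="BM25")

    # perquery=False returns averaged metrics; perquery=True returns one row per query and metric.
    return pt.Experiment(
        [bm25],
        pd.DataFrame(topics),
        pd.DataFrame(qrels),
        eval_metrics=["map", "ndcg_cut_10"],
        names=["BM25"],
        perquery=perquery,
    )
```

Calling `index_query_evaluate("./toy_index", *toy_collection())` should return the averaged results as a DataFrame, and passing `perquery=True` should return the per-query breakdown instead. The toy data is only meant to show the expected column layout (`docno`/`text`, `qid`/`query`, `qid`/`docno`/`label`), not to produce meaningful scores.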