This issue isn't a problem to be fixed; rather, it is a record of undergraduate students' work on TREC 2024. In a series of comments, contributors can describe their work (at a high level) on specific tracks. This issue will be updated regularly as tasks/tracks are completed.
Assisted in creating a first-stage (f-stage) gte-qwen2 dense retrieval baseline, including writing scripts to encode the corpus and queries and then run retrieval over the resulting embeddings.
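At a high level, the encode-then-search flow looked roughly like the sketch below. This is a minimal illustration assuming sentence-transformers and FAISS; the checkpoint name is the public gte-Qwen2 release, and the data and paths are placeholders rather than our actual scripts.

```python
# Minimal dense-retrieval sketch; data, paths, and batch handling are
# illustrative, not the actual encode/retrieve scripts.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-1.5B-instruct",
                            trust_remote_code=True)

docs = ["first passage text ...", "second passage text ..."]  # corpus
queries = ["example topic ..."]                               # topics
# (gte-Qwen2 also expects an instruction prefix on queries; omitted here.)

doc_emb = model.encode(docs, normalize_embeddings=True)
qry_emb = model.encode(queries, normalize_embeddings=True)

# Inner product over normalized vectors == cosine similarity.
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(doc_emb.astype(np.float32))
scores, doc_ids = index.search(qry_emb.astype(np.float32), k=10)
```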
Created BM25 document-translation (dt) and query-translation (qt) baselines: indexed the corpus, ran the baselines, and evaluated all resulting runs.
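For reference, a BM25 run over a translated corpus can be produced with Pyserini along these lines; the index path, query, and run tag are placeholders, and this is a sketch rather than our exact scripts.

```python
# Hedged BM25 retrieval sketch with Pyserini over a prebuilt Lucene index
# of the document-translated corpus; path and parameters are illustrative.
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher("indexes/bm25-dt")  # hypothetical index path
searcher.set_bm25(k1=0.9, b=0.4)              # common Pyserini defaults

hits = searcher.search("example query", k=1000)
for rank, hit in enumerate(hits, start=1):
    # Standard 6-column TREC run format: qid Q0 docid rank score tag
    print(f"q1 Q0 {hit.docid} {rank} {hit.score:.4f} bm25-dt")
```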
Created a SPLADE document-translation baseline: encoded the corpus and queries with a SPLADE model, indexed the corpus embeddings, ran the baseline, and evaluated the results. This baseline was left out of submissions given its surprisingly low evaluation scores.
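For context, SPLADE turns each text into a weighted bag of vocabulary terms. The sketch below shows the standard encoding step using a public naver/splade checkpoint; it is my illustration of the technique, not the exact encode script we used.

```python
# SPLADE encoding sketch: log-saturated ReLU over MLM logits, max-pooled
# over token positions, yielding sparse term weights for indexing.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "naver/splade-cocondenser-ensembledistil"  # public checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

inputs = tok("example passage text", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits               # (1, seq_len, vocab)

mask = inputs["attention_mask"].unsqueeze(-1)
weights = torch.max(torch.log1p(torch.relu(logits)) * mask, dim=1).values[0]

# Non-zero vocabulary entries become the document's sparse representation.
sparse = {tok.convert_ids_to_tokens(int(i)): w.item()
          for i, w in enumerate(weights) if w > 0}
```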
Created and ran several data-munging scripts to reformat the queries and corpus.
Task: Multilingual Retrieval (MLIR)
Created and used multiple reformatting scripts, most notably one that converts a TREC run into the retrieval-results format.
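The conversion itself is simple; a sketch is below. The output schema shown is an assumed JSONL layout (query id plus ranked candidates), since the exact downstream format isn't spelled out here.

```python
# Sketch: convert a 6-column TREC run into a JSONL retrieval-results file.
# The output field names are assumptions for illustration.
import json
from collections import defaultdict

results = defaultdict(list)
with open("run.trec") as f:
    for line in f:
        qid, _, docid, _, score, _ = line.split()
        results[qid].append({"docid": docid, "score": float(score)})

with open("retrieval_results.jsonl", "w") as out:
    for qid, hits in results.items():
        hits.sort(key=lambda h: h["score"], reverse=True)
        out.write(json.dumps({"query_id": qid, "candidates": hits}) + "\n")
```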
Created SPLADE, BM25-dt, BM25-qt, and PLAID baselines for the task (essentially all f-stage baselines). Ran all baselines, evaluated the runs, fused them with reciprocal rank fusion (RRF) into fusion runs, and evaluated the fusion runs, which were then sent on to the mono stage.
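Reciprocal rank fusion only needs each document's rank in each run. A minimal implementation, assuming `runs` maps run name to per-query ranked docid lists and using the conventional k = 60:

```python
# Minimal reciprocal rank fusion (RRF): score(d) = sum over runs of
# 1 / (k + rank_of_d_in_run); documents absent from a run contribute 0.
from collections import defaultdict

def rrf(runs, k=60, depth=1000):
    fused = defaultdict(lambda: defaultdict(float))
    for run in runs.values():
        for qid, docids in run.items():
            for rank, docid in enumerate(docids[:depth], start=1):
                fused[qid][docid] += 1.0 / (k + rank)
    # Per query, return docids sorted by fused score, best first.
    return {qid: sorted(scores, key=scores.get, reverse=True)
            for qid, scores in fused.items()}
```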
Fused (with RRF) and evaluated the post-mono and post-listwise (listo) runs.
Task: Cross-language Retrieval (CLIR)
Taking the post-mono fused runs from MLIR, I combined the zho, rus, and fas runs into a single top-300 retrieval-results run for CLIR, which was then sent off to listwise reranking.
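The merge is essentially a pool-and-truncate over the three language runs; a sketch, under the assumption that post-mono scores are comparable across runs (they come from the same reranker) and with illustrative file names:

```python
# Pool the zho/rus/fas post-mono runs, then keep the top 300 docs per query.
from collections import defaultdict

pool = defaultdict(list)
for path in ["mono-zho.trec", "mono-rus.trec", "mono-fas.trec"]:
    with open(path) as f:
        for line in f:
            qid, _, docid, _, score, _ = line.split()
            pool[qid].append((docid, float(score)))

with open("clir-top300.trec", "w") as out:
    for qid, hits in pool.items():
        hits.sort(key=lambda h: h[1], reverse=True)
        for rank, (docid, score) in enumerate(hits[:300], start=1):
            out.write(f"{qid} Q0 {docid} {rank} {score:.4f} clir-fused\n")
```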
Task: Cross-Language Report Generation
Created SPLADE, BM25-dt, and PLAID baselines for the task (all f-stage baselines). Ran all baselines, evaluated the runs, fused them with RRF into fusion runs (as sketched above), and evaluated the fusion runs, which were then sent on to the mono stage.
Collaboratively built a Llama-3.1 baseline that uses PromptReps for f-stage retrieval. Building this baseline involved reformatting TREC's queries/corpus/qrels to fit PromptReps' expected inputs, becoming familiar with the PromptReps codebase, encoding dense and sparse representations of the corpus, generating a sparse index, and searching through PromptReps.
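As I understand the PromptReps idea, the LLM is prompted to compress a passage into one word; the final token's hidden state then serves as the dense representation, while its output logits provide the sparse one. The sketch below is a heavily simplified illustration of that idea with a Hugging Face causal LM, not the PromptReps repo's actual code.

```python
# Conceptual PromptReps-style encoding sketch (simplified; the actual
# pipeline used the PromptReps repo's encode/index/search scripts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.1-8B-Instruct"  # gated checkpoint on HF
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

prompt = 'Passage: "...". Use one word to represent the passage: "'
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Dense rep: last-layer hidden state of the final prompt token.
dense = out.hidden_states[-1][0, -1]
# Sparse rep: log-saturated next-token logits over the vocabulary.
sparse = torch.log1p(torch.relu(out.logits[0, -1]))
```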
Following in the footsteps of last year's top team, I created a script from scratch that adds TOMT-KIS, a dataset of ~1.2 million tip-of-the-tongue (ToT) questions from Reddit, to our given corpus (~3 million docs) and queries (150 queries), producing a larger corpus/query set on which to train DistilBERT. In total, roughly 90k query-document pairs were added. Implementing this corpus expansion meant learning Hugging Face, vLLM, and difflib, and improved my prompt engineering, logic, and problem-solving skills.
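The linking step pairs a Reddit question with a corpus document by fuzzy-matching the answered title; below is a minimal sketch with difflib, where the cutoff (difflib's 0.6 default) and field names are assumptions rather than the script's actual values.

```python
# Fuzzy-match a TOMT-KIS answer title against corpus titles with difflib;
# a successful match yields a new (query, document) training pair.
import difflib

corpus_titles = {"doc1": "The Matrix", "doc2": "Inception"}  # docid -> title
answer_title = "The Matrix (1999)"

match = difflib.get_close_matches(
    answer_title.lower(),
    [t.lower() for t in corpus_titles.values()],
    n=1, cutoff=0.6,
)
if match:
    docid = next(d for d, t in corpus_titles.items()
                 if t.lower() == match[0])
    # (reddit_question, docid) is added to the expanded query-doc pairs.
```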
Recreated TREC ToT's provided BM25, DistilBERT, and GPT-4o baselines. This involved modifying and extending ToT's scripts, fixing errors on the fly, and learning the basic concepts behind each baseline.
Discussed numerous baseline ideas, researched successful teams' baselines/papers and implemented their ideas, and provided coding support wherever possible.
Integrated the PromptReps repo for use on the ToT dataset. Collaboratively adjusted the dataset to the expected format, encoded dense and sparse representations, and built a sparse index.
Built on last year's top team's approach, using the ideas presented in their paper and their existing code for reranking.