This issue isn't a problem to be fixed; rather, it is a record of undergraduate students' work on TREC 2024. In a series of comments, contributors can describe their work (at a high level) on specific tracks. This issue will be updated regularly as tasks/tracks are completed.
Assisted in creating a first-stage (f-stage) gte-qwen2 dense retrieval baseline, including writing scripts to encode the corpus and queries and then run retrieval over the resulting embeddings.
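At a high level, the encode-then-search flow looked roughly like the sketch below. This is a minimal illustration assuming sentence-transformers and FAISS; the checkpoint name is the public gte-Qwen2 release, and the data and paths are placeholders rather than our actual scripts.

```python
# Minimal dense-retrieval sketch; data, paths, and batch handling are
# illustrative, not the actual encode/retrieve scripts.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-1.5B-instruct",
                            trust_remote_code=True)

docs = ["first passage text ...", "second passage text ..."]  # corpus
queries = ["example topic ..."]                               # topics
# (gte-Qwen2 also expects an instruction prefix on queries; omitted here.)

doc_emb = model.encode(docs, normalize_embeddings=True)
qry_emb = model.encode(queries, normalize_embeddings=True)

# Inner product over normalized vectors == cosine similarity.
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(doc_emb.astype(np.float32))
scores, doc_ids = index.search(qry_emb.astype(np.float32), k=10)
```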
Created BM25 document-translation (dt) and query-translation (qt) baselines: indexed the corpus, ran the baselines, and evaluated all resulting runs.
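For reference, a BM25 run over a translated corpus can be produced with Pyserini along these lines; the index path, query, and run tag are placeholders, and this is a sketch rather than our exact scripts.

```python
# Hedged BM25 retrieval sketch with Pyserini over a prebuilt Lucene index
# of the document-translated corpus; path and parameters are illustrative.
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher("indexes/bm25-dt")  # hypothetical index path
searcher.set_bm25(k1=0.9, b=0.4)              # common Pyserini defaults

hits = searcher.search("example query", k=1000)
for rank, hit in enumerate(hits, start=1):
    # Standard 6-column TREC run format: qid Q0 docid rank score tag
    print(f"q1 Q0 {hit.docid} {rank} {hit.score:.4f} bm25-dt")
```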
Created a SPLADE document-translation baseline: encoded the corpus and queries with a SPLADE model, indexed the corpus embeddings, ran the baseline, and evaluated the results. This baseline was left out of submissions given its surprisingly low evaluation scores.
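For context, SPLADE turns each text into a weighted bag of vocabulary terms. The sketch below shows the standard encoding step using a public naver/splade checkpoint; it is my illustration of the technique, not the exact encode script we used.

```python
# SPLADE encoding sketch: log-saturated ReLU over MLM logits, max-pooled
# over token positions, yielding sparse term weights for indexing.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "naver/splade-cocondenser-ensembledistil"  # public checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

inputs = tok("example passage text", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits               # (1, seq_len, vocab)

mask = inputs["attention_mask"].unsqueeze(-1)
weights = torch.max(torch.log1p(torch.relu(logits)) * mask, dim=1).values[0]

# Non-zero vocabulary entries become the document's sparse representation.
sparse = {tok.convert_ids_to_tokens(int(i)): w.item()
          for i, w in enumerate(weights) if w > 0}
```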
Created and ran several data-munging scripts to reformat the queries and corpus.
Task: Multilingual Retrieval (MLIR)
Created and used multiple reformatting scripts, most notably one that converts a TREC run into the retrieval-results format.
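The conversion itself is simple; a sketch is below. The output schema shown is an assumed JSONL layout (query id plus ranked candidates), since the exact downstream format isn't spelled out here.

```python
# Sketch: convert a 6-column TREC run into a JSONL retrieval-results file.
# The output field names are assumptions for illustration.
import json
from collections import defaultdict

results = defaultdict(list)
with open("run.trec") as f:
    for line in f:
        qid, _, docid, _, score, _ = line.split()
        results[qid].append({"docid": docid, "score": float(score)})

with open("retrieval_results.jsonl", "w") as out:
    for qid, hits in results.items():
        hits.sort(key=lambda h: h["score"], reverse=True)
        out.write(json.dumps({"query_id": qid, "candidates": hits}) + "\n")
```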
Created SPLADE, BM25-dt, BM25-qt, and PLAID baselines for the task (essentially all f-stage baselines). Ran all baselines, evaluated the runs, fused them with reciprocal rank fusion (RRF) into fusion runs, and evaluated the fusion runs, which were then sent on to the mono stage.
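Reciprocal rank fusion only needs each document's rank in each run. A minimal implementation, assuming `runs` maps run name to per-query ranked docid lists and using the conventional k = 60:

```python
# Minimal reciprocal rank fusion (RRF): score(d) = sum over runs of
# 1 / (k + rank_of_d_in_run); documents absent from a run contribute 0.
from collections import defaultdict

def rrf(runs, k=60, depth=1000):
    fused = defaultdict(lambda: defaultdict(float))
    for run in runs.values():
        for qid, docids in run.items():
            for rank, docid in enumerate(docids[:depth], start=1):
                fused[qid][docid] += 1.0 / (k + rank)
    # Per query, return docids sorted by fused score, best first.
    return {qid: sorted(scores, key=scores.get, reverse=True)
            for qid, scores in fused.items()}
```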
Fused (with RRF) and evaluated the post-mono and post-listwise (listo) runs.
Task: Cross-language Retrieval (CLIR)
Taking the post-mono fused runs from MLIR, I combined the zho, rus, and fas runs into a single top-300 retrieval-results run for CLIR, which was then sent off to listwise reranking.
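The merge is essentially a pool-and-truncate over the three language runs; a sketch, under the assumption that post-mono scores are comparable across runs (they come from the same reranker) and with illustrative file names:

```python
# Pool the zho/rus/fas post-mono runs, then keep the top 300 docs per query.
from collections import defaultdict

pool = defaultdict(list)
for path in ["mono-zho.trec", "mono-rus.trec", "mono-fas.trec"]:
    with open(path) as f:
        for line in f:
            qid, _, docid, _, score, _ = line.split()
            pool[qid].append((docid, float(score)))

with open("clir-top300.trec", "w") as out:
    for qid, hits in pool.items():
        hits.sort(key=lambda h: h[1], reverse=True)
        for rank, (docid, score) in enumerate(hits[:300], start=1):
            out.write(f"{qid} Q0 {docid} {rank} {score:.4f} clir-fused\n")
```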
Task: Cross-Language Report Generation
Created SPLADE, BM25-dt, and PLAID baselines for the task (all f-stage baselines). Ran all baselines, evaluated the runs, fused them with RRF into fusion runs (as sketched above), and evaluated the fusion runs, which were then sent on to the mono stage.
Collaboratively built a Llama-3.1 baseline that uses PromptReps for f-stage retrieval. Building this baseline involved reformatting TREC's queries/corpus/qrels to fit PromptReps' expected inputs, becoming familiar with the PromptReps codebase, encoding dense and sparse representations of the corpus, generating a sparse index, and searching through PromptReps.
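As I understand the PromptReps idea, the LLM is prompted to compress a passage into one word; the final token's hidden state then serves as the dense representation, while its output logits provide the sparse one. The sketch below is a heavily simplified illustration of that idea with a Hugging Face causal LM, not the PromptReps repo's actual code.

```python
# Conceptual PromptReps-style encoding sketch (simplified; the actual
# pipeline used the PromptReps repo's encode/index/search scripts).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-3.1-8B-Instruct"  # gated checkpoint on HF
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

prompt = 'Passage: "...". Use one word to represent the passage: "'
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Dense rep: last-layer hidden state of the final prompt token.
dense = out.hidden_states[-1][0, -1]
# Sparse rep: log-saturated next-token logits over the vocabulary.
sparse = torch.log1p(torch.relu(out.logits[0, -1]))
```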
Following in the footsteps of last year's top team, I created a script from scratch that adds TOMT-KIS, a dataset of ~1.2 million tip-of-the-tongue (ToT) questions from Reddit, to our given corpus (~3 million docs) and queries (150 queries), producing a larger corpus/query set on which to train DistilBERT. In total, roughly 90k query-document pairs were added. Implementing this corpus expansion meant learning Hugging Face, vLLM, and difflib, and improved my prompt engineering, logic, and problem-solving skills.
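The linking step pairs a Reddit question with a corpus document by fuzzy-matching the answered title; below is a minimal sketch with difflib, where the cutoff (difflib's 0.6 default) and field names are assumptions rather than the script's actual values.

```python
# Fuzzy-match a TOMT-KIS answer title against corpus titles with difflib;
# a successful match yields a new (query, document) training pair.
import difflib

corpus_titles = {"doc1": "The Matrix", "doc2": "Inception"}  # docid -> title
answer_title = "The Matrix (1999)"

match = difflib.get_close_matches(
    answer_title.lower(),
    [t.lower() for t in corpus_titles.values()],
    n=1, cutoff=0.6,
)
if match:
    docid = next(d for d, t in corpus_titles.items()
                 if t.lower() == match[0])
    # (reddit_question, docid) is added to the expanded query-doc pairs.
```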
Recreated TREC ToT's provided BM25, DistilBERT, and GPT-4o baselines. This involved modifying and extending ToT's scripts, fixing errors on the fly, and learning the basic concepts behind each baseline.
Discussed numerous baseline ideas, researched successful teams' baselines/papers and implemented their ideas, and provided coding support wherever possible.
Integrated the PromptReps repo for use on the ToT dataset. Collaboratively adjusted the dataset to the expected format, encoded dense and sparse representations, and built a sparse index.
Built on last year's top team's approach, using the ideas presented in their paper and their existing code for reranking.