Reproducibility Study of “InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval”
This project is a reproducibility study of the InPars Toolkit, a tool for generating synthetic data to improve neural information retrieval (IR) systems. Our objective is to replicate and validate the methodology presented in the paper and to build on the future work proposed by the authors.
git clone https://github.com/danilotpnta/IR2-project
cd IR2-project
conda env create -f environment.yml
conda activate IR2-env
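To confirm that the environment is active before continuing, a quick check along these lines may help. It is a minimal sketch: `env_is_active` is a hypothetical helper, not part of this repository, and it relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```shell
# Hypothetical helper (not part of the repository):
# succeeds if the named conda environment is the active one.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active IR2-env; then
    echo "IR2-env is active"
else
    echo "IR2-env is not active; run: conda activate IR2-env"
fi
```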
Follow Step 1 from the Installation section.
WORK_DIR=$HOME/IR2-project
cd "$WORK_DIR"
source scripts/snellius_setup.sh
setup "$PWD"
pip install --upgrade pip
# Install from cache (faster)
pip install -r requirements.txt
When installing on Snellius, you may want to install the packages with the --no-cache-dir
flag. This prevents pip from using its cache and may resolve some installation issues.
pip install --no-cache-dir -r requirements.txt
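The two install commands above can be combined so that the cached install is tried first and the uncached one is used only as a fallback. This is a sketch, not a script shipped with the project: `install_requirements` is a hypothetical function name, and it assumes `pip` exits with a non-zero status when the cached install fails.

```shell
# Hypothetical helper (not part of the repository):
# try the cached install first, then retry without the cache.
install_requirements() {
    req_file="${1:-requirements.txt}"
    if pip install -r "$req_file"; then
        return 0
    fi
    echo "Cached install failed; retrying with --no-cache-dir" >&2
    pip install --no-cache-dir -r "$req_file"
}
```

After defining the function, run `install_requirements requirements.txt` from the project root.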
You can also install the InPars toolkit itself with pip.
pip install inpars
This project is licensed under the MIT License. See the LICENSE file for more details.