Reproducing the benchmark locally requires the exact reference proteomes listed in dataset_tags.tsv.
Please consider one of the following options for providing these files:
- Git Large File Storage (Git LFS): track the reference proteomes with Git LFS and commit them to the repository.
- External download: alternatively, provide a download link for the proteomes, along with instructions for placing them in PROTEOMES_DIR.
Additionally, please provide a script that automates retrieval of the reference proteomes. The script should:
- Accept an organism name or identifier as an argument.
- Download the corresponding reference proteome.
- Name the file <organism>_<database>_proteome_<date>.fasta, matching the naming convention used in dataset_tags.tsv.
- Save the file in PROTEOMES_DIR.
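A minimal sketch of such a script is shown below. It assumes the proteomes come from UniProt and are fetched by UniProt proteome ID (e.g. UP000005640 for human) through the UniProtKB REST stream endpoint. It also assumes that the date in the file name is formatted as YYYYMMDD and that PROTEOMES_DIR can be read from an environment variable; both should be adjusted to whatever dataset_tags.tsv actually uses. The function names are hypothetical, and resolving a free-text organism name to a proteome ID is left out.

```python
"""Hypothetical helper: download a UniProt reference proteome into PROTEOMES_DIR."""
import argparse
import datetime
import os
import pathlib
import urllib.parse
import urllib.request

# Assumption: proteomes are fetched from the UniProtKB REST API by proteome ID.
UNIPROT_STREAM_URL = "https://rest.uniprot.org/uniprotkb/stream"


def build_filename(organism: str, database: str, date: datetime.date) -> str:
    """Return <organism>_<database>_proteome_<date>.fasta (date as YYYYMMDD, an assumption)."""
    organism = organism.replace(" ", "_")  # keep file names free of spaces
    return f"{organism}_{database}_proteome_{date:%Y%m%d}.fasta"


def build_url(proteome_id: str) -> str:
    """Build a query that streams every entry of one proteome as FASTA."""
    params = urllib.parse.urlencode({"query": f"proteome:{proteome_id}", "format": "fasta"})
    return f"{UNIPROT_STREAM_URL}?{params}"


def fetch_proteome(organism, proteome_id, out_dir, database="uniprot", date=None):
    """Download the proteome and store it under the naming convention above."""
    date = date or datetime.date.today()
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / build_filename(organism, database, date)
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(build_url(proteome_id)) as response, open(partial, "wb") as handle:
        # Stream in 1 MiB chunks so large proteomes are not held in memory.
        while chunk := response.read(1 << 20):
            handle.write(chunk)
    partial.replace(target)  # expose the final name only after a complete download
    return target


def main():
    parser = argparse.ArgumentParser(description="Download a reference proteome into PROTEOMES_DIR.")
    parser.add_argument("organism", help="organism tag used in the file name, e.g. human")
    parser.add_argument("proteome_id", help="UniProt proteome identifier, e.g. UP000005640")
    parser.add_argument("--database", default="uniprot", help="database tag used in the file name")
    # Assumption: PROTEOMES_DIR is available as an environment variable.
    parser.add_argument("--out-dir", default=os.environ.get("PROTEOMES_DIR", "proteomes"))
    args = parser.parse_args()
    print(fetch_proteome(args.organism, args.proteome_id, args.out_dir, args.database))


if __name__ == "__main__":
    main()
```

For example, `python fetch_proteome.py human UP000005640` would write a file such as human_uniprot_proteome_<today>.fasta into PROTEOMES_DIR. Writing to a `.part` file first means an interrupted download never leaves a file under the final name that the benchmark might pick up.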
This would allow users to:
- Run the benchmark locally on private datasets in a consistent manner.
- Periodically update the reference proteomes and rerun the benchmark.