BigDataBiology / SemiBin

SemiBin: metagenomics binning with self-supervised deep learning
https://semibin.rtfd.io/

Database Download & Documentation #141

Open gbouras13 opened 1 year ago

gbouras13 commented 1 year ago

G'day @psj1997, @luispedro, and other SemiBin developers,

Firstly, thanks for SemiBin(2). It works amazingly well and recovers many more bins than other binning methods :)

I want to share some feedback regarding database download and SemiBin's documentation.

The HPC cluster I use at my institution blocks internet access on compute nodes. Therefore, lazily downloading the SemiBin2 database did not work when I ran the command below (SemiBin v1.5.1, Linux installation via bioconda).

SemiBin2 multi_easy_bin -i {input.catalogue}  -b {input.bams} -o {params.outdir} -s {params.separator} --minfasta-kbs {params.minfasta}

It was difficult to work out that this was the cause, because the database is not mentioned in the README, only in the FAQ section of the docs, and the error message was not informative (apologies, I have overwritten the log file, otherwise I would quote it).

I then tried following the FAQ to download the updated GTDB database, but the following command does not work in MMseqs2 v13.45111 (it hits this known MMseqs2 error: https://github.com/soedinglab/MMseqs2/issues/561):

mmseqs databases GTDB GTDB tmp

Then, after looking at the SemiBin codebase, I was able to install the database manually:

wget 'https://zenodo.org/record/4751564/files/GTDB_v95.tar.gz?download=1'
mv 'GTDB_v95.tar.gz?download=1' GTDB_v95.tar.gz
tar -xzvf GTDB_v95.tar.gz

and went from there, specifying -r {params.db}, after which SemiBin worked perfectly.
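
For anyone in the same situation, the manual steps above could be wrapped in a small script that is run once on a node with internet access (for example a login node). This is only a sketch: the function names archive_name and fetch_gtdb and the destination directory are hypothetical, and the only things taken from the steps above are the Zenodo URL, wget, and tar. Using wget -O avoids the rename step by saving the file under a clean name directly.

```shell
# Sketch only: run on a node WITH internet access.
# archive_name and fetch_gtdb are hypothetical helper names.

GTDB_URL='https://zenodo.org/record/4751564/files/GTDB_v95.tar.gz?download=1'

# Derive a clean file name from a URL by dropping the path and the query string.
archive_name() {
    base="${1##*/}"              # keep everything after the last '/'
    printf '%s\n' "${base%%\?*}" # drop '?download=1' and anything after it
}

# Download the archive into the given directory and unpack it there.
fetch_gtdb() {
    dest="$1"
    archive="$dest/$(archive_name "$GTDB_URL")"
    mkdir -p "$dest" || return 1
    wget -O "$archive" "$GTDB_URL" || return 1
    tar -xzf "$archive" -C "$dest"
}

# Example (the directory is an assumption, adjust to your cluster):
#   fetch_gtdb "$HOME/databases/semibin"
```

The directory that ends up holding the unpacked database is then what gets passed with -r in the job that runs on the compute nodes.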

So perhaps either including a specific --download_database flag or script, or just documenting a manual install method, would help future users like me who have no internet access on compute nodes.
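
In the meantime, a job script could guard against the uninformative failure with a pre-flight check along these lines. This is a rough sketch, not anything SemiBin provides: check_reference_db is a hypothetical helper, and it only tests that the directory passed with -r exists and is not empty.

```shell
# Sketch only: hypothetical pre-flight check for a job script.
# Fails early with a readable message if the reference database
# directory is missing or empty, instead of failing later on a
# compute node that cannot download anything.
check_reference_db() {
    db_dir="$1"
    if [ -d "$db_dir" ] && [ -n "$(ls -A "$db_dir" 2>/dev/null)" ]; then
        echo "ok: using reference database in $db_dir"
        return 0
    fi
    echo "error: no reference database found in '$db_dir'." >&2
    echo "Download it on a node with internet access and pass it with -r." >&2
    return 1
}

# Example use at the top of a job script (DB_DIR is an assumption):
#   check_reference_db "$DB_DIR" || exit 1
```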

George

luispedro commented 1 year ago

If you are calling SemiBin2, it should not be downloading the MMseqs2 DB anymore. I will check again whether we mistakenly kept that in.