MattiaPandolfoVR / MetaPhage

GNU General Public License v3.0

Krona, VIBRANT and QUAST databases #40

Open lxsteiner opened 2 years ago

lxsteiner commented 2 years ago

Thanks for the pipeline, curious to try it out!

After installing the dependencies in a conda environment, the Krona, VIBRANT and QUAST databases normally need to be set up and/or updated:

Krona installed.  You still need to manually update the taxonomy
databases before Krona can generate taxonomic reports.  The update
script is ktUpdateTaxonomy.sh.  The default location for storing
taxonomic databases is /.../.conda/envs/MetaPhage/opt/krona/taxonomy

If you would like the taxonomic data stored elsewhere, simply replace
this directory with a symlink.  For example:

rm -rf /.../.conda/envs/MetaPhage/opt/krona/taxonomy
mkdir /path/on/big/disk/taxonomy
ln -s /path/on/big/disk/taxonomy /.../.conda/envs/MetaPhage/opt/krona/taxonomy
ktUpdateTaxonomy.sh

Please run download-db.sh to download all required VIBRANT database files to /.../.conda/envs/MetaPhage/share/vibrant-1.2.0/databases/

The default QUAST package does not include:
* GRIDSS (needed for structural variants detection)
* SILVA 16S rRNA database (needed for reference genome detection in metagenomic datasets)
* BUSCO tools and databases (needed for searching BUSCO genes) -- works in Linux only!

To be able to use those, please run
    quast-download-gridss
    quast-download-silva
    quast-download-busco

Is this covered by database download section or should this also be done in addition before downloading the databases with the python script?

Thanks

telatin commented 2 years ago

Hi! The databases used by the pipeline, and this includes VIBRANT's, have to be downloaded as described here: https://mattiapandolfovr.github.io/MetaPhage/notes/databases.html This will ensure the correct structure of the directory.

The post-installation scripts you mention can be run, but I would suggest using Singularity or Docker as the first choice before Conda (for the same reasons as the nf-core pipelines, as described here: https://nf-co.re/eager/2.4.4/usage#profile)
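As a rough illustration of that preference order, a wrapper could pick the first available container engine and only fall back to Conda as a last resort. This is a minimal sketch, not part of MetaPhage itself; the profile names (`singularity`, `docker`, `conda`) are assumed from nf-core conventions, and the `nextflow run` line is only an example invocation.

```shell
#!/bin/sh
# Sketch: prefer a container engine over Conda (profile names assumed
# from nf-core conventions, not taken from the MetaPhage docs).
if command -v singularity >/dev/null 2>&1; then
    profile=singularity
elif command -v docker >/dev/null 2>&1; then
    profile=docker
else
    profile=conda
fi
# Example invocation only; adapt paths and parameters to your setup.
echo "nextflow run main.nf -profile $profile"
```

On a machine without Singularity or Docker this prints `nextflow run main.nf -profile conda`, matching the fallback case described above.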

Feel free to use ktUpdateTaxonomy.sh and the QUAST scripts if you set up the conda environment manually, though.

lxsteiner commented 2 years ago

Thanks for the feedback.

I would suggest using Singularity or Docker as the first choice before using Conda

Unfortunately sometimes easier said than done with certain setups ;)

On another note, I downloaded the databases with

    ./bin/python/db_manager.py -o ./db -m 6

and the example dataset with

    ./bin/getExample.py --verbose -t 8

When I try to generate the configuration file, in the activated conda environment, with

python ./bin/newProject.py -i demo \
    -m demo/infant-metadata.csv \
    -v Infant_delivery_type \
    -s demo.conf

I get this message

WARNING: Database directory will be created: /.../MetaPhage/db/diamond
INFO: this environment is not ready to run MetaPhage. Remember to use a container or activate the environment.
INFO: Found 10 samples in /.../MetaPhage/demo
INFO: Saving configuration to demo.conf

Is the DIAMOND database not being generated by db_manager.py normal behaviour, or what is the warning for? And why does it say my environment is not ready? I did all of this within an activated environment so far.

Thanks.

telatin commented 2 years ago

The DIAMOND database should be generated during your first MetaPhage run, and you can ignore the warning on the environment: it will be fixed in the next release :) Thanks for the feedback!