cambiotraining / metagenomics

Course materials for "Introduction to Metagenomics"
https://cambiotraining.github.io/metagenomics/

Software installation instructions #3

Open tavareshugo opened 8 months ago

tavareshugo commented 8 months ago

@lkalmar can you list here on this issue all the software that is used in the course? So we can keep track of things for future iterations of the course.

lkalmar commented 8 months ago

R packages: dada2, phyloseq, Biostrings, ggplot2, reshape2, readxl, tidyverse

Command line applications: fastqc, multiqc, cutadapt, trimmomatic, bowtie2, samtools, metaphlan, mash, SPAdes, clumpify.sh (part of the bbmap package), flash (for merging reads), maxbin2, checkm (database!), gtdbtk (database!), prokka, abricate (database!)
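A quick way to confirm that the command-line applications above are installed is to check which of them resolve on the PATH. This is only a minimal sketch: missing_tools is a hypothetical helper, and the executable names (for example spades.py for SPAdes and run_MaxBin.pl for maxbin2) are assumptions based on how these tools are commonly packaged, so they may need adjusting.

```shell
# Hypothetical helper: print every tool name that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to something runnable
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Executable names below are assumptions, not taken from the course materials.
# Prints one missing tool per line, and nothing if all are found.
missing_tools fastqc multiqc cutadapt trimmomatic bowtie2 samtools metaphlan \
  mash spades.py clumpify.sh flash run_MaxBin.pl checkm gtdbtk prokka abricate
```

The R packages are not covered by this check, since they are not executables.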

tavareshugo commented 8 months ago

Can I check:

  1. It's checkm (version 1), not checkm2?
  2. Do you have commands/links to download all those databases?

lkalmar commented 8 months ago

  1. It is the original CheckM, which is still the gold standard, but I will look into CheckM2.
  2. I think all of these are conda-installable, but to be sure, I will collect all the installation commands here.

tavareshugo commented 8 months ago

For point 2, I just meant the databases, not the software itself. For example, CheckM2 has the command checkm2 database --download --path <output>.

I don't think CheckM (version 1) has an equivalent download command, but there is this: https://data.ace.uq.edu.au/public/CheckM_databases/ Is that the correct database to download?

For the other programs, I don't know if the databases come with the software or if they need to be installed separately.
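If that folder is the right source, a manual CheckM (version 1) database setup might look like the sketch below. The archive name checkm_data_2015_01_16.tar.gz is an assumption and should be checked against the listing at that URL; install_checkm_db and the target folder are hypothetical. The checkm data setRoot step registers the folder with CheckM.

```shell
# Assumed archive name inside the CheckM_databases folder; verify before use.
CHECKM_DB_URL="https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz"

# Hypothetical helper: download, unpack and register the CheckM v1 database.
# $1 = folder where the database should live
install_checkm_db() {
  db_dir="$1"
  archive="$db_dir/checkm_data.tar.gz"
  mkdir -p "$db_dir" &&
    wget -O "$archive" "$CHECKM_DB_URL" &&
    tar -xzf "$archive" -C "$db_dir" &&
    checkm data setRoot "$db_dir"   # tell CheckM where the data lives
}

# Example call (the path is a placeholder):
# install_checkm_db "$HOME/databases/checkm"
```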

tavareshugo commented 8 months ago

The gtdbtk database can be obtained with the download-db.sh command.
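As a rough sketch, the download can be followed by a sanity check. This assumes a conda installation of gtdbtk, where download-db.sh is on the PATH and the GTDBTK_DATA_PATH environment variable points at the database folder; setup_gtdbtk_db is a hypothetical wrapper.

```shell
# Hypothetical wrapper: download the GTDB-Tk reference data, then verify it.
setup_gtdbtk_db() {
  # Show where GTDB-Tk expects its data (set by the conda package, if present)
  echo "GTDBTK_DATA_PATH=${GTDBTK_DATA_PATH:-unset}"
  # Run the bundled download script, then let gtdbtk check the installation
  download-db.sh && gtdbtk check_install
}

# Example call (large download, so run it only once per machine):
# setup_gtdbtk_db
```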

tavareshugo commented 8 months ago

abricate just seems to have the databases as part of the installation.

@lkalmar I've updated the data and setup page, would you mind revising before I close this issue?

lkalmar commented 8 months ago

Yes, checkm has a database and you need to set the database path. abricate comes with its databases, but there is a way to update those here
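For the abricate part, a sketch of what the update could look like is below. It assumes the abricate-get_db script and the --setupdb and --list options that ship with abricate; the function names and the default database name ncbi are only illustrative.

```shell
# Hypothetical helper: index the databases bundled with abricate and list them.
index_abricate_dbs() {
  abricate --setupdb && abricate --list
}

# Hypothetical helper: re-download one database (default: ncbi) and list the result.
# $1 = database name (optional)
refresh_abricate_db() {
  db_name="${1:-ncbi}"
  abricate-get_db --db "$db_name" --force && abricate --list
}

# Example calls:
# index_abricate_dbs
# refresh_abricate_db ncbi
```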

lkalmar commented 8 months ago

Otherwise, the update is perfect; please close the issue.

tavareshugo commented 8 months ago

Reopening as we are missing the MetaPhlAn database.

On this page, they recommend the following for conda installations:

metaphlan --install --bowtie2db <database folder>

Does this look right @lkalmar?

In the future we would then need to adjust the materials to point to a --bowtie2db folder that we decide to save the database into.

lkalmar commented 8 months ago

You can either install the DB inside your conda / miniconda / mamba / micromamba folder with the simple command metaphlan --install (not recommended on the HPC, but simpler on your own computer or an in-house server).

Or you can set the database location with the above-mentioned metaphlan --install --bowtie2db <database folder>, but in that case you have to pass the same database path on every run.
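The second option can be sketched as follows: install the database once into a chosen folder, then pass the same folder on every run. The folder location, file names and function names are hypothetical; --input_type fastq and -o are standard metaphlan options for this kind of run, but check them against the installed version.

```shell
# Hypothetical database location; override by exporting METAPHLAN_DB first.
METAPHLAN_DB="${METAPHLAN_DB:-$HOME/databases/metaphlan}"

# One-off installation of the database into the chosen folder.
install_metaphlan_db() {
  metaphlan --install --bowtie2db "$METAPHLAN_DB"
}

# Profile one sample, pointing metaphlan at the same folder.
# $1 = input FASTQ file, $2 = output profile table
profile_sample() {
  metaphlan "$1" --input_type fastq --bowtie2db "$METAPHLAN_DB" -o "$2"
}

# Example calls (file names are placeholders):
# install_metaphlan_db
# profile_sample reads.fastq profile.txt
```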

tavareshugo commented 8 months ago

The installation scripts from metaphlan and gtdbtk are not reliable.

tavareshugo commented 8 months ago

It's a bad idea to have everything in the same environment, due to dependency conflicts (e.g. an old version of maxbin).

Update the instructions to have each tool in a separate environment.

lkalmar commented 8 months ago

Maybe not each one, or we would end up with a huge number of envs...
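One possible middle ground is to group tools by workflow stage and isolate only the ones known to conflict. The sketch below is just an illustration: the grouping and environment names are hypothetical, the package names are assumed to match bioconda (for example checkm-genome for CheckM and bbmap for clumpify.sh), and mamba could be swapped for conda.

```shell
# Hypothetical helper: create one environment from the given packages.
# $1 = environment name, remaining arguments = package names
create_env() {
  env_name="$1"
  shift
  mamba create -y -n "$env_name" -c conda-forge -c bioconda "$@"
}

# Illustrative grouping: 8 environments for 16 tools.
setup_all_envs() {
  create_env qc fastqc multiqc cutadapt trimmomatic bbmap flash &&
    create_env mapping bowtie2 samtools &&
    create_env profiling metaphlan mash &&
    create_env assembly spades &&
    create_env binning maxbin2 &&        # kept alone: reported to pull in old dependencies
    create_env mag_qc checkm-genome &&
    create_env taxonomy gtdbtk &&
    create_env annotation prokka abricate
}

# Example call:
# setup_all_envs
```

If two tools in the same group turn out to conflict, moving one of them into its own create_env line is enough to split them.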