tavareshugo opened 8 months ago
R packages: dada2, phyloseq, Biostrings, ggplot2, reshape2, readxl, tidyverse
Command line applications: fastqc, multiqc, cutadapt, trimmomatic, bowtie2, samtools, metaphlan, mash, SPAdes, clumpify.sh (part of the bbmap package), flash (for merging reads), maxbin2, checkm (database!), gtdbtk (database!), prokka, abricate (database!)
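Here is a quick sketch for checking which of the command-line applications above are already on PATH. check_tools is a hypothetical helper. The executable names in the example call are assumptions about what each package installs (e.g. spades.py for SPAdes, run_MaxBin.pl for maxbin2), so adjust them to the actual installs.

```shell
#!/usr/bin/env bash
# Sketch: report which command-line tools are missing from PATH.
# check_tools is a hypothetical helper, not part of any package.
check_tools() {
  local tool
  local missing=0
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a command, builtin or function
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  # Exit status is 1 if anything was missing, 0 otherwise
  return "$missing"
}

# Example call; the executable names are assumptions (e.g. spades.py, run_MaxBin.pl):
# check_tools fastqc multiqc cutadapt trimmomatic bowtie2 samtools metaphlan mash \
#   spades.py clumpify.sh flash run_MaxBin.pl checkm gtdbtk prokka abricate
```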
Can I check:
- It's checkm (version 1), not checkm2?
- Do you have commands/links to download all those databases?
For point 2, I just meant the databases, not the software itself.
For example, CheckM2 provides checkm2 database --download --path <output> for this.
I don't think CheckM (version 1) has a download command, but there is this: https://data.ace.uq.edu.au/public/CheckM_databases/
Is that the correct database to download?
For the other programs, I don't know if the databases come with the software or if they need to be installed separately.
The gtdbtk database can be obtained with the download-db.sh command.
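Here is a rough sketch of how the gtdbtk database step could look in the setup instructions. It assumes an activated gtdbtk conda env and that download-db.sh downloads into the folder named by GTDBTK_DATA_PATH. The helpers run and setup_gtdbtk_db, the DRY_RUN switch and the example path are all hypothetical. conda env config vars set stores the variable in the env, but it only applies after re-activating the env, which is why the sketch also exports it.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an activated gtdbtk conda env and that download-db.sh
# writes to the folder named by GTDBTK_DATA_PATH (assumption, not verified).

# Hypothetical helper: print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper; $1 is the folder chosen for the GTDB-Tk database
setup_gtdbtk_db() {
  local db_dir="$1"
  run mkdir -p "$db_dir"
  # Store the path in the active env (applies after the env is re-activated)
  run conda env config vars set GTDBTK_DATA_PATH="$db_dir"
  # Also export it so the rest of this shell session sees it
  export GTDBTK_DATA_PATH="$db_dir"
  run download-db.sh
  # Sanity check that gtdbtk can find its reference data
  run gtdbtk check_install
}

# Example (hypothetical path): setup_gtdbtk_db /path/to/gtdbtk_db
```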
abricate just seems to have the databases as part of the installation.
@lkalmar I've updated the data and setup page. Would you mind reviewing it before I close this issue?
Yes, checkm needs that database, and you need to set the database path. Abricate comes with its databases, but there is a way to update them here.
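Here is a sketch of both database steps, with some caveats. The tarball name checkm_data_2015_01_16.tar.gz is an assumption to confirm against the directory listing at the URL above. The folders are placeholders, and run, setup_checkm_db, update_abricate_db and DRY_RUN are hypothetical. checkm data setRoot is the CheckM v1 way of recording the database path. For abricate-get_db, check --help for the database names your abricate version accepts.

```shell
#!/usr/bin/env bash
# Sketch only: CheckM v1 database download and abricate database update.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

CHECKM_DB_URL="https://data.ace.uq.edu.au/public/CheckM_databases"
# Assumed file name; confirm it in the directory listing before use
CHECKM_TARBALL="${CHECKM_TARBALL:-checkm_data_2015_01_16.tar.gz}"

# Hypothetical helper; $1 is the folder chosen for the CheckM data
setup_checkm_db() {
  local db_dir="$1"
  run mkdir -p "$db_dir"
  run wget -P "$db_dir" "$CHECKM_DB_URL/$CHECKM_TARBALL"
  run tar -xzf "$db_dir/$CHECKM_TARBALL" -C "$db_dir"
  # Record the data location in CheckM v1's configuration
  run checkm data setRoot "$db_dir"
}

# Hypothetical helper; $1 is an abricate database name (e.g. ncbi)
update_abricate_db() {
  local db_name="$1"
  # Re-download one database, replacing the copy that came with the install
  run abricate-get_db --db "$db_name" --force
  # List installed databases to confirm the update
  run abricate --list
}
```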
Otherwise, the update is perfect; please close the issue.
Reopening, as we are missing the MetaPhlAn database.
This page recommends the following for conda installations:
metaphlan --install --bowtie2db <database folder>
Does this look right @lkalmar?
In future, we would then need to adjust the materials so that --bowtie2db points to whichever folder we decide to save the database into.
You can either install the database into your conda / miniconda / mamba / micromamba folder with the simple command metaphlan --install (not recommended on the HPC, but simpler on your own computer or an in-house server).
Or you can define the database path with the above-mentioned metaphlan --install --bowtie2db <database folder>, but in that case you also have to pass the same path with --bowtie2db at run time.
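Here is a minimal sketch of the second option. It assumes a MetaPhlAn version that still uses --bowtie2db (as in the command above) and a single FASTQ per sample. METAPHLAN_DB, the helper names, DRY_RUN and the file names are hypothetical. The point is that the same folder goes to --bowtie2db at install time and at every run.

```shell
#!/usr/bin/env bash
# Sketch only: install the MetaPhlAn database to a chosen folder and reuse it.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical shared location for the course database
METAPHLAN_DB="${METAPHLAN_DB:-/path/to/metaphlan_db}"

install_metaphlan_db() {
  run metaphlan --install --bowtie2db "$METAPHLAN_DB"
}

# Hypothetical helper; $1 is the input FASTQ, $2 the output profile, THREADS defaults to 4
profile_sample() {
  local reads="$1"
  local out="$2"
  # Must point at the same folder used at install time
  run metaphlan "$reads" --input_type fastq --bowtie2db "$METAPHLAN_DB" \
    --nproc "${THREADS:-4}" -o "$out"
}
```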
The installation scripts from MetaPhlAn and GTDB-Tk are not reliable.
It's a bad idea to have everything in the same environment, due to dependency conflicts (e.g. an old version of maxbin).
Update the instructions to install each piece of software in a separate environment.
Maybe not each one, otherwise we end up with a huge number of envs...
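One possible middle ground, as a sketch, is to use shared envs for the lighter tools and separate envs for the ones that caused trouble or carry big databases (maxbin2, MetaPhlAn, GTDB-Tk, CheckM). The grouping, the env names, the ENV_SPEC/create_envs helpers and the DRY_RUN switch are hypothetical and not tested for conflicts. The package names are the bioconda ones, and conda can replace mamba.

```shell
#!/usr/bin/env bash
# Sketch only: a few shared envs plus separate envs for tools with heavy or
# conflicting dependencies. The grouping below is hypothetical and untested.

# Hypothetical helper: print the command instead of running it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One env per line: <env name> <bioconda packages...>
ENV_SPEC="qc fastqc multiqc cutadapt trimmomatic
mapping bowtie2 samtools mash
assembly spades bbmap flash
metaphlan metaphlan
maxbin2 maxbin2
checkm checkm-genome
gtdbtk gtdbtk
prokka prokka
abricate abricate"

create_envs() {
  local name pkgs
  while read -r name pkgs; do
    # Skip blank lines
    [ -z "$name" ] && continue
    # $pkgs is unquoted on purpose so each package becomes its own argument;
    # stdin comes from /dev/null so mamba cannot consume the loop's input
    run mamba create -y -n "$name" -c conda-forge -c bioconda $pkgs < /dev/null
  done <<< "$ENV_SPEC"
}
```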
@lkalmar can you list on this issue all the software used in the course, so we can keep track of it for future iterations?