franciscozorrilla / metaGEM

:gem: An easy-to-use workflow for generating context specific genome-scale metabolic models and predicting metabolic interactions within microbial communities directly from metagenomic data
https://franciscozorrilla.github.io/metaGEM/
MIT License

Add rules for Diamond (blastp) and Prodigal #89

Closed zoey-rw closed 2 years ago

zoey-rw commented 2 years ago

Four new rules proposed, with no new software dependencies.

  1. run_prodigal - predicts ORFs on contigs rather than assembled genomes (perhaps could be renamed to contig_prodigal to avoid confusion)
  2. run_blastp - takes contig ORFs as input, uses the Diamond Blastp Snakemake wrapper: https://github.com/snakemake/snakemake-wrappers/tree/0.80.1/bio/diamond/blastp
  3. binning - calls the outputs from concoct, metabat, and maxbin
  4. binEvaluation - calls the outputs from binRefine and binReassembly

The parsing rules for binning and binEvaluation may be iffy - I tested on a login node, rather than as a cluster job.
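For orientation, the two new tool rules roughly correspond to the command-line calls sketched below. This is only a minimal illustration, not the code from the PR: the file names, the thread count, and the `run` dry-run helper are all hypothetical, and the actual rules take their paths from the Snakefile and config. The Prodigal and Diamond flags used are the standard ones (`-p meta` is Prodigal's metagenomic mode, which is the usual choice for contigs).

```shell
#!/bin/sh
# Sketch only: approximate shell equivalents of run_prodigal and run_blastp.
# All paths below are hypothetical placeholders, not metaGEM defaults.
set -eu

# With DRY_RUN=1 (the default here) commands are printed instead of executed,
# so the sketch can be inspected without Prodigal or Diamond installed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

CONTIGS="${CONTIGS:-contigs.fasta}"        # assembled contigs (assumed name)
PROTEINS="${PROTEINS:-contig_orfs.faa}"    # predicted protein sequences
GENES="${GENES:-contig_orfs.gff}"          # gene coordinates
DB="${DB:-reference.dmnd}"                 # Diamond database (assumed to exist)
HITS="${HITS:-contig_orfs_blastp.tsv}"     # blastp hits
THREADS="${THREADS:-4}"

# Step 1: predict ORFs on contigs in metagenomic mode.
run prodigal -i "$CONTIGS" -a "$PROTEINS" -o "$GENES" -f gff -p meta

# Step 2: align the predicted proteins against the database.
run diamond blastp --db "$DB" --query "$PROTEINS" --out "$HITS" --threads "$THREADS"
```

Setting `DRY_RUN=0` executes the commands for real, provided both tools are on the `PATH` and the database has been built beforehand (for example with `diamond makedb`).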

franciscozorrilla commented 2 years ago

Thanks @zoey-rw, looks awesome! I will do some small tests and then merge into the main branch 💎

paristzou commented 2 years ago

After reading the article "The National Ecological Observatory Network’s soil metagenomes: assembly and basic analysis [version 2; peer review: 2 approved with reservations]" (https://f1000research.com/articles/10-299/v2), I hope these rules can be merged soon so that more users can benefit. Thank you so much.

franciscozorrilla commented 2 years ago

Hi @paristzou, thanks for your interest in the development of metaGEM. Just FYI, these changes have now been merged into the master branch, so you should be able to see the new rules in the Snakefile.

paristzou commented 2 years ago

That's great. It makes life much easier.

paristzou commented 2 years ago

Would it be possible to share a link to the fastq.gz files used in the tutorial (https://github.com/franciscozorrilla/unseenbio_metaGEM), please? Thank you.

franciscozorrilla commented 2 years ago

Hi @paristzou, unfortunately that metagenomic data is personal/private and I do not have consent to share it, as it was generated using the unseen bio metagenome sequencing service. Just to clarify, these particular samples were NOT used in the metaGEM publication/data analysis.

You can find more info about the 5 datasets we used in the metaGEM publication here.

Additionally, you may find the toy dataset useful; we generated it by subsampling 3 paired-end reads from the gut microbiome dataset.

paristzou commented 2 years ago

conda env list
#
base                  *  /newdisk2/anaconda3
snakemake                /newdisk2/anaconda3/envs/snakemake
                         /newdisk2/metaGEM/envs/metagem
                         /newdisk2/metaGEM/envs/metawrap
                         /newdisk2/metaGEM/envs/prokkaroary

source activate metagem
Could not find conda environment: metagem
You can list all discoverable environments with conda info --envs.

It seems that the metagem environment cannot be activated by name.

franciscozorrilla commented 2 years ago

I believe that providing the full path to the env directory should do the trick, e.g.

source activate /newdisk2/metaGEM/envs/metagem
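The listing above shows the metaGEM environments without a name, which usually means they live outside the directories conda searches for named environments, so only the path works. As a rough sketch of two ways around this: the `find_env_prefix` helper below is hypothetical (not part of conda or metaGEM) and simply looks for an environment folder containing `conda-meta`, while the commented commands use conda's `envs_dirs` setting, which should make the environments addressable by name.

```shell
#!/bin/sh
# Sketch only: locate a conda environment that lives outside the default
# envs directory. find_env_prefix is a hypothetical helper, not a conda command.
set -eu

# Print the full path of environment $1 if it exists in any of the
# directories given as the remaining arguments; return 1 otherwise.
find_env_prefix() {
    env_name="$1"
    shift
    for envs_dir in "$@"; do
        # Conda environments contain a conda-meta directory.
        if [ -d "$envs_dir/$env_name/conda-meta" ]; then
            echo "$envs_dir/$env_name"
            return 0
        fi
    done
    return 1
}

# Usage (commented out so the sketch runs without conda installed):
#   conda activate "$(find_env_prefix metagem /newdisk2/metaGEM/envs)"
#
# Alternative: register the directory once, then activate by name.
#   conda config --append envs_dirs /newdisk2/metaGEM/envs
#   conda activate metagem
```

Older conda setups use `source activate` in place of `conda activate`, as in this thread; the path-based form works with either.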

See also the conda documentation for a good reference on handling environments and such. Since this is unrelated to the PR topic, kindly open a new issue if you have further trouble with setup.