bhattlab / bhattlab_workflows

Computational workflows for metagenomics tasks, by the Bhatt lab
http://www.bhattlab.com

binning workflow optimizations #42

Closed tamburinif closed 3 years ago

tamburinif commented 3 years ago

Hi all! I ran the bin_das_tool_manysamp.snakefile workflow recently and noticed a few things that might be helpful to update:

  1. I had to update the prokka container to docker://quay.io/biocontainers/prokka:1.14.6--pl5262hdfd78af_1 due to a tbl2asn error.
  2. In rule concoct_extract_bins, extract_fasta_bins.py apparently expects the output directory to already exist, and snakemake doesn't create it, presumably since the directory itself is the output of the rule? Regardless, I had to amend my copy of the snakefile to create the directory first and that solved the errors I was getting initially. There might be an even better workaround here.
  3. Not an error, but I'd recommend making the initial time for the DAStool rule longer (12 or 24h perhaps). I know that it's written so that snakemake will increase runtime on the next attempt, but the majority of my jobs didn't finish within 6 hours so it seems helpful to set it higher right off the bat.
  4. Rule bin_idxstats failed for most of my "unbinned" files due to runtime. I ended up creating dummy files for unbinned contigs to get around this, but an ideal optimization would be to not run this rule on unbinned at all, since that info might not be meaningful/helpful anyway.
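
For points 2 to 4, here is a rough sketch of the kind of helpers that could be dropped into a copy of the snakefile. This is not the actual fix from the repository: the function names, the 12 h base runtime, and the example bin names are all hypothetical, and it assumes the unbinned contigs show up as a bin literally named "unbinned".

```python
import os
import tempfile


def ensure_dir(path):
    """Create the output directory (and any parents) if it is missing.

    Point 2: extract_fasta_bins.py reportedly expects the directory to exist,
    so a rule could call this (or `mkdir -p`) before invoking the script.
    """
    os.makedirs(path, exist_ok=True)
    return path


def scaled_runtime_hours(attempt, base_hours=12):
    """Point 3: start from a longer base runtime and still grow on retries.

    base_hours=12 is an assumption taken from the suggestion above.
    """
    return base_hours * attempt


def bins_for_idxstats(bin_names, skip=("unbinned",)):
    """Point 4: leave pseudo-bins such as 'unbinned' out of the idxstats targets."""
    return [name for name in bin_names if name not in skip]


if __name__ == "__main__":
    # Hypothetical paths and bin names, for illustration only.
    out_dir = ensure_dir(os.path.join(tempfile.mkdtemp(), "concoct", "bins"))
    print(os.path.isdir(out_dir))
    print([scaled_runtime_hours(a) for a in (1, 2, 3)])
    print(bins_for_idxstats(["bin.1", "bin.2", "unbinned"]))
```

In a Snakefile, the runtime helper could be wired in through a callable resource, e.g. `resources: time=lambda wildcards, attempt: scaled_runtime_hours(attempt)` (the resource name `time` is a guess at what the workflow uses), and the filtered bin list could feed whatever `expand()` call builds the bin_idxstats inputs.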

This workflow is so useful and awesome, a huge thanks to everyone who worked so hard to build and maintain it! 😄

bsiranosian commented 3 years ago

Thanks Fiona, these are all fixed as of https://github.com/bhattlab/bhattlab_workflows/commit/bbe6511c7e03f4a247f047cc4cdaf62a9250d6ae