edgraham / BinSanity

Unsupervised Clustering of Environmental Microbial Assemblies Using Coverage and Affinity Propagation
GNU General Public License v3.0

Error during Affinity Propagation stage with Binsanity-lc #53

Open michaelwoodworth opened 3 years ago

michaelwoodworth commented 3 years ago

Hi, I'm interested in using BinSanity, and thanks for your hard work in developing and maintaining it.

I'm using Binsanity v0.4.1, installed with conda.

My metagenome runs failed on cluster nodes with 400 GB of RAM, so I'm trying Binsanity-lc instead. I used these parameters, again with 20 processors and 400 GB of RAM:

Binsanity-lc -f ${indir} -l ${ID}_scaffolds.fasta -c ${pDIR}/profiles/${ID}.cov.cov.x100.lognorm -o ${pDIR}/${ID}-BinsanityWF -x 3000 --checkm_threads 20 --kmean_threads 20 --Prefix ${ID}

It looks like the k-means step completes, but the Affinity Propagation stage fails:

            ____________________________________________________

             Clustering Bin  SD01cat-kmean-bin_71.fna
             via Affinity Propagation
            ____________________________________________________
          Preference: -3
          Maximum Iterations: 4000
          Convergence Iterations: 400
          Contig Cut-Off: 3000
          Damping Factor: 0.95
          Coverage File: /storage/home/hcoda1/0/mwoodworth8/scratch/PREMIX/21.01.01_all_metagenomes/06.h.binsanity_pt_cat/profiles/SD01cat.cov.cov.x100.lognorm
          Fasta File: SD01cat-kmean-bin_71.fna
          Output Directory: /storage/home/hcoda1/0/mwoodworth8/scratch/PREMIX/21.01.01_all_metagenomes/06.h.binsanity_pt_cat/SD01cat-BinsanityWF
          (47, 1)
Traceback (most recent call last):
  File "/storage/home/hcoda1/0/mwoodworth8/.conda/envs/binsanity/bin/Binsanity-lc", line 516, in <module>
    print("The program failed to complete clustering with affinity propagation when it reached %s. Check the number of contigs in the following bin: %s. If the number is >100,000 it is likely you ran into a memory error.") % (clust)
TypeError: not enough arguments for format string
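The final TypeError is raised by the error-reporting statement on line 516 itself, not by the clustering. Its message has two %s placeholders, but (clust) supplies a single value (parentheses around one name do not make a tuple), so the format operation fails with "not enough arguments for format string". That appears to hide whatever exception actually stopped affinity propagation on this bin. The minimal reproduction below copies the message text from the traceback. The function names broken_message and fixed_message, and the choice to pass the bin name for both placeholders, are illustrative assumptions, not BinSanity's code or an official patch.

```python
# Hypothetical reproduction of the error-reporting line from the traceback.
# The message text is copied from the log; the bin name comes from the
# "Clustering Bin" banner above it.
TEMPLATE = (
    "The program failed to complete clustering with affinity propagation "
    "when it reached %s. Check the number of contigs in the following bin: %s. "
    "If the number is >100,000 it is likely you ran into a memory error."
)


def broken_message(clust: str) -> str:
    # (clust) is just clust, not a one-element tuple, so only one value
    # reaches a template with two %s placeholders -> TypeError
    return TEMPLATE % (clust)


def fixed_message(clust: str) -> str:
    # One value per placeholder; reusing the bin name for both is an assumption
    return TEMPLATE % (clust, clust)


bin_name = "SD01cat-kmean-bin_71.fna"
try:
    broken_message(bin_name)
except TypeError as err:
    print("TypeError:", err)  # TypeError: not enough arguments for format string
print(fixed_message(bin_name))
```

Once the message formats correctly, the handler can report the bin name instead of masking the original failure; printing the caught exception alongside it would show the real cause.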

I've attached the log file below. Do you have any thoughts on what could be going on here?

Thanks!

SD01cat-BinsanityLC-log.txt

edgraham commented 3 years ago

Hello,

It's an interesting failure because, based on what your log file says, it's trying to cluster only a small set of contigs at the step where it fails. Historically, when I have encountered failures at this stage, they have been due to memory issues. But one other issue that could be causing this, which I have run into once before, is related to contig IDs. Specifically, errors can arise when the IDs aren't simplified deflines or when contig IDs start with a number.
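Both explanations can be checked directly on the failing bin: how many records it holds (the error message treats >100,000 contigs as a sign of a memory problem) and whether any headers start with a digit or carry extra defline text. The sketch below assumes one header line per record and takes "simplified defline" to mean a single token of letters, digits, '_', '.' or '-'. That definition and the function name inspect_fasta are assumptions for illustration, not rules documented by BinSanity.

```python
import re
from typing import Dict, Iterable, List

# Assumed definition of a "simplified" defline: a single token made only of
# letters, digits, underscores, periods, or hyphens (not a BinSanity rule).
SIMPLE_ID = re.compile(r"[A-Za-z0-9_.\-]+")


def inspect_fasta(lines: Iterable[str]) -> Dict[str, object]:
    """Count FASTA records and flag headers that may need normalizing."""
    n_records = 0
    starts_with_digit: List[str] = []
    not_simplified: List[str] = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are not inspected
        n_records += 1
        header = line[1:].strip()
        if header[:1].isdigit():
            starts_with_digit.append(header)
        if not SIMPLE_ID.fullmatch(header):
            not_simplified.append(header)  # whitespace or other characters
    return {
        "records": n_records,
        "starts_with_digit": starts_with_digit,
        "not_simplified": not_simplified,
    }


# Hypothetical usage on the bin named in the log:
# with open("SD01cat-kmean-bin_71.fna") as fh:
#     report = inspect_fasta(fh)
# print(report["records"], report["starts_with_digit"][:5])
```

A file opened in text mode iterates line by line, so the same function works on a whole assembly without loading it into memory.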

So try normalizing your contig IDs to read like this:

>contig_1
AGCTAGCTAGTAGCTAGCTA
>contig_2
AGCTAGCTAGCTGATGCTAG
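A minimal sketch of that renaming, assuming plain FASTA input where the contig ID is the first word of each header. The renamed IDs also have to match the coverage profile, so the sketch returns an old-to-new mapping and applies it to a tab-delimited table whose first column holds the contig ID; that layout for the .lognorm file is an assumption, and if it does not hold for your files, regenerating the coverage profile from the renamed assembly avoids the mismatch. The function names and the "contig_" prefix are hypothetical.

```python
import io
from typing import Dict, Iterable, List, TextIO


def normalize_fasta_ids(lines: Iterable[str], out: TextIO,
                        prefix: str = "contig_") -> Dict[str, str]:
    """Rewrite each header as >{prefix}{n}; return {old_id: new_id}."""
    mapping: Dict[str, str] = {}
    n = 0
    for line in lines:
        if line.startswith(">"):
            n += 1
            tokens = line[1:].split()
            old_id = tokens[0] if tokens else ""  # first word of the defline
            new_id = f"{prefix}{n}"
            mapping[old_id] = new_id
            out.write(f">{new_id}\n")
        else:
            out.write(line.rstrip("\r\n") + "\n")  # sequence lines unchanged
    return mapping


def rename_first_column(rows: Iterable[str], mapping: Dict[str, str],
                        out: TextIO) -> List[str]:
    """Rename IDs in column 1 of a tab-delimited table (assumed layout).

    Rows whose ID is not in the mapping (e.g. a header row) are written
    unchanged and returned so they can be checked.
    """
    missing: List[str] = []
    for line in rows:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] in mapping:
            fields[0] = mapping[fields[0]]
        else:
            missing.append(fields[0])
        out.write("\t".join(fields) + "\n")
    return missing


# Demo with hypothetical IDs that start with a number and carry extra text
example = (">1_NODE_5_length_4000 cov=12.3\nAGCTAGCTAGTAGCTAGCTA\n"
           ">2_NODE_9_length_3500\nAGCTAGCTAGCTGATGCTAG\n")
fasta_out = io.StringIO()
ids = normalize_fasta_ids(io.StringIO(example), fasta_out)
print(fasta_out.getvalue())
```

For real files, pass open file handles for input and output instead of io.StringIO, and check the list returned by rename_first_column: any contig IDs in it did not match a FASTA header.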

-Elaina