a-h-b / binny

GNU General Public License v3.0

ModuleNotFoundError: No module named 'conda._vendor.auxlib' #28

B-1991-ing closed this issue 2 years ago

B-1991-ing commented 2 years ago

Dear binny support team,

I ran binny, but the error shown below occurred. Could you help me find out what is causing it?

Error screenshot: Screenshot 2022-06-20 at 18 40 10

Error file: binny_25.err-25.txt

Log file: binny_25.log-25.txt

Job script: binny_25.sh.txt

Best,

Bing

B-1991-ing commented 2 years ago

Attached are the first 20000 lines of 44.bam and the first 5000 lines of assembly.fa: work_files.zip

ohickl commented 2 years ago

Thanks! The problem is indeed that the BAM file contains only the contig IDs, e.g.:

V350044016L3C001R0030160448     163     contig-100_0    1       60      6S144M  =       87      236     TGGGAGGTGAGAGCTATCGTTTGAAGCAGAAAACCGGCGGTTAGCGGTGAGCAAGTGGGTCAATTCTATTGGCCAAAAGTGGGTCAAAATTAATGGCCATTGACATTGGTGAACTCGCAATCAAGAAAGAGCCTCATTTTGCTGGAACTC  FGGFFFFFGFGFGFGGFGBGGFGFFAGGGFGGGGFFGFA@=GFFFFFFEEEGGGGFEFFEGGFFFFEEGGFGGFEGGFG@GGGFFFBGGGFGFFEGFAFFGGEFEGGBFFGBDFFEGCFEGEEFEFFFDFGGFBEGFFFGGFBEF9CFEG  NM:i:0  MD:Z:144        MC:Z:150M       AS:i:144        XS:i:105

whereas the headers in the assembly carry additional, space-separated info after the ID. I am unsure why MetaWrap does that and how it deals with this discrepancy. Did you check whether the other external binners dealt with that just fine and were able to produce bins of sufficient quality? I corrected it by running sed -e "s/ .*//g" assembly_5000.fa > assembly_5000_ids_only.fa, which removes everything from the first space onward on each line. After that, binny found bins, even with the truncated files:

-rw-rw-r-- 1 oskar.hickl oskar.hickl 7348160 Jun 30 13:51 binny_I01R01.000000_C97_P97_Pseudomonadaceae.fasta
-rw-rw-r-- 1 oskar.hickl oskar.hickl 3246322 Jun 30 13:51 binny_I01R01.000003_C93_P92_Spirochaetes.fasta
-rw-rw-r-- 1 oskar.hickl oskar.hickl  684781 Jun 30 13:51 binny_I02R01.000030_C80_P100_Bacteria.fasta
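For anyone who prefers to script the header clean-up and check the result, here is a minimal Python sketch. It is not part of binny; the helper names `strip_fasta_descriptions`, `sam_reference_names` and `fasta_ids_missing_from_bam` are hypothetical. It assumes the BAM header has been dumped to text first (for example with `samtools view -H`) and relies only on the SAM convention that reference names sit in the `SN:` field of `@SQ` lines. Unlike the sed command, it cuts headers at the first whitespace of any kind and leaves sequence lines untouched.

```python
def strip_fasta_descriptions(lines):
    """Yield FASTA lines with every header cut at the first whitespace."""
    for line in lines:
        if line.startswith(">"):
            # ">contig-100_0 extra info" becomes ">contig-100_0"
            parts = line[1:].split(None, 1)
            yield ">" + (parts[0] if parts else "") + "\n"
        else:
            # Sequence lines are passed through unchanged.
            yield line


def sam_reference_names(header_lines):
    """Collect the SN values of all @SQ lines in a SAM/BAM text header."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def fasta_ids_missing_from_bam(fasta_lines, bam_names):
    """Return full FASTA header strings that are not BAM reference names."""
    missing = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            if header not in bam_names:
                missing.append(header)
    return missing
```

Running `fasta_ids_missing_from_bam` on the original assembly should list the headers that still carry the extra description, and an empty list after `strip_fasta_descriptions` suggests the IDs now line up with the BAM file.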

I used the default thresholds of 90 for purity and 70 for completeness, though, which I recommend you do as well. binny is decent at assessing bin quality, but not as good as, e.g., CheckM. I would keep the purity at 90 and then use CheckM (and other MAG quality control tools or refiners such as MetaWrap refine, which runs CheckM anyway) to do additional filtering. binny might under- or overestimate completeness and purity by a few percent and thus sometimes discard a good bin if you set the purity threshold too high.
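To illustrate how such thresholds act on a set of bins, here is a small, hedged Python sketch. It assumes that the `C` and `P` fields in file names like binny_I01R01.000000_C97_P97_Pseudomonadaceae.fasta are binny's completeness and purity estimates in percent; the thread does not state this explicitly. `parse_bin_name` and `passes_thresholds` are hypothetical helpers, not binny functions.

```python
import re

# Assumed naming pattern: "..._C<completeness>_P<purity>_<taxon>.fasta"
BIN_NAME = re.compile(r"_C(?P<completeness>\d+)_P(?P<purity>\d+)_")


def parse_bin_name(name):
    """Return (completeness, purity) parsed from a bin file name, or None."""
    match = BIN_NAME.search(name)
    if match is None:
        return None
    return int(match.group("completeness")), int(match.group("purity"))


def passes_thresholds(name, min_completeness=70, min_purity=90):
    """Check a bin name against completeness and purity cut-offs."""
    parsed = parse_bin_name(name)
    if parsed is None:
        return False
    completeness, purity = parsed
    return completeness >= min_completeness and purity >= min_purity
```

Under that naming assumption, all three bins listed above pass the default cut-offs, while raising `min_purity` to 95 would drop the Spirochaetes bin (P92). This is the kind of loss a too-strict purity threshold can cause when the estimate is off by a few percent.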

B-1991-ing commented 2 years ago

Did you check whether the other external binners dealt with that just fine and were able to produce bins of sufficient quality?

Only SemiBin used the IDBA-UD contig.fa file and the BAM files generated by metawrap_binning. For sample 44, I got 1 HQ bin and 2 MQ bins from SemiBin, counted with the command lines below.

HQ bins: cat output_bins.stats|awk '$2>=90 && $3<5' |wc -l

MQ bins: cat output_bins.stats|awk '90>$2 && $2>=50 && $3<5' |wc -l
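The two awk filters can be sketched in Python as a single pass over the stats file. This assumes, as the awk commands do, that the file is whitespace-separated with completeness in column 2 and contamination in column 3; `count_hq_mq` is a hypothetical helper name, and lines whose columns are not numeric (such as a header line) are skipped.

```python
def count_hq_mq(lines):
    """Count high-quality and medium-quality bins in a stats table.

    HQ: completeness >= 90 and contamination < 5
    MQ: 50 <= completeness < 90 and contamination < 5
    """
    hq = 0
    mq = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            completeness = float(fields[1])
            contamination = float(fields[2])
        except ValueError:
            # Non-numeric columns, e.g. a header line.
            continue
        if contamination >= 5:
            continue
        if completeness >= 90:
            hq += 1
        elif completeness >= 50:
            mq += 1
    return hq, mq
```

It could be called as `count_hq_mq(open("output_bins.stats"))`, which returns the HQ and MQ counts as a tuple.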

I thought it would be fine to directly use the metawrap_binning generated bam file, so I just copied and used it as the input of semibin.

Screenshot 2022-06-30 at 14 24 05

ohickl commented 2 years ago

I don't know how SemiBin calculates the coverage internally. It might be worth asking the developers whether input in this form is a problem.

B-1991-ing commented 2 years ago

I don't know how SemiBin calculates the coverage internally. It might be worth asking the developers whether input in this form is a problem.

Yeah, I will. I will also generate the BAM file myself, instead of directly using the BAM file generated by metawrap_binning as the input for SemiBin and binny, to see whether I get different, better-quality bins than now.

Thank you very much for your help.