Closed B-1991-ing closed 2 years ago
The first 20000 lines of 44.bam and the first 5000 lines of assembly.fa: work_files.zip
Thanks! The problem is indeed that the BAM file contains only the contig IDs, e.g.:
V350044016L3C001R0030160448 163 contig-100_0 1 60 6S144M = 87 236 TGGGAGGTGAGAGCTATCGTTTGAAGCAGAAAACCGGCGGTTAGCGGTGAGCAAGTGGGTCAATTCTATTGGCCAAAAGTGGGTCAAAATTAATGGCCATTGACATTGGTGAACTCGCAATCAAGAAAGAGCCTCATTTTGCTGGAACTC FGGFFFFFGFGFGFGGFGBGGFGFFAGGGFGGGGFFGFA@=GFFFFFFEEEGGGGFEFFEGGFFFFEEGGFGGFEGGFG@GGGFFFBGGGFGFFEGFAFFGGEFEGGBFFGBDFFEGCFEGEEFEFFFDFGGFBEGFFFGGFBEF9CFEG NM:i:0 MD:Z:144 MC:Z:150M AS:i:144 XS:i:105
whereas the assembly headers carry additional, space-separated information. I am unsure why MetaWrap does that and how it deals with this discrepancy. Did you check whether the other external binners dealt with that just fine and were able to produce bins of sufficient quality?
I corrected it by running sed -e "s/ .*//g" assembly_5000.fa > assembly_5000_ids_only.fa, which removes everything from the first space onward on each line.
After that, binny also found bins, even with the truncated files:
-rw-rw-r-- 1 oskar.hickl oskar.hickl 7348160 Jun 30 13:51 binny_I01R01.000000_C97_P97_Pseudomonadaceae.fasta
-rw-rw-r-- 1 oskar.hickl oskar.hickl 3246322 Jun 30 13:51 binny_I01R01.000003_C93_P92_Spirochaetes.fasta
-rw-rw-r-- 1 oskar.hickl oskar.hickl 684781 Jun 30 13:51 binny_I02R01.000030_C80_P100_Bacteria.fasta
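For reference, the header clean-up can be sketched as a small script. This is only a minimal sketch: the function name strip_fasta_descriptions and the two-record FASTA are hypothetical, and the text after the contig IDs is made up for illustration. Unlike the one-liner above, the substitution here is restricted to header lines, which should give the same result for ordinary FASTA files because sequence lines contain no spaces.

```shell
#!/bin/sh
set -eu

# Keep only the first whitespace-delimited token of each FASTA header line.
# Sequence lines are left untouched because the substitution only applies
# to lines starting with '>'.
strip_fasta_descriptions() {
    sed -e '/^>/ s/[[:space:]].*$//' "$1"
}

# Hypothetical two-record assembly; the text after the first space is made up.
workdir=$(mktemp -d)
cat > "$workdir/assembly.fa" <<'EOF'
>contig-100_0 length_12 read_count_3
ACGTACGTACGT
>contig-100_1 length_8 read_count_2
TTGGCCAA
EOF

strip_fasta_descriptions "$workdir/assembly.fa" > "$workdir/assembly_ids_only.fa"

# Print the cleaned headers so they can be compared with the BAM reference names.
grep '^>' "$workdir/assembly_ids_only.fa"
```

The cleaned headers should then be identical to the reference names used in the BAM file (contig-100_0 in the record shown earlier).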
I did use the default thresholds, though (90 purity and 70 completeness), which I recommend you do as well. binny is decent at assessing bin quality, but not as good as, e.g., CheckM. I would keep the purity at 90 and then use CheckM (and other MAG quality-control tools or refiners, e.g. MetaWrap refine, which runs CheckM anyway) for additional filtering. binny may under- or overestimate completeness and purity by a few percent, so setting the purity threshold too high can sometimes discard a good bin.
Did you check whether the other external binners dealt with that just fine and were able to produce bins of sufficient quality?
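If it helps when comparing against CheckM later, the estimates can be pulled out of the bin file names. This is a rough sketch that assumes the C and P fields in names such as binny_I01R01.000000_C97_P97_Pseudomonadaceae.fasta are binny's completeness and purity estimates; the helper name parse_binny_name is hypothetical.

```shell
#!/bin/sh
set -eu

# Print "completeness purity filename" for a binny-style bin file name.
# Assumption: _C<n>_P<n>_ encodes the completeness and purity estimates.
# Names that do not match the pattern produce no output.
parse_binny_name() {
    basename "$1" |
        sed -n 's/^\(.*_C\([0-9][0-9]*\)_P\([0-9][0-9]*\)_.*\)$/\2 \3 \1/p'
}

# File names taken from the listing above.
for f in \
    binny_I01R01.000000_C97_P97_Pseudomonadaceae.fasta \
    binny_I01R01.000003_C93_P92_Spirochaetes.fasta \
    binny_I02R01.000030_C80_P100_Bacteria.fasta
do
    parse_binny_name "$f"
done
```

The resulting table can be placed next to the CheckM values per bin to see where the two estimates differ by more than a few percent.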
Only SemiBin was run on the IDBA-UD contig.fa file and the BAM files generated by metawrap_binning. For sample 44, I got 1 HQ bin and 2 MQ bins from SemiBin, counted with the commands below.
HQ bins: cat output_bins.stats | awk '$2>=90 && $3<5' | wc -l
MQ bins: cat output_bins.stats | awk '90>$2 && $2>=50 && $3<5' | wc -l
I thought it would be fine to use the BAM file generated by metawrap_binning directly, so I just copied it and used it as the input for SemiBin.
I don't know how SemiBin calculates the coverage internally. It might be worth asking the developers whether input in this form is a problem.
I don't know how SemiBin calculates the coverage internally. It might be worth asking the developers whether input in this form is a problem.
Yeah, I will. I will also generate the BAM file myself, instead of directly using the one generated by metawrap_binning, as the input for SemiBin and binny, to see whether I get better-quality bins than now.
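The two counts can also be wrapped in small functions. This is a sketch under the same assumption as the commands above, namely that column 2 is completeness and column 3 is contamination; the function names and the stats table are made up for illustration and are not the results for sample 44.

```shell
#!/bin/sh
set -eu

# Count HQ bins: completeness >= 90 and contamination < 5.
# A first line whose second field is not numeric is treated as a header.
count_hq() {
    awk 'NR == 1 && $2 !~ /^[0-9.]+$/ { next }
         $2 >= 90 && $3 < 5 { n++ }
         END { print n + 0 }' "$1"
}

# Count MQ bins: 50 <= completeness < 90 and contamination < 5.
count_mq() {
    awk 'NR == 1 && $2 !~ /^[0-9.]+$/ { next }
         $2 >= 50 && $2 < 90 && $3 < 5 { n++ }
         END { print n + 0 }' "$1"
}

# Made-up stats table for illustration only.
stats=$(mktemp)
cat > "$stats" <<'EOF'
bin completeness contamination
bin.1 95.2 1.1
bin.2 72.0 3.4
bin.3 55.5 0.0
bin.4 91.0 7.8
bin.5 98.0 0.5
EOF

echo "HQ bins: $(count_hq "$stats")"
echo "MQ bins: $(count_mq "$stats")"
```

Skipping the header explicitly avoids relying on how awk compares the header text with the numeric thresholds.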
Thank you very much for your help.
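Before rerunning either binner, it may be worth checking that the reference names in the alignment header match the FASTA headers. The sketch below compares them as plain text; the file contents and function names are hypothetical, and for a real BAM the header text could be produced with samtools view -H 44.bam > header.sam. Comparing the full header line is a deliberately strict assumption, since tools differ in whether they cut the header at the first space.

```shell
#!/bin/sh
set -eu

# IDs as written in the FASTA headers: the full line after '>'.
fasta_ids() {
    sed -n 's/^>//p' "$1" | sort -u
}

# Reference names taken from the SN: field of the @SQ header lines.
sam_ref_names() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1" | sort -u
}

workdir=$(mktemp -d)
# Hypothetical inputs for illustration only.
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:contig-100_0\tLN:12\n' > "$workdir/header.sam"
printf '>contig-100_0 length_12 read_count_3\nACGTACGTACGT\n' > "$workdir/assembly.fa"

fasta_ids "$workdir/assembly.fa" > "$workdir/fasta.ids"
sam_ref_names "$workdir/header.sam" > "$workdir/bam.ids"

# comm -13 lists names present in the alignment header but absent from the FASTA IDs.
missing=$(comm -13 "$workdir/fasta.ids" "$workdir/bam.ids" | wc -l | tr -d ' ')
echo "reference names without a matching FASTA header: $missing"
```

A non-zero count points to the same kind of mismatch discussed earlier in this thread, where the FASTA headers carry extra text after the contig ID.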
Dear binny support team,
I ran binny, but the error shown below occurred. Could you help me find the cause?
Error screenshot
Error file binny_25.err-25.txt
Log file binny_25.log-25.txt
Job script binny_25.sh.txt
Best,
Bing