Open jianshu93 opened 1 year ago
That's interesting. We have begun benchmarking against synthetic PacBio reads and do find that Vamb performs well, so this is surprising. It could be overfitting of Vamb's network - but then it's weird GraphMB does not overfit.
I doubt it's long contigs, since Illumina assemblies also can create long contigs, and these bin just fine.
In general, I would expect GraphMB to be superior to Vamb on long-read data. It's an extension of Vamb that is tuned for Nanopore reads, and which also includes the assembly graph information.
It could also be a matter of depth, i.e. how many samples do you have, and what is the average depth of the contigs? The depth could be low if the number of PacBio reads is small, which would mess up the abundance estimates.
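One rough way to check the average depth per contig is sketched below. It assumes a coordinate-sorted BAM file and the three-column text output (contig, position, depth) of `samtools depth -a`; the function name `mean_depth_per_contig` and the file name in the usage comment are hypothetical.

```shell
# Sketch: mean depth per contig from three-column "contig<TAB>pos<TAB>depth" text.
# Assumption: the input comes from `samtools depth -a` on a coordinate-sorted BAM.
mean_depth_per_contig() {
  awk -F'\t' '
    { sum[$1] += $3; n[$1]++ }   # accumulate depth and position count per contig
    END { for (c in sum) printf "%s\t%.2f\n", c, sum[c] / n[c] }
  ' | sort -k1,1                 # awk iteration order is unspecified, so sort by contig
}

# Hypothetical usage (file name is an assumption):
#   samtools depth -a 2-bam/lichen2_sorted.bam | mean_depth_per_contig
```

If most contigs come out at very low mean depth, the abundance signal is probably too weak to help the binning much.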
How do I run the vamb command for a PacBio long-read assembly?
I ran the command below:
sample=lichen2
threads=128
minimap2 -d ${sample}_rena.contigs.mmi ${sample}_rena.contigs.fasta
minimap2 -t 28 -N 5 -a --split-prefix mmsplit -t ${threads} ${sample}_rena.contigs.mmi ../${sample}.fasta.gz 2> 2-bam/${sample}_unsort.log | samtools view -F 3584 -b --threads ${threads} -o 2-bam/${sample}_unsort.bam
prefix=vamb_bin/${sample}
rm -rf vamb_bin
mkdir -p vamb_bin
vamb --outdir ${prefix} --fasta ${sample}_rena.contigs.fasta --bamfiles 2-bam/${sample}_unsort.bam -o C --minfasta 200000
But there is an error message:
[E::idx_find_and_load] Could not retrieve index file for '2-bam/lichen2_unsort.bam'
Traceback (most recent call last):
File "/public/home/acq7wsloil/anaconda3/envs/busco/bin/vamb", line 11, in
Could you help me fix it, or share your command with me please?
Hello Vamb team,
I ran Vamb on a PacBio long-read sequencing project with the same parameters I use for short-read binning. Vamb put nearly every long contig into its own bin (several thousand bins), whereas GraphMB, which takes the assembly graph into account, generated only about 100 bins, which is consistent with Concoct+MaxBin2+Metabat2+DAS_tools (also about 100). I am wondering what goes wrong with much longer contigs.
Thanks,
Jianshu
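To put a number on "nearly each long contig was considered a bin", a small sketch like the one below can count the bins and the single-contig bins. It assumes Vamb's cluster output is a two-column, tab-separated file (cluster name, then contig name) without a header line; the function name `summarise_bins` and the path in the usage comment are hypothetical.

```shell
# Sketch: count bins and singleton bins in a two-column "cluster<TAB>contig" table.
# Assumption: no header line; each row assigns one contig to one cluster.
summarise_bins() {
  awk -F'\t' '
    { size[$1]++ }               # number of contigs per cluster
    END {
      for (b in size) { bins++; if (size[b] == 1) singletons++ }
      printf "bins\t%d\nsingleton_bins\t%d\n", bins, singletons
    }
  '
}

# Hypothetical usage (path is an assumption):
#   summarise_bins < vamb_bin/lichen2/clusters.tsv
```

A singleton count close to the total bin count would confirm that almost no contigs are being grouped together.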