RasmussenLab / vamb

Variational autoencoder for metagenomic binning
MIT License

vamb operating problem? #221

Closed ChaoXianSen closed 1 year ago

ChaoXianSen commented 1 year ago

Run command: vamb --outdir output63 --fasta R63.contigs.fa.gz --bamfiles R63_sort.bam -o C

err.log reports:

Traceback (most recent call last):
  File "/public/home/bioinfo_wang/00_software/miniconda3/envs/avamb/bin/vamb", line 33, in <module>
    sys.exit(load_entry_point('vamb', 'console_scripts', 'vamb')())
  File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 1395, in main
    run(
  File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 834, in run
    cluster(
  File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 665, in cluster
    clusternumber, ncontigs = vamb.vambtools.write_clusters(
  File "/public/home/bioinfo_wang/00_software/vamb/vamb/vambtools.py", line 440, in write_clusters
    for clustername, contigs in clusters:
  File "/public/home/bioinfo_wang/00_software/vamb/vamb/vambtools.py", line 701, in binsplit
    for newbinname, splitheaders in _split_bin(binname, headers, separator):
  File "/public/home/bioinfo_wang/00_software/vamb/vamb/vambtools.py", line 676, in _split_bin
    raise KeyError(f"Separator '{separator}' not in sequence label: '{header}'")
KeyError: "Separator 'C' not in sequence label: 'k141_84347'"

But the output does contain 'k141_84347'. Running

less contignames | grep "k141_84347" -A2 -B2

prints:

k141_512747
k141_170723
k141_84347
k141_170724
k141_512748

The vamb output directory contains:

0 Oct 9 23:52 vae_clusters.tsv   # why is this file empty?
7.7M Oct 9 23:52 contignames
2.6M Oct 9 23:52 lengths.npz
41K Oct 9 23:52 log.txt
77M Oct 9 23:52 latent.npz
815K Oct 9 23:51 model.pt
894 Oct 9 14:40 mask.npz
2.3M Oct 9 14:40 abundance.npz
252M Oct 9 14:38 composition.npz

jakobnissen commented 1 year ago

Dear @ChaoXianSen

As stated in the recommended workflow on the README page, if you set the binsplit separator to "C" (passing -o C on the command line), then your contig names must be of the format "{sample}C{contig}", with a C separating the two. If you're running with a single sample, do not use -o C. If you have multiple samples, please follow the recommended workflow and rename your contigs according to the scheme.

Closing this issue as solved, but you're welcome to post here for more questions.
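
To illustrate the naming rule described above, here is a minimal Python sketch. It is not vamb's own code: `rename_contig` and `split_label` are hypothetical helpers, and the sample name `S1` is an assumption chosen for the example. The sketch only shows why a label such as `k141_84347` cannot be split on "C" while `S1Ck141_84347` can.

```python
from typing import Tuple


def rename_contig(sample: str, contig: str, separator: str = "C") -> str:
    """Build a label of the form {sample}{separator}{contig} (hypothetical helper)."""
    return f"{sample}{separator}{contig}"


def split_label(label: str, separator: str = "C") -> Tuple[str, str]:
    """Split a label into (sample, contig) at the first occurrence of the separator.

    Raises KeyError when the separator is absent, mirroring the message
    seen in the traceback in this issue.
    """
    sample, found, contig = label.partition(separator)
    if not found or not contig:
        raise KeyError(f"Separator '{separator}' not in sequence label: '{label}'")
    return sample, contig


# A renamed contig splits cleanly into its sample and original name.
label = rename_contig("S1", "k141_84347")
print(label, split_label(label))

# The original MEGAHIT-style name has no "C", so splitting fails.
try:
    split_label("k141_84347")
except KeyError as err:
    print("failed:", err)
```

Because the split happens at the first occurrence of the separator in this sketch, the sample name itself should not contain the separator character.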

ChaoXianSen commented 1 year ago

The file R63.contigs.fa.gz is formatted like this:

>k141_514373 flag=1 multi=3.0000 len=356
CCATAAATCTGATTTTAGTCAAAAAAATATGCAGTTTTTCAAAAAGGGTGTATAATTCTTTCGTTACATGAAATATTTTGGAGGTGCTATTTTTATGAAAA
>k141_0 flag=1 multi=2.0000 len=345
AGGTGAAGATGACCGAAGAGGAGATTAAGGCCCGTGAGTTTGCCAAAGCGGCGCAGAAGGAGAAGGAGGACCGTGAGGCCAAGAAAGCGCTGG

I will drop -o C and try again. Thanks a lot.
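
Before rerunning with a binsplit separator, the headers of a FASTA file like the one above can be checked up front. The following is a rough Python sketch, not part of vamb: `labels_missing_separator` is a hypothetical helper, and it assumes the contig label is the header text up to the first whitespace (consistent with `contignames` listing `k141_84347` without the `flag=...` fields).

```python
import gzip


def labels_missing_separator(fasta_path, separator):
    """Return the contig labels in a FASTA file that lack the separator.

    Assumption: the label is the header line up to the first whitespace.
    Plain and gzip-compressed files are both handled.
    """
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    missing = []
    with opener(fasta_path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                continue  # sequence line, not a header
            fields = line[1:].split()
            label = fields[0] if fields else ""
            if separator not in label:
                missing.append(label)
    return missing
```

For example, `labels_missing_separator("R63.contigs.fa.gz", "C")` would list every contig whose name has no "C". A non-empty result means either the contigs need renaming or the separator option should be left out.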