Closed ChaoXianSen closed 11 months ago
Have you checked the DNA/RNA sequence number 160128 in your FASTA file, as indicated by the error message? Does its content look normal?
BAM record number 160128:
Its content looks normal.
Which version of Vamb are you running?
Also, the BAM record you posted contains the 160128th read, whereas the error pertains to the 160128th contig. Can you check that in the FASTA file?
Ah, that file contains contigs shorter than 2 kbp, which are filtered away. Is there an output file called "contignames"? In that, find the 160128th contig name, and use the name of that contig to find the sequence. Sorry for the hassle!
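The lookup described above can be sketched in Python. This is only a minimal sketch and not part of Vamb: the helper names `nth_name` and `fetch_record` are hypothetical, and it assumes that "contignames" holds one contig name per line and that the index in the error message is zero-based (as a NumPy row index would be). If the index turns out to be one-based, subtract one.

```python
import io


def nth_name(names_handle, index):
    """Return the name on the zero-based line `index` of a names file."""
    for i, line in enumerate(names_handle):
        if i == index:
            return line.strip()
    raise IndexError(f"file has fewer than {index + 1} names")


def fetch_record(fasta_handle, name):
    """Return the sequence of the FASTA record whose header ID equals `name`."""
    seq_lines = []
    found = False
    for line in fasta_handle:
        if line.startswith(">"):
            if found:
                break  # reached the record after the one we wanted
            header = line[1:].split()
            found = bool(header) and header[0] == name
        elif found:
            seq_lines.append(line.strip())
    if not found:
        raise KeyError(name)
    return "".join(seq_lines)


# Tiny in-memory stand-ins for "contignames" and the FASTA file (made-up data).
names = io.StringIO("k141_1\nk141_2\nk141_3\n")
fasta = io.StringIO(">k141_1\nACGT\n>k141_2\nNNNN\nNNNN\n>k141_3\nGGCC\n")

name = nth_name(names, 1)
sequence = fetch_record(fasta, name)
print(name, sequence)
```

With real data, replace the `io.StringIO` objects with `open("contignames")` and `open("my_contigs.fasta")`, and pass 160128 as the index.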
Pipeline: vamb --outdir ${fname} --fasta ${fname}.contigs_headNoSpace.fa --bamfiles ${fname}_sort_changed_header.bam
'${fname}.contigs_headNoSpace.fa' contains the raw contigs (assembled with MEGAHIT, including sequences shorter than 2 kbp).
I just want to use Vamb to get the file 'cluster.tsv' for further analysis (Vamb -> PHAMB).
For sample F183, the program fails with an error and does not produce the file 'contignames'.
For another sample, F1, the file 'contignames' looks like this. Contigs shorter than 2 kbp do not seem to have been removed:
Okay. I still need to see that contig 160128 looks good.
Can you run the following code in the directory where the run failed, so that it has access to the file composition.npz?
import vamb
comp = vamb.parsecontigs.Composition.load("composition.npz")
N = 160128
print(comp.metadata.lengths[N])
print(comp.metadata.identifiers[N])
print(comp.matrix[N])
That should print the contig name (and some other info I'm interested in). Then, given the contig name XXX, you can do grep "^>XXX" -A 2 my_contigs.fasta
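Once you have the sequence, a quick way to test what the error message describes (a sequence with no 4-mers made up of A, C, G, T or U) is to count them yourself. The sketch below is an illustration only: `count_valid_kmers` is a hypothetical helper, and how Vamb counts k-mers internally is not shown in this thread.

```python
def count_valid_kmers(sequence, k=4):
    """Count k-mers made up only of unambiguous A, C, G, T (or U) bases."""
    valid = set("ACGTU")
    count = 0
    run = 0  # length of the current stretch of valid bases
    for base in sequence.upper():
        run = run + 1 if base in valid else 0
        if run >= k:
            # Every position that ends a stretch of at least k valid bases
            # closes exactly one valid k-mer.
            count += 1
    return count


print(count_valid_kmers("ACGTACGT"))  # 5
print(count_valid_kmers("NNNNNNNN"))  # 0
print(count_valid_kmers("ACGNACGN"))  # 0
```

A count of zero would mean the sequence really is uninformative; a positive count suggests the problem lies elsewhere.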
Its content looks normal.
Okay, I found the bug! This is indeed a bug in Vamb and has nothing to do with your particular sequence. It just so happens that the vector you print has a sum that is exactly zero, and this causes a bug in Vamb. I'll push a fix ASAP.
According to the log file, the run does not seem to have reached the "Creating and training VAE" step.
So that's how matters stand. Thank you very much for your reply and your help!
I have another question: avamb seems to run slowly. What can I do to speed it up? Pipeline: vamb --outdir ${fname} --fasta ${fname}.contigs_headNoSpace.fa --bamfiles ${fname}_sort_changed_header.bam
Your best bet would be to use a GPU and set --cuda when running. This will speed up training and clustering quite a bit.
What step in particular is slow? You can check the log file.
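Before adding --cuda, it may be worth confirming that PyTorch (which Vamb uses for training) can actually see a GPU in the environment you run from. This is a minimal sketch; `cuda_available` is a hypothetical helper, not a Vamb function.

```python
import importlib.util


def cuda_available():
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed in this environment
    import torch

    return bool(torch.cuda.is_available())


print("CUDA available:", cuda_available())
```

If this prints False, check the GPU driver and PyTorch installation before expecting --cuda to help.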
Is the vector you mean comp.matrix[N]? sum(comp.matrix[N]) does not seem to be equal to zero.
The "Creating and training VAE" step is the slow one. How can I speed it up? Looking forward to your reply!
sum(comp.matrix[N]) does not seem to be equal to zero?
Hmm... it's possible that this is because it's computed slightly differently in Vamb, so there might be some rounding error where it may return either 3.79e-9 or 0.0, depending on the exact order of the floating point operations.
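To illustrate the point about operation order, here is a small sketch with made-up numbers (not real TNF data). It shows two things: the same values can sum to exactly 0.0 or to a nonzero result depending on the order of the additions, and a row whose sum is zero is not necessarily all zeros. Vamb's actual check is not shown in this thread, so `naive_sum` and `is_all_zeros` are hypothetical helpers. A plain loop is used rather than the built-in sum(), because newer Python versions compensate for rounding error in sum().

```python
def naive_sum(values):
    """Add floats strictly left to right, with no error compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


def is_all_zeros(values):
    """True only if every element is zero (stricter than a zero sum)."""
    return all(v == 0 for v in values)


row = [1e16, 1.0, -1e16]        # made-up values
reordered = [1e16, -1e16, 1.0]  # same values, different order

print(naive_sum(row))        # 0.0, because 1e16 + 1.0 rounds back to 1e16
print(naive_sum(reordered))  # 1.0
print(is_all_zeros(row))     # False
```

So a check of the form "the sum is zero" can fire on a row that still contains usable values, whereas an element-wise check would not.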
OK, I get it. Thanks a lot!
Dear @sgalkina,
Pipeline:
$vamb --outdir ${outdir} \
    --fasta ${fname}.contigs_headNoSpace.fa \
    --bamfiles ${fname}_sort_changed_header.bam
The error is as follows:
Traceback (most recent call last):
File "/public/home/bioinfo_wang/00_software/miniconda3/envs/avamb/bin/vamb", line 33, in <module>
sys.exit(load_entry_point('vamb', 'console_scripts', 'vamb')())
File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 1387, in main
run(
File "/public/home/bioinfo_wang/00_software/vamb/vamb/main.py", line 768, in run
data_loader = vamb.encode.make_dataloader(
File "/public/home/bioinfo_wang/00_software/vamb/vamb/encode.py", line 113, in make_dataloader
raise ValueError(
ValueError: TNF row at index 160128 is all zeros. This implies that the sequence contained no 4-mers of A, C, G, T or U, making this sequence uninformative.This is probably a mistake. Verify that the sequence contains usable information (e.g. is not all N's)
All of the other samples I have run worked, but these two samples failed. What went wrong?
Looking forward to your reply!