Closed: Estel-Kitsune closed this issue 7 years ago
Hi,
what version of samtools are you using?
cheers,
dom
Hi DRL,
I used samtools 1.3 when running blobtools.
Cheers, E
interesting, ...
the thing is that you have more mapped reads than total reads, which does not make any sense.
There must be some trouble with your bam file.
Can you run `samtools flagstat mybam.bam` and send me the output?
Hi,
This is the output of samtools flagstat. I guess there's something wrong with my BAM file.
51876606 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
9637592 + 0 supplementary
0 + 0 duplicates
50620400 + 0 mapped (97.58% : N/A)
42239014 + 0 paired in sequencing
21119507 + 0 read1
21119507 + 0 read2
38371134 + 0 properly paired (90.84% : N/A)
40701846 + 0 with itself and mate mapped
280962 + 0 singletons (0.67% : N/A)
2302704 + 0 with mate mapped to a different chr
2158868 + 0 with mate mapped to a different chr (mapQ>=5)
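As a sanity check on these numbers: the "in total" line counts every alignment record, so the primary-read count has to be derived by subtracting secondary and supplementary records. A minimal sketch of that arithmetic (the variable names are illustrative, not blobtools code):

```python
# Counts copied from the flagstat output above.
total_records = 51876606   # "in total" counts every alignment record in the BAM
secondary     = 0
supplementary = 9637592
mapped        = 50620400   # includes supplementary alignment records

# Primary reads = total records minus secondary and supplementary alignments.
primary_reads = total_records - secondary - supplementary
print(primary_reads)  # 42239014, matching "paired in sequencing"
```

So the BAM is internally consistent: 42,239,014 primary reads, the rest of the records are supplementary alignments.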
The BAM file seems to look good. Otherwise samtools would have complained. The output looks like there should be no problem for blobtools to parse this ...
Can you check whether you might have other samtools versions in your path?
echo $PATH
samtools --version
And could you also send me the command you used for generating the COV file?
Hi,
With our module system we can't have different versions of a tool at the same time.
But I tried it a second time, not using modules but a conda environment with samtools 1.3.1. My command is:
python /home/estelle/bin/blobtools/blobtools create -i Trinity_bestiso_nozero_l300.fa -b sort.Trinity_bestiso_nozero_l300.bam -t all.blastx.swissprot.blastout -t all.megablastn.nt.blastout -t Trinity_bestiso_nozero_l300.vs.silva.1e65.megablast.out -o my_second_blob --nodes /proj/b2013006/nobackup/sw/data/blobology/nodes.dmp --names /proj/b2013006/nobackup/sw/data/blobology/names.dmp
and the output is still problematic (mapping rate of 119.8%):
[STATUS] : Parsing FASTA - Trinity_bestiso_nozero_l300.fa
[STATUS] : Creating nodesDB from /proj/b2013006/nobackup/sw/data/blobology/nodes.dmp and /proj/b2013006/nobackup/sw/data/blobology/names.dmp
[STATUS] : Parsing tax0 - /pica/v12/b2011127_nobackup/private/blobtools/all.blastx.swissprot.blastout
[STATUS] : Parsing tax1 - /pica/v12/b2011127_nobackup/private/blobtools/all.megablastn.nt.blastout
[STATUS] : Parsing tax2 - /pica/v12/b2011127_nobackup/private/blobtools/Trinity_bestiso_nozero_l300.vs.silva.1e65.megablast.out
[STATUS] : Computing taxonomy using taxrule(s) bestsum
[PROGRESS] : 100%
[STATUS] : Parsing bam0 - /pica/v12/b2011127_nobackup/private/blobtools/sort.Trinity_bestiso_nozero_l300.bam
[STATUS] : Checking with 'samtools flagstat'
[STATUS] : Mapping reads = 50,620,400, total reads = 42,239,014 (mapping rate = 119.8%)
[PROGRESS] : 100%
[STATUS] : Writing sort.Trinity_bestiso_nozero_l300.bam.cov
[STATUS] : Generating BlobDB and writing to file my_second_blob.blobDB.json
Hi,
first, two comments:
- You should not need to write `python /home/estelle/bin/blobtools/blobtools create`; just `/home/estelle/bin/blobtools/blobtools create` will do, as blobtools uses the python version in your current environment.
- Blobtools has successfully finished.
So actually everything went ok-ish. The only problem, as you spotted, is that it incorrectly inferred the mapping rate. I fixed that, so if you do `git pull`
in the blobtools folder you should get the newest version of the master branch. The error was that blobtools did:
reads_mapped = reads_mapped - reads_secondary - reads_secondary
instead of:
reads_mapped = reads_mapped - reads_secondary - reads_supplementary
(although this has minimal influence in the final result)
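For clarity, here is the effect of that one-token fix on the numbers from this thread (a sketch of the arithmetic only, not the actual blobtools source):

```python
# flagstat counts quoted earlier in the thread
mapped        = 50620400
secondary     = 0
supplementary = 9637592
total_reads   = 42239014

# buggy: subtracted `secondary` twice, so supplementary records stayed in the count
buggy_mapped = mapped - secondary - secondary
# fixed: subtract the supplementary alignments instead
fixed_mapped = mapped - secondary - supplementary

print(round(100 * buggy_mapped / total_reads, 1))  # 119.8
print(round(100 * fixed_mapped / total_reads, 1))  # 97.0
```

Since `secondary` happens to be 0 here, the buggy version left all 9,637,592 supplementary records in the mapped count, which is exactly what pushed the rate over 100%.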
Can you pull the master branch and confirm that the mapping rate is sensible now?
Yes I'm blobtool'ing a transcriptome! :)
And now the mapping info is correct:
[STATUS] : Mapping reads = 40,982,808, total reads = 42,239,014 (mapping rate = 97.0%)
Thanks a lot for your help!
Thank YOU for spotting the bug!
I actually have never done a blobplot of a transcriptome, since I work mostly on genome assemblies (but I am aware that people do this).
If you have any ideas/suggestions on best practices for transcriptome filtering/blobtool'ing, please let me know and I can add them to the readme.io docs.
cheers,
dom
Sorry to re-open this thread, but 2 things:
- blobtools create now prints a warning while parsing the BAM:
[STATUS] : Checking with 'samtools flagstat'
[STATUS] : Mapping reads = 40,982,808, total reads = 42,239,014 (mapping rate = 97.0%)
[PROGRESS] : 100%
[PROGRESS] : 123%
[WARN] : Based on samtools flagstat: expected 40982808 reads, 50620400 reads were parsed
[STATUS] : Writing sort.Trinity_bestiso_nozero_l300.bam.cov
[STATUS] : Generating BlobDB and writing to file my_second_blob.blobDB.json
The COV header:
blobtools v0.9.19
Total Reads = 42239014
Mapped Reads = 40982808
Unmapped Reads = 1256206
- The stats file and the PNG created by blobtools blobplot are still incorrect (identical to the output before the bug fix), with the "all" map percentage equal to 119.8%.
Cheers, Estelle
Hmm, ...
The warning is because blobtools expected different numbers based on the samtools flagstat output.
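The size of the discrepancy in that warning is itself telling: the gap between expected and parsed reads matches the supplementary-alignment count exactly, suggesting the parser is still counting supplementary records. A back-of-the-envelope check (not blobtools internals):

```python
expected = 40982808   # mapped minus supplementary, derived from flagstat
parsed   = 50620400   # alignment records blobtools actually read from the BAM

# The difference is exactly the supplementary count from flagstat.
print(parsed - expected)  # 9637592
```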
Changed the code, could you pull and check?
Hi,
I got the same output after the pull.
/E
Hi,
I seem to have a problem with the coverage file generated by blobtools. This is the head of my .cov file (the alignment was done with bwa 0.7.8):
The total reads number is correct. I guess I really might have 8,381,386 unmapped reads, and then 42,239,014 - 8,381,386 = 33,857,628 mapped reads.
And the stat file looks like this:
Any idea about what causes this error?
Thanks, Estelle