DRL / blobtools

Modular command-line solution for visualisation, quality control and taxonomic partitioning of genome datasets
GNU General Public License v3.0
187 stars 44 forks source link

./blobtools create not working properly #37

Closed tarunaaggarwal closed 7 years ago

tarunaaggarwal commented 7 years ago

Hello,

I'm going through the tutorial on your site, but am running into issues. Below is the command that I'm trying and the errors that I'm getting. I'm using the new version 0.9.19.4.

Command from tutorial:

./blobtools create   -i test_files/assembly.fna   -b test_files/mapping_1.bam   -t test_files/blast.out   -o test_files/my_first_blobplot

Errors:

[STATUS]        : Parsing FASTA - test_files/assembly.fna
[STATUS]        : names.dmp/nodes.dmp not specified. Retrieving nodesDB from /home/mcclintock/ta2007/bin/blobtools/data/nodesDB.txt
[PROGRESS]      :       100%
[STATUS]        : Parsing tax0 - /home/mcclintock/ta2007/bin/blobtools/test_files/blast.out
[STATUS]        : Computing taxonomy using taxrule(s) bestsum
[PROGRESS]      :       100%
[STATUS]        : Parsing bam0 - /home/mcclintock/ta2007/bin/blobtools/test_files/mapping_1.bam
[STATUS]        :       Checking with 'samtools flagstat'
Traceback (most recent call last):
  File "./bloblib/create.py", line 115, in <module>
    main()
  File "./bloblib/create.py", line 108, in main
    blobDb.parseCoverage(covLibObjs=cov_libs, no_base_cov=None)
  File "/home/mcclintock/ta2007/bin/blobtools/bloblib/BtCore.py", line 347, in parseCoverage
    base_cov_dict, covLib.reads_total, covLib.reads_mapped, read_cov_dict = BtIO.parseBam(covLib.f, set(self.dict_of_blobs), no_base_cov)
  File "/home/mcclintock/ta2007/bin/blobtools/bloblib/BtIO.py", line 384, in parseBam
    reads_total, reads_mapped = checkBam(infile)
  File "/home/mcclintock/ta2007/bin/blobtools/bloblib/BtIO.py", line 198, in checkBam
    reads_secondary = int(reads_secondary_re.search(output).group(1))
AttributeError: 'NoneType' object has no attribute 'group'

Thanks for your help!

chriskraus0 commented 7 years ago

I encountered the very same bug. But it is not a bug of the blobtools but may very well be a bug of your samtool installation.

The problem lies in BtIO.py line 197 reads_mapped = int(reads_mapped_re.search(output).group(1)) here the python script reads the output of the command samtools flagstat mapping_1.bam in older versions of samtools (0.1.x) this command gives the following output: 37278 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 37278 + 0 mapped (100.00%:-nan%) 37278 + 0 paired in sequencing 18714 + 0 read1 18564 + 0 read2 28268 + 0 properly paired (75.83%:-nan%) 33716 + 0 with itself and mate mapped 3562 + 0 singletons (9.56%:-nan%) 2649 + 0 with mate mapped to a different chr 2578 + 0 with mate mapped to a different chr (mapQ>=5) The comparator which was initialized in BtIO.py for "secondary reads" but these are not in the output of samtools flagstat. Thus, the variable reads_mapped can not be initialized and python throws an error.

The same command with samtools version 1.3 (as recommended on https://blobtools.readme.io/docs/getting-started ) gives the following output: 15313 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary 216 + 0 supplementary 0 + 0 duplicates 15313 + 0 mapped (100.00% : N/A) 15097 + 0 paired in sequencing 7554 + 0 read1 7543 + 0 read2 13356 + 0 properly paired (88.47% : N/A) 14320 + 0 with itself and mate mapped 777 + 0 singletons (5.15% : N/A) 340 + 0 with mate mapped to a different chr 298 + 0 with mate mapped to a different chr (mapQ>=5)

See here we have the "secondary reads" and everything works just fine.

Bottom line just install samtools v1.3 perferably from this source: http://www.htslib.org/download/

Alternatively the creators of this great package could also include a fail-safe check for the installed samtools version.

Best, chris

DRL commented 7 years ago

Hi,

@chriskraus0 thanks for that excellent and detailed post!

yes, that is the root of the problem.

Will put in a version check as soon as I have some free time. That function needs some work either way...

@chriskraus0 : btw, do you have an opinion on default samflags to filter a BAM file on, currently it BAM files get filtered with

samtools view -f 1 -F 1024 -F 256 -F 2048

And maybe I should put that in the samtools flagstat command as well so I could avoid parsing secondary-read-counts alltogether.

cheers,

dom

tarunaaggarwal commented 7 years ago

Thanks Chris and Dom! Appreciate your help! Samtools issue is fixed for now :-)

Maya0801 commented 7 years ago

Hi,

I have the same issue.

I ran this command: blobtools map2cov -i /db/ana/programs/blobology/trinity_petrosia_complete_short_names.fasta -b /db/ana/programs/blobology/trinity_petrosia_complete_short_names.fasta.readsname.bowtie2.bam -o out

And this is the output:

[STATUS] : Parsing FASTA - /db/ana/programs/blobology/trinity_petrosia_complete_short_names.fasta [STATUS] : Parsing bam0 - /db/ana/programs/blobology/trinity_petrosia_complete_short_names.fasta.readsname.bowtie2.bam [STATUS] : Checking with 'samtools flagstat' Traceback (most recent call last): File "/usr/local/src/blobtools/blobtools-0.9.19.5/bloblib/map2cov.py", line 52, in main() File "/usr/local/src/blobtools/blobtools-0.9.19.5/bloblib/map2cov.py", line 49, in main blobDb.parseCoverage(covLibObjs=cov_libs, no_base_cov=no_base_cov_flag) File "/usr/local/src/blobtools/blobtools-0.9.19.5/bloblib/BtCore.py", line 347, in parseCoverage base_cov_dict, covLib.reads_total, covLib.reads_mapped, read_cov_dict = BtIO.parseBam(covLib.f, set(self.dict_of_blobs), no_base_cov) File "/usr/local/src/blobtools/blobtools-0.9.19.5/bloblib/BtIO.py", line 384, in parseBam reads_total, reads_mapped = checkBam(infile) File "/usr/local/src/blobtools/blobtools-0.9.19.5/bloblib/BtIO.py", line 198, in checkBam reads_secondary = int(reads_secondary_re.search(output).group(1)) AttributeError: 'NoneType' object has no attribute 'group'

I created the bam file using this comman

bowtie2 -x trinity_petrosia_complete_short_names.fasta --very-fast-local -k 1 -t -p 12 --reorder --mm \ -U /db/ana/Paired-end/petrosia_PE_suff.fastq \ | samtools view -S -b -T trinity_petrosia_complete_short_names.fasta - >trinity_petrosia_complete_short_names.fasta.readsname.bowtie2.bam

I was wandering what are the numbers in the command above represent:

samtools view -f 1 -F 1024 -F 256 -F 2048

and if I can use the same numbers in my command?

Thanks for the help,

Maya

DRL commented 7 years ago

Hi Maya, this is most likely related to the version of samtools you are using (as Chris pointed out). BlobTools uses samtools flagstat to infer how many reads there are in order to display a progress bar during parsing. I just made a new release in which this problem should be fixed. Please download the newest release.

Regarding the numbers in the command:

samtools view -f 1 -F 1024 -F 256 -F 2048

these are samflags and you can get a feel for what they do on here or consulting the sam-format specification

What they mean in detail is : "Give me records of reads that are"

can use the same numbers in my command?

I am not sure how to answer this. Technically, you can. Philosophically, it depends of the question(s) you want to answer with your BAM file(s)... it is not required for BlobTools, if this is what you mean.

Hope that helps.

Can please you report back if the fix worked for you?

cheers,

dom

Maya0801 commented 7 years ago

Hi Don,

I think I have that lastest vesrion. blobtools v0.9.19

when I changed my bam file to sam file it worked fine.

But I get the same error when I try the to filter the bam file using blobtools bamfilter \ -b /db/ana/programs/blobology/trinity_petrosia_complete_short_names.fasta.readsname.bowtie2.bam \ -i petrosia_tokeep2.txt

I would like to point out that I did not use paired end data, may that cause the problem?

cheers,

Maya

On Wed, Mar 29, 2017 at 8:13 PM, Dominik R Laetsch <notifications@github.com

wrote:

Hi Maya, this is most likely related to the version of samtools you are using (as Chris pointed out). BlobTools uses samtools flagstat to infer how many reads there are in order to display a progress bar during parsing. I just made a new release in which this problem should be fixed. Please download the newest release.

Regarding the numbers in the command:

samtools view -f 1 -F 1024 -F 256 -F 2048

these are samflags and you can get a feel for what they do on here https://broadinstitute.github.io/picard/explain-flags.html or consulting the sam-format specification https://samtools.github.io/hts-specs/SAMv1.pdf

What they mean in detail is : "Give me records of reads that are"

  • paired (-f 1)
  • NOT a PCR or optical duplicate (-F 1024)
  • NOT a secondary alignment (-F 256)
  • NOT a supplemetary alignment (-F 2048)

can use the same numbers in my command?

I am not sure how to answer this. Technically, you can. Philosophically, it depends of the question(s) you want to answer with your BAM file(s)... it is not required for BlobTools, if this is what you mean.

Hope that helps.

Can please you report back if the fix worked for you?

cheers,

dom

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/DRL/blobtools/issues/37#issuecomment-290195466, or mute the thread https://github.com/notifications/unsubscribe-auth/AMG9v5SyOhgOsNuQoIdg5fF2FOKjKxmCks5rqq1ngaJpZM4KPReO .