genome / gms

The Genome Modeling System installer
https://github.com/genome/gms/wiki
GNU Lesser General Public License v3.0

name sorting of bam files #197

Closed shu2010 closed 8 years ago

shu2010 commented 8 years ago

Dear authors,

Hope you all had a great break!! I am encountering issues while running ref-align with in-house samples.

Error:
2015-12-15 19:16:15+0000 gms-larg: ERROR: Before and after name-sort resulted in different number of reads: 965966742 <=> 290753805

I was hoping these issues were taken care of during the BAM sanitization process while importing the samples.

Cheers,
Shu

gatoravi commented 8 years ago

This error seems to originate from here

The sorting seems to happen in Genome::Model::Tools::Sam::SortBam here

Sorting is done with samtools sort -n. I don't think we've seen this before. Any idea why you'd have fewer reads after name-sorting your BAM? The only way I can think of to identify the issue is to name-sort your BAM file manually and see whether you get the same number of reads as reported in the error. If samtools had somehow errored out while sorting, the module should have caught the return value and reported it in the log.
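The manual check suggested here could be scripted roughly as follows. This is a minimal Python sketch, not the GMS code: it assumes samtools 1.3 or later is on the PATH (for the sort -n -o form), and the function names and the paths passed to them are hypothetical.

```python
import subprocess


def count_reads(bam_path):
    """Count all records in a BAM file with `samtools view -c`."""
    result = subprocess.run(
        ["samtools", "view", "-c", bam_path],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def name_sort(in_bam, out_bam):
    """Name-sort a BAM file; check=True raises if samtools exits non-zero."""
    subprocess.run(["samtools", "sort", "-n", "-o", out_bam, in_bam], check=True)


def describe_counts(before, after):
    """Return a short verdict on the read counts before and after sorting."""
    if before == after:
        return "OK: {} reads before and after".format(before)
    return "MISMATCH: {} <=> {} (differ by {})".format(
        before, after, abs(before - after)
    )


def check_name_sort(in_bam, out_bam):
    """Count, name-sort, count again, and report (hypothetical helper)."""
    before = count_reads(in_bam)
    name_sort(in_bam, out_bam)
    return describe_counts(before, count_reads(out_bam))
```

Calling check_name_sort("input.bam", "input.namesorted.bam") with real paths would either raise on a samtools failure or return a one-line verdict comparable to the error message in the log.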

shu2010 commented 8 years ago

Yeah, I did exactly that, but the flagstat results were identical for both the name-sorted and the unsorted BAM file:

965966742 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
0 + 0 mapped (0.00%:-nan%)
965966742 + 0 paired in sequencing
482983371 + 0 read1
482983371 + 0 read2
0 + 0 properly paired (0.00%:-nan%)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
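For comparing two flagstat reports without reading them by eye, the totals can be pulled out programmatically. The sketch below is an illustration only; it assumes the "N + M in total" line format shown in the output above, and the function names are hypothetical.

```python
import re


def flagstat_total(text):
    """Return (qc_passed, qc_failed) from the 'in total' line of flagstat output."""
    for line in text.splitlines():
        match = re.match(r"\s*(\d+) \+ (\d+) in total", line)
        if match:
            return int(match.group(1)), int(match.group(2))
    raise ValueError("no 'in total' line found in flagstat output")


def same_totals(flagstat_a, flagstat_b):
    """True if two flagstat reports agree on their total read counts."""
    return flagstat_total(flagstat_a) == flagstat_total(flagstat_b)
```

Fed the text of the two reports (sorted and unsorted), same_totals returns True when the totals agree, which is the situation described in this comment.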

gatoravi commented 8 years ago

Hi, after speaking to developers here: disk issues such as running out of free space could cause errors like the ones you are seeing. I did notice that you mentioned having a lot of disk space on #198... I wonder if the two issues are related somehow.
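One way to rule out the free-space explanation is to compare the BAM size against the space available in the directory where the sort writes its output and temporary files. The sketch below is only a rough heuristic: the safety factor of 2 is an assumption, not a documented samtools or GMS requirement, and the function names are hypothetical.

```python
import os
import shutil


def enough_space(bam_bytes, free_bytes, factor=2.0):
    """True if free space is at least `factor` times the BAM size (assumed margin)."""
    return free_bytes >= bam_bytes * factor


def check_sort_space(bam_path, work_dir, factor=2.0):
    """Compare the BAM size on disk with the free space in the working directory."""
    free = shutil.disk_usage(work_dir).free
    return enough_space(os.path.getsize(bam_path), free, factor)
```

A False result from check_sort_space for the directory used during sorting would make a truncated name-sorted file a more plausible explanation.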