brittneybrinsfield / pysam

Automatically exported from code.google.com/p/pysam
0 stars 0 forks source link

Headers are not always correctly parsed #84

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

$ samtools view -h raw_map.bam | head -2
@SQ SN:chr1 LN:1000
XXX0000 0   chr1    100 170 171M    *   0   0   CCTCTTTTATAGCAGCATGAACCCGTCGGTTTTAAACACGGTGTCA
GCGGTGGCCTCGGAAGTATGGCCCATTCGCGAACGATGCCGTCCCTTTCAAGACACCTGAGACACGGGATCTTTATCGCC
CGTGAGACCCACACACCACGCTTTCCCGTTTCTTTGTACGATTTC   *   AS:i:170    MS:i:50

$ python -c "import pysam; print pysam.version.__version__; 
s=pysam.Samfile('raw_map.bam'); print s.header"
0.6
{}

What is the expected output? What do you see instead?

A dictionary containing the headers show in samtools view 

What version of the product are you using? On what operating system?

0.6 on linux, OS X 

Original issue reported on code.google.com by cas...@gmail.com on 23 Jan 2012 at 3:54

Attachments:

GoogleCodeExporter commented 9 years ago
This could potentially be bounced upstream.  I created the raw_map.bam with 
'samtools view -T ref -Sb'

If I reheader it with its own header, then it works with pysam.  You would 
think this wouldn't change the file: 

$ samtools view -H raw_map.bam > h
$ samtools reheader h raw_map.bam > remap.bam
$ md5 raw_map.bam remap.bam 
MD5 (raw_map.bam) = a270e60a662c77de23b1a14dde7fcb94
MD5 (remap.bam) = a55dc41ca9b3f293be9eebabbd130fd0

$ samtools view -h raw_map.bam > l
$ samtools view -h remap.bam > r
$ diff l r
shows nothing

Original comment by cas...@gmail.com on 27 Jan 2012 at 11:17

GoogleCodeExporter commented 9 years ago
No-one else sees this?  It's getting old quickly!

Original comment by cas...@gmail.com on 10 May 2012 at 9:49

GoogleCodeExporter commented 9 years ago
Hi,

it seems that the 'header' in raw_map.bam is for all purposes empty (l_text = 
0).

However, the target sequences are stored in a separate field and correctly 
present:

python -c "import pysam; print pysam.version.__version__; 
s=pysam.Samfile('raw_map.bam'); print s.header; print s.references"

{}
('chr1',)

For pysam, the reference names are not part of the header field. 

I don't know what samtools reheader does, but it might enter reference names 
into the textual part of the header structure and thus they will show both in 
the
header and in the reference names.

What is the correct behaviour?

Best wishes,
Andreas

Original comment by andreas....@gmail.com on 7 Jul 2012 at 9:04

GoogleCodeExporter commented 9 years ago
I can use "samtools view -H /Volumes/exFAT/data/sorted.bam > ~/header.txt" to 
get the header as attached. 
But if I use pysam, I only get empty header and references. 
{}
()

Original comment by shenzh...@gmail.com on 15 Aug 2012 at 8:54

Attachments:

GoogleCodeExporter commented 9 years ago
I find the problem. 
samfile = pysam.Samfile( "ex1.bam", "r" )

Everything works if I use
samfile = pysam.Samfile( "ex1.bam")

Original comment by shenzh...@gmail.com on 15 Aug 2012 at 9:09

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Fixed. If there are only SQ lines in a header, they are explicitely
added to the dictionary that is returned by s.header.

Many thanks for flagging this,
Andreas

Original comment by andreas....@gmail.com on 20 Nov 2012 at 10:17