PeteHaitch / Lister2BAM

A collection of Python scripts to convert Lister-style alignment files from Lister et al. Nature (2009 and 2011) to BAM format.
MIT License

@ in QNAME #3

Open PeteHaitch opened 10 years ago

PeteHaitch commented 10 years ago

Bismark strips the "@" character from the start of the read name (QNAME). The FF-iPSC_19.11+BMP4 and H1+BMP4 samples have a "@" character at the start of every read name, but my scripts don't remove it. This causes problems when processing these samples with bismark_methylation_extractor, because it treats all of these lines as header lines.

This could be fixed in Bismark by opening all SAM/BAM files with samtools view rather than samtools view -h. However, it's not a bug as far as Bismark is concerned, since it will never arise for users who align and process their data with Bismark.

It will only be a bug for users who "fake" the Bismark alignment (like I do with these scripts) and then try to process the resulting files with bismark_methylation_extractor.
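To illustrate the failure mode described above, here is a minimal Python sketch. The function looks_like_header is hypothetical and is not Bismark's actual code; it only assumes that a reader of header-included SAM output decides "header or record" by checking for a leading "@". The read name and field values are made up for illustration.

```python
def looks_like_header(sam_line):
    """Hypothetical naive check: treat any line starting with "@" as a header."""
    return sam_line.startswith("@")


# A genuine SAM header line.
header = "@SQ\tSN:chr1\tLN:1000"

# An invented alignment record whose QNAME starts with "@".
read = "@read_1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII"

# The same record with the leading "@" removed from the QNAME.
fixed = read[1:]

for line in (header, read, fixed):
    print(looks_like_header(line), line.split("\t")[0])
```

Under that assumption, the unmodified record is indistinguishable from a header line, while the record with the "@" stripped is handled as an alignment.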

PeteHaitch commented 10 years ago

Can "fix" this by, e.g.:

samtools view -H H1+BMP4.bam > tmp.sam
samtools view QS_H1+BMP4.bam | sed "s/^\@//" - >> tmp.sam
samtools view -bS tmp.sam > tmp.bam
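
The same substitution could presumably be done inside the conversion scripts rather than after the fact. A minimal Python sketch follows; strip_at_from_qname is a hypothetical helper name, not a function from this repository, and it assumes it is only ever given alignment records (never header lines), as is the case for the output of samtools view without -h.

```python
def strip_at_from_qname(record):
    """Remove a single leading "@" from the QNAME of a SAM alignment record.

    Assumes `record` is a tab-separated alignment line, not a header line.
    Mirrors the effect of sed "s/^\\@//" on such a line.
    """
    fields = record.split("\t")
    if fields[0].startswith("@"):
        fields[0] = fields[0][1:]
    return "\t".join(fields)


# Invented example record; all field values are illustrative only.
record = "@read_1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII"
print(strip_at_from_qname(record))
```

Records whose QNAME does not start with "@" pass through unchanged, so the helper is safe to apply to every sample.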