RajLabMSSM / leafcutter-pipeline

Differential splicing and visualisation pipeline for the Raj lab

Provided example bams are invalid #8

Open yfarjoun opened 2 years ago

yfarjoun commented 2 years ago

Hello,

Using samtools v1.14, the BAM supplied in the example appears to be invalid. Running the example code results in the following error:

[E::sam_hdr_sanitise] Malformed SAM header at line 2
samtools index: failed to create index for "example/data/control_scramble_3_unique_kcnq2.bam"

When inspecting the BAM manually, it seems that the second header line is missing its @RG record type:

$ zless example/data/control_scramble_3_unique_kcnq2.bam
BAM,@HD VN:1.3  SO:coordinate
ID:control_scramble_3   PL:Illumina
@SQ SN:chr1 LN:248956422
@SQ SN:chr2 LN:242193529
<SNIP>
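
The same inspection can be scripted without samtools, which is useful when samtools itself refuses to read the header. This is a minimal sketch, assuming `gzip`, `od`, `tail`, `head`, and `awk` are available; it relies on the BAM layout from the SAM/BAM specification (a 4-byte magic, then a 4-byte little-endian header length, then the header text). The function names `bam_header_text` and `check_bam_header` are hypothetical.

```shell
# bam_header_text: hypothetical helper, prints the text header of a BAM file.
bam_header_text() {
  local bam="$1" l_text
  # Bytes 4-7 of the decompressed stream hold l_text, the header length,
  # as a little-endian 32-bit integer; read it one byte at a time.
  set -- $(gzip -dc "$bam" 2>/dev/null | od -An -tu1 -j4 -N4)
  [ "$#" -eq 4 ] || return 2
  l_text=$(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
  # The header text starts at byte offset 8 (tail -c +9 counts from 1).
  gzip -dc "$bam" 2>/dev/null | tail -c +9 | head -c "$l_text" | tr -d '\000'
}

# check_bam_header: hypothetical helper, reports header lines that do not
# start with an @XX record type followed by a tab (@HD, @SQ, @RG, @PG, ...).
# Exit status: 0 if all lines look valid, 1 if any line is flagged,
# 2 if the header could not be read.
check_bam_header() {
  local text
  text=$(bam_header_text "$1") || return 2
  [ -n "$text" ] || return 0
  printf '%s\n' "$text" | LC_ALL=C awk '
    !/^@[A-Z][A-Z]\t/ { printf "line %d: %s\n", NR, $0; bad = 1 }
    END { exit bad }'
}
```

Running `check_bam_header example/data/control_scramble_3_unique_kcnq2.bam` would be expected to flag line 2 of the header shown above, since that line starts with `ID:` rather than `@RG`.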
yfarjoun commented 2 years ago

Looking at the @PG line, it seems that during BAM generation the STAR aligner was run with --outSAMheaderHD ID:control_scramble_2 PL:Illumina instead of the --outSAMattrRGline argument.

From the STAR aligner manual (PDF):

--outSAMattrRGline
    default: -
    string(s): SAM/BAM read group line. The first word contains the read group
    identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy
    "DS:z z z".
    xxx will be added as RG tag to each output alignment. Any spaces in the tag
    values have to be double quoted.
    Comma separated RG lines correspond to different (comma separated) input
    files in --readFilesIn. Commas have to be surrounded by spaces, e.g.
    --outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy

--outSAMheaderHD
    default: -
    strings: @HD (header) line of the SAM header
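
For comparison, here is a sketch of how the read group might be passed at alignment time using the option quoted above. The wrapper name `star_align`, the thread count, the output prefix, and the input paths are all hypothetical; `--readFilesCommand zcat` assumes gzipped FASTQ input. Only the STAR option names come from the STAR manual. The wrapper prints the command by default and runs it only when `RUN=1` is set.

```shell
# star_align: hypothetical wrapper. Prints the STAR command by default and
# executes it only when RUN=1 is set in the environment.
star_align() {
  local sample="$1" genome_dir="$2" fq1="$3" fq2="$4"
  # --outSAMattrRGline writes an @RG header line and tags each alignment
  # with the read group ID, instead of overwriting the @HD line.
  set -- STAR \
    --runThreadN 4 \
    --genomeDir "$genome_dir" \
    --readFilesIn "$fq1" "$fq2" \
    --readFilesCommand zcat \
    --outSAMtype BAM SortedByCoordinate \
    --outFileNamePrefix "${sample}_" \
    --outSAMattrRGline "ID:${sample}" PL:Illumina
  if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}
```

For example, `star_align control_scramble_3 /path/to/star_index r1.fastq.gz r2.fastq.gz` prints the full command so it can be reviewed before running.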
yfarjoun commented 2 years ago

I can fix the BAM files with a samtools reheader command, e.g.:

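# Rebuild a valid header: drop the binary prefix before @HD on line 1,
# prepend the missing @RG record type to line 2, and keep the rest up to @CO.
zless example/data/TDP43_knockdown_1_unique_kcnq2.bam | \
  sed -n '1{s/^.*@HD/@HD/p}; 2{s/^/@RG\t/p}; 3,/@CO/p' > TDP43_knockdown_1.header.sam
# Write a copy of the BAM with the corrected header, then replace the original.
samtools reheader TDP43_knockdown_1.header.sam example/data/TDP43_knockdown_1_unique_kcnq2.bam > temp.bam
mv temp.bam example/data/TDP43_knockdown_1_unique_kcnq2.bam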
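
To apply the same repair to every example BAM and confirm the result, something along these lines might work. It is a sketch under assumptions, not a tested fix for this repository: the helper names `add_missing_rg_tag` and `reheader_bam` are hypothetical, the header is assumed to end with an @CO line (as the sed command above also assumes), and `samtools quickcheck` and `samtools index` are used as sanity checks.

```shell
# add_missing_rg_tag: hypothetical helper. Reads SAM header text on stdin and
# prefixes line 2 with "@RG<TAB>" only when that line lacks an @ record type,
# so headers that are already valid pass through unchanged.
add_missing_rg_tag() {
  awk 'NR == 2 && !/^@/ { $0 = "@RG\t" $0 } { print }'
}

# reheader_bam: hypothetical helper. Rewrites one BAM with a repaired header
# and indexes it. Requires samtools on PATH.
reheader_bam() {
  local bam="$1" hdr tmp rc
  hdr=$(mktemp) || return 1
  tmp=$(mktemp) || { rm -f "$hdr"; return 1; }
  # Decompress, strip the binary prefix before @HD, keep lines up to @CO.
  gzip -dc "$bam" \
    | LC_ALL=C sed -n '1s/^.*@HD/@HD/; 1,/@CO/p' \
    | add_missing_rg_tag > "$hdr" &&
    samtools reheader "$hdr" "$bam" > "$tmp" &&
    samtools quickcheck "$tmp" &&
    mv "$tmp" "$bam" &&
    samtools index "$bam"
  rc=$?
  rm -f "$hdr" "$tmp"
  return "$rc"
}

# Usage (not run here):
#   for bam in example/data/*.bam; do
#     reheader_bam "$bam" || echo "failed: $bam" >&2
#   done
```

Because the @RG prefix is added only when line 2 does not already start with `@`, re-running the loop on files that were fixed earlier should leave their headers unchanged.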