SouthGreenPlatform / arcad-hts

Scripts for the analysis of high-throughput sequencing data from the ARCAD project.
http://www.arcad-project.org/
GNU General Public License v3.0
0 stars 1 forks source link

Add tags in header of BAM from arcad-hts_4_mapping #16

Open timflutre opened 9 years ago

timflutre commented 9 years ago

More specifically, I would like to add (optional) options to have AS, M5 et SP in @SQ (see the official SAM specifications).

ps: I will try to do it myself

timflutre commented 9 years ago

I just cloned the repo, created a new branch "4map_bam_head", and pushed it. More details here.

timflutre commented 9 years ago

I just asked a question on Biostar.

timflutre commented 9 years ago

Here is how I see things:

  1. add an option allowing the user to provide a SAM header with full @SQ tags;
  2. replace the @SQ tags from the BAM file returned by BWA, by the header file provided by the user.

Here is how to make the SAM header file:

java -Xmx4g -jar `which picard.jar` CreateSequenceDictionary REFERENCE=myref.fa.gz OUTPUT=myref_header.sam GENOME_ASSEMBLY=v2 SPECIES="Bla bla" TRUNCATE_NAMES_AT_WHITESPACE=true

And here is how I would code step (2), which should be done inside the script arcad-hts_4 mapping.pl:

bamF="out_bwa.bam"
mv $bamF tmp_$bamF
cat <(samtools view -H tmp_$bamF | grep -v "@SQ") <(grep "@SQ" myref_header.sam) > tmp_header.sam
samtools reheader tmp_header.sam tmp_$bamF > $bamF
rm tmp_header.sam tmp_$bamF