caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

circular mitochondrial genome #2

Closed: caleblareau closed this issue 6 years ago

caleblareau commented 6 years ago

Mitochondrial DNA is circular, like a plasmid.

Basically, we need a workflow that creates a surrogate second mitochondrial chromosome that joins, say, the last 50 bp of the chromosome to the first 50 bp. This junction should be made into its own chromosome. Then a new reference genome build has to be made for whichever aligner is being used.
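A minimal sketch of the surrogate-chromosome idea, assuming the mitochondrial sequence is already loaded as a plain string. The function names `build_junction` and `to_fasta`, the record name `chrM_wrap`, and the toy sequence are all made up for illustration; none of this is mgatk code.

```python
def build_junction(sequence: str, flank: int = 50) -> str:
    """Return the last `flank` bases followed by the first `flank` bases."""
    if flank <= 0:
        raise ValueError("flank must be positive")
    if flank > len(sequence):
        raise ValueError("flank cannot exceed the sequence length")
    return sequence[-flank:] + sequence[:flank]


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record with fixed-width sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Toy 10 bp "genome" (hypothetical), wrapped with a 3 bp flank.
toy = "AACCGGTTAC"
print(to_fasta("chrM_wrap", build_junction(toy, flank=3)), end="")
```

The resulting record would be appended to the reference .fasta before rebuilding the aligner index. In practice the flank probably needs to be at least one read length, so that a read spanning the origin can align entirely within the surrogate.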

For mgatk, we need a sensible way to process .fasta files that contain two mitochondrial chromosomes (the original and the surrogate), and the .bam filtering step has to account for reads that multi-map between them. Finally, variant quantification has to be smart enough to handle the multiple chromosomes, etc.
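For the quantification step, one option would be to translate positions on the surrogate chromosome back to positions on the real chromosome before counting. A rough sketch, assuming 1-based coordinates and a surrogate built from the last `flank` bases followed by the first `flank` bases; `junction_to_reference` is a hypothetical helper, and the length 16569 is only an example value.

```python
def junction_to_reference(pos: int, flank: int, genome_length: int) -> int:
    """Map a 1-based surrogate position to a 1-based position on the real chromosome."""
    if not 1 <= pos <= 2 * flank:
        raise ValueError("position lies outside the surrogate chromosome")
    if pos <= flank:
        # First half of the surrogate comes from the end of the genome.
        return genome_length - flank + pos
    # Second half of the surrogate comes from the start of the genome.
    return pos - flank


# Example with a 50 bp flank and an assumed genome length of 16569 bp.
for surrogate_pos in (1, 50, 51, 100):
    print(surrogate_pos, junction_to_reference(surrogate_pos, 50, 16569))
```

With this mapping, per-base counts from both chromosomes can be summed at the same reference coordinate. Reads that multi-map to both chromosomes would still need to be counted only once.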

caleblareau commented 6 years ago

Miscellaneous gsnap commands that seem to be the right answer:

gmap_build -d hg19chrM hg19.fasta -c chrM

gsnap --gunzip -D /apps/lab/aryee/gmap-2018-02-12/share -d hg19chrM miseq/fastq/Bulk1_R1.fastq.gz miseq/fastq/Bulk1_R2.fastq.gz -A sam | samtools view -Sbh - | samtools sort -@ 4 - -o miseq_bulk1.gsnap.bam

https://github.com/juliangehring/GMAP-GSNAP
http://research-pub.gene.com/gmap/

XC: Indicates whether the alignment crosses over the origin of a circular chromosome. If so, the string XC:A:+ is printed.

samtools view miseq_bulk1.gsnap.bam | grep XC:A:+
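The same check can be sketched in Python by looking at the optional SAM fields (column 12 onward) instead of grepping the whole line. This is an illustrative sketch only: the helper names and the two toy SAM records below are made up, and the only thing taken from the gsnap documentation is the `XC:A:+` tag string.

```python
def crosses_origin(sam_line: str) -> bool:
    """True if the record carries the XC:A:+ tag among its optional fields."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; tags start at index 11.
    return "XC:A:+" in fields[11:]


def count_origin_crossing(sam_lines) -> int:
    """Count alignment records (not @ header lines) flagged as crossing the origin."""
    return sum(
        1
        for line in sam_lines
        if not line.startswith("@") and crosses_origin(line)
    )


# Toy records (hypothetical): only the first one carries the XC tag.
records = [
    "r1\t0\tchrM\t16568\t40\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tXC:A:+",
    "r2\t0\tchrM\t100\t40\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
]
print(count_origin_crossing(records))
```

Restricting the match to the tag columns avoids a false hit if the same characters happen to appear in a read name or quality string.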

caleblareau commented 6 years ago

Implemented with gsnap; this is now covered in the documentation.