Different alignment of indels in DNA and RNA

nilsrudqvist commented 7 years ago

Hi,

This might be off-topic and not within your domain, but want to ask anyways.

I work with a mouse dataset. We use BWA for alignment of DNA seq, and STAR for RNA seq. I have an insertion of A: CA>CAA. BWA seem to prefer having the insertion at position 2, and STAR at position 3. Both are equally right (!) - but when I use bam-readcount to find info on variants in DNA seq and RNA seq, I run into trouble and I find my variants (sometimes) at different locations.

I am not sure how to workaround this. I do not align the data myself, our core facility does that. Do you have any suggestions on what to do here?

(Issue previously raised at https://github.com/griffithlab/pVAC-Seq/issues/293)

Thanks, Nils

ernfrid commented 7 years ago

Hi Nils,

That's a good question. I don't have any readymade hints to deal with this. It looks like the Biostars link @chrisamiller sent you indicates a tool to left align indels at the alignment level. This may be the least complicated solution to your problem (assuming it works) and I would probably start there. It may be expensive to do so on the entire BAM, but given that you have few candidates, I'd think you could subset your RNA BAM to only your regions of interest ± some buffer and run the tool on that.

Alternatively, you could convert the bam-readcount output to VCF in some form and then use a tool like vt (http://genome.sph.umich.edu/wiki/Vt) to normalize the VCF. I think this would be quite complicated. After attempting a rough description of what might have to happen (see below), I really wouldn't recommend going down that rabbit hole if you don't have to. I haven't attempted this, but if I were to try then I would do something like the following:

Make a VCF in a fashion similar to pseudocode below

For each indel in the bam-readcount line:
Record the current allele in the info field
If a deletion:
  grab the base before the bam-readcount position from the reference
  create a VCF entry at the position before the bam-readcount entry, with the reference base concatenated to the indel allele from bam-readcount as the reference and the reference base as the alt
elsif an insertion:
   create a VCF entry at the position of the bam-readcount entry, with the alt as the reference base concatenated to the indel allele from bam-readcount as the alt and the reference base as the reference base

Run through vt normalize
Convert back to bam-readcount output making sure to replace the indel allele in the bam-readcount output with the new allele from the VCF and adjusting coordinates properly.

nilsrudqvist commented 7 years ago

Hi Dave,

Thank you for your response! I will sit down with the bioinformatician next week to discuss the path forward. Will most likely try to find a way around this that is sustainable for future projects.

Also, thanks for making bam-readcount, i use it a lot and it's really helpful!

Best, Nils

ernfrid commented 7 years ago

Best of luck! Please consider replying here with your solution should you find a work around. I'm sure others will encounter the same problem.

You're quite welcome on bam-readcount. I'm really glad to hear you find it useful!

nilsrudqvist commented 6 years ago

Hi, this ticket can be closed. Our workaround was to use a kit to left-align the RNA-seq reads (or right-align, can't remember which...).

One question, how do i cite bam-readcount?

Thanks, Nils

ernfrid commented 6 years ago

bam-readcount remains unpublished and I'd suggest simply citing the github repo.

nilsrudqvist commented 6 years ago

Ok, will do! Just want to give credit to you guys for building this tool for us.

Nils-Petter Rudqvist, Ph.D.

On Feb 26, 2018 3:55 PM, "Dave Larson" notifications@github.com wrote:

bam-readcount remains unpublished and I'd suggest simply citing the github repo.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/genome/bam-readcount/issues/40#issuecomment-368647067, or mute the thread https://github.com/notifications/unsubscribe-auth/ATfvyAaoCCpXG8Kechu-zGViUVcN_KLFks5tYxpJgaJpZM4MwIu8 .

genome / bam-readcount

Different alignment of indels in DNA and RNA #40