Illumina / REViewer

A tool for visualizing alignments of reads in regions containing tandem repeats

GNU General Public License v3.0

73 stars, 14 forks

Graph Interpretation #27

Closed by hchetia 2 years ago

hchetia commented 2 years ago

Hi, I have a couple of questions about the interpretation of REViewer graphs.

1- How are base substitutions and deletions displayed in the graphs generated by REViewer?

2- What does this dark line signify?

3- Some graphs have three reads aligned (the ones in between are IRRs). How reliable are genotypes supported by such three reads (given that the flanks are 100% matching)?

4- Why are some reads of the same color but lighter in appearance?

Thank you, Hasna

hchetia commented 2 years ago

Waiting for your reply @ctsa @egor-dolzhenko 🙂

egor-dolzhenko commented 2 years ago

Hi Hasna,

Thanks for the questions and sorry for the slow reply. I am on leave at the moment.

If a read has a base that mismatches the haplotype sequence, then this base is displayed (as shown in your screenshots). Deletions are shown by thin horizontal lines and insertions are shown by thick vertical lines.

I'd say that repeats supported by just three reads are not very reliable. At regular genome-wide sequencing depths (30x or more) we expect each base of the repeat to be covered by many reads.

Reads shown in a lighter color are ambiguously assigned. (When a read pair can be aligned equally well to either haplotype, REViewer assigns the read pair randomly.)

Best wishes, Egor

hchetia commented 2 years ago

Hi @egor-dolzhenko thank you so much for the reply. It helped me interpret my results better.

Regarding my question no. 3, what I actually meant was the three reads in one row (two spanning reads and one IRR). If I am correct in assuming that IRRs can be placed anywhere within a repeat region by the graph, then how reliable would this genotype prediction be (please see the figure)? Or should we rather manually inspect the no. of repeats covered by the spanning reads only, to predict a relatively more accurate genotype? I have the same doubt about genotypes predicted with one spanning read and one IRR (cov. >10).

Regards, hasna

egor-dolzhenko commented 2 years ago

Hi Hasna,

Yes, IRRs can be placed anywhere within the repeat. REViewer randomly spreads out IRRs (and other ambiguously-aligning reads) throughout the repeat in an attempt to generate a pileup where all repeat alleles have similar coverage.
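To illustrate why the position of an IRR in the plot carries no evidence by itself, here is a minimal Python sketch of random placement. It is not REViewer's actual code: the function names, the uniform choice of start offset, and the example numbers are all assumptions made for illustration.

```python
import random


def place_in_repeat_reads(num_reads, read_len, repeat_len, seed=0):
    """Pick a random start offset inside the repeat for each in-repeat read.

    Hypothetical helper: an IRR consists only of repeat sequence, so any
    offset at which the read fits entirely inside the repeat is equally valid.
    """
    max_start = repeat_len - read_len
    if max_start < 0:
        raise ValueError("an in-repeat read cannot be longer than the repeat")
    rng = random.Random(seed)
    return sorted(rng.randint(0, max_start) for _ in range(num_reads))


def per_base_coverage(starts, read_len, repeat_len):
    """Count how many placed reads cover each base of the repeat."""
    coverage = [0] * repeat_len
    for start in starts:
        for pos in range(start, start + read_len):
            coverage[pos] += 1
    return coverage


# Example with assumed numbers: five 150 bp IRRs on a 600 bp repeat allele.
starts = place_in_repeat_reads(num_reads=5, read_len=150, repeat_len=600)
coverage = per_base_coverage(starts, read_len=150, repeat_len=600)
```

Changing `seed` moves the reads around without changing how well they fit, which is why the drawn location of an IRR should not be read as support for a specific repeat length.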

In my opinion this is not a high-confidence genotype call because it is supported by so few reads. What is the genome-wide sequencing depth for this sample? Also, are you working with PCR-free or PCR+ data? If you are working with PCR+ data, it's possible that some longer repeats are not getting amplified as well.

Apologies again for the slow replies! I will do my best to reply much faster next time.

Best wishes, Egor

hchetia commented 2 years ago

Hi Egor, thanks for the reply. I was of the same opinion. Do you think for such confusing repeats, we should rather manually count the no. of repeats covered by the spanning reads to get a GT call? The sequencing depth for these samples is around 10x and the data was derived using a PCR-free WGS approach.

Regards, Hasna

egor-dolzhenko commented 2 years ago

Oh, I see. If the sequencing depth is around 10x, then each repeat allele would be sequenced at about 5x depth, so the low coverage that you see in the plot is not surprising (but still, all repeat expansion calls are less confident at such low coverages). Do you have a way to validate if this particular expansion is real?
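The depth arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: it assumes a diploid sample and models the number of reads covering a base as Poisson-distributed, a common simplification that real data (and repeat regions in particular) will not follow exactly. The function names are made up for this example.

```python
import math


def per_allele_depth(genome_depth, ploidy=2):
    """Expected depth on each allele, assuming reads split evenly across alleles."""
    return genome_depth / ploidy


def prob_at_most_k_reads(mean_depth, k):
    """P(X <= k) for X ~ Poisson(mean_depth): chance a base has k or fewer reads."""
    return sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(k + 1)
    )


low = per_allele_depth(10)   # 10x genome-wide -> 5x per allele
high = per_allele_depth(30)  # 30x genome-wide -> 15x per allele

# Chance that a given base on one allele is covered by three reads or fewer.
p_low = prob_at_most_k_reads(low, 3)
p_high = prob_at_most_k_reads(high, 3)
```

Under these assumptions, a base covered by three or fewer reads is fairly common at 5x per allele (a bit over one in four) and rare at 15x per allele, which matches the point that thin support is expected at 10x but would be a warning sign at 30x.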

And yes, you could always count the number of repeats in reads to get a lower bound for the repeat size.
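A minimal sketch of such a count in Python, assuming the read sequences are available as plain strings and that only exact, back-to-back copies of the motif are counted (mismatches or interruptions end a run). The function names and the example reads are hypothetical.

```python
import re


def max_repeat_units(read_seq, motif):
    """Length, in motif units, of the longest exact run of `motif` found in a read."""
    motif = motif.upper()
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [
        len(match.group(0)) // len(motif)
        for match in pattern.finditer(read_seq.upper())
    ]
    return max(runs, default=0)


def repeat_size_lower_bound(reads, motif):
    """Largest run seen in any read; the true allele is at least this long."""
    return max((max_repeat_units(read, motif) for read in reads), default=0)


# Made-up reads for a CAG repeat.
reads = [
    "TTTCAGCAGCAGCAGGGA",  # run of 4 with both flanks visible
    "CAGCAGTTCAGCAGCAG",   # interrupted: runs of 2 and 3
]
bound = repeat_size_lower_bound(reads, "CAG")
```

Because exact matching is strict, a sequencing error inside the repeat splits a run and lowers the count, so the result stays a lower bound rather than an estimate of the repeat size.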

I hope this helps!

Best wishes, Egor

hchetia commented 2 years ago

Yes, we will carry out targeted sequencing of the loci.

Thanks Egor. You are always very helpful.

egor-dolzhenko commented 2 years ago

Great! Happy to help!