Clinical-Genomics / MIP

Mutation Identification Pipeline. Read the latest documentation:
https://clinical-genomics.gitbook.io/project-mip/
MIT License

Explanation needed: SNV called but not seen in IGV? #2021

Closed: KickiLagerstedt closed this issue 1 year ago

KickiLagerstedt commented 1 year ago

Variants are called with a low score and many reads per allele, but they are not shown in IGV.

https://scout.scilifelab.se/cust002/F0046529-3/23d2fc25374c8c4ee1d51012dd1ca3c7#comments

https://scout.scilifelab.se/cust002/F0046529-3/3746556ee54a7ab08acac68520632b3d

KickiLagerstedt commented 1 year ago

@helena.malmgren has seen the same thing in F0014934-2, in the ALMS1 gene.

northwestwitch commented 1 year ago

For the two cases you posted above, the variant is shown in the squished view:

[Two screenshots of the squished IGV view]

I'm not sure why; could it be due to downsampling?

dnil commented 1 year ago

Hi, for starters, the allele counts in Scout reflect those in the input VCF. [Screenshots 2023-03-23 at 10:48:06 and 10:48:20] We do see traces of the called variants in the IGV.js view, but nowhere near the depth or evenness suggested by the AD and DP fields, though what we see is certainly consistent with the low variant quality.
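To check the raw numbers behind the Scout view, the per-sample FORMAT fields can be read directly from the VCF data line. Below is a minimal Python sketch, assuming a tab-separated VCF data line whose FORMAT column includes AD, DP and GQ. The function names are hypothetical, and the example record is made up for illustration (it is not taken from the cases above); its FORMAT layout is only modelled on typical DeepVariant output.

```python
from typing import Optional


def _to_int(value: str) -> Optional[int]:
    """VCF uses '.' for a missing value; return None in that case."""
    return None if value == "." else int(value)


def sample_fields(vcf_line: str, sample_index: int = 0) -> dict:
    """Map FORMAT keys (column 9) to the values of one sample column (10+)."""
    cols = vcf_line.rstrip("\n").split("\t")
    keys = cols[8].split(":")
    values = cols[9 + sample_index].split(":")
    return dict(zip(keys, values))


def allele_support(vcf_line: str, sample_index: int = 0) -> dict:
    """Extract read support (AD, DP) and genotype quality (GQ) for one sample."""
    s = sample_fields(vcf_line, sample_index)
    ad = [_to_int(x) for x in s.get("AD", ".").split(",")]
    return {
        "ref_reads": ad[0],   # first AD entry: reads supporting REF
        "alt_reads": ad[1:],  # remaining entries: one per ALT allele
        "depth": _to_int(s.get("DP", ".")),
        "gq": _to_int(s.get("GQ", ".")),
    }


# Hypothetical record, values invented for illustration only.
example = "chr1\t1000\t.\tA\tG\t3.1\tPASS\t.\tGT:GQ:DP:AD:VAF:PL\t0/1:9:40:22,18:0.45:3,0,20"
print(allele_support(example))
# {'ref_reads': 22, 'alt_reads': [18], 'depth': 40, 'gq': 9}
```

These are the values the caller reports. As discussed further down in this thread, they need not match what the original alignments show in IGV.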

I do not see any downsampling indicator for these regions in IGV.js, but let's take a quick look in desktop IGV before passing this on to MIP. FLG2, at least, is a rather repetitive region, so DeepVariant may be doing something creative there, such as comparing several regions in one go, but that is not something I was aware of.

northwestwitch commented 1 year ago


No, I agree, there are not enough reads for downsampling. I don't understand it.

dnil commented 1 year ago

I can check in desktop IGV. However, there are known repeats here, the alignments look rather messy with low-quality variants, and there are some SA- and XA-tagged reads (supplementary and alternative-hit alignments) etc. Given all that, I suspect that DeepVariant, through local realignment, has brought in more reads than bwa originally placed at this locus. The bwa alignment is what we see in the CRAM file and visualise in IGV.
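To put a rough number on how messy a locus looks in the original alignments, one could tally SA-tagged, XA-tagged and low-MAPQ reads from plain SAM text, for example the output of `samtools view` restricted to the region. The sketch below assumes SAM text lines as input. The function name and the MAPQ cutoff of 20 are hypothetical, illustrative choices, not MIP or DeepVariant settings. It only describes the bwa alignments in the CRAM, not the reads DeepVariant sees after realignment.

```python
from typing import Dict, Iterable


def summarize_region(sam_lines: Iterable[str], min_mapq: int = 20) -> Dict[str, int]:
    """Count reads, SA/XA-tagged reads and low-MAPQ reads in SAM text lines.

    min_mapq=20 is an illustrative cutoff only.
    """
    counts = {"reads": 0, "SA": 0, "XA": 0, "low_mapq": 0}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        counts["reads"] += 1
        # Optional fields start at column 12 (index 11) and look like TAG:TYPE:VALUE.
        tags = {field.split(":", 1)[0] for field in cols[11:]}
        if "SA" in tags:  # read has supplementary (split/chimeric) alignments
            counts["SA"] += 1
        if "XA" in tags:  # bwa reported alternative hits elsewhere in the genome
            counts["XA"] += 1
        if int(cols[4]) < min_mapq:  # column 5 is MAPQ
            counts["low_mapq"] += 1
    return counts
```

A high share of XA-tagged or low-MAPQ reads is what one would expect in a repeat region like this, where bwa cannot confidently tell the repeat copies apart.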

Compare, e.g., this issue: https://github.com/google/deepvariant/issues/618, and the FAQ, https://github.com/google/deepvariant/blob/r1.5/docs/FAQ.md, especially the difference between the bwa alignment and the realignment. Realignment generally improves DeepVariant's results, but it can be confusing when visualising the initial alignments, much as with TIDDIT v3 and later.

dnil commented 1 year ago

It does look very similar in the desktop version:

[Screenshot 2023-03-23 at 11:21:39]

Let's pass this on to MIP, e.g. @ramprasadn or @jemten, to see whether they can find out if the region is indeed being realigned, and ideally produce a post-realignment image of the region from a recent run. I think that would be rather illustrative. And of course, just check that this is actually a feature of DeepVariant and not a bug! 😊

jemten commented 1 year ago

This is how DeepVariant sees this region (https://scout.scilifelab.se/cust002/F0046529-3/23d2fc25374c8c4ee1d51012dd1ca3c7#comments) after local realignment.

[Screenshot: the region as seen by DeepVariant after local realignment]

dnil commented 1 year ago

It seems rather clear that neither representation is quite true to the underlying molecular reality, but at least here the counts are very similar. We can perhaps interpret this variant as being present in two out of four regions that are quite similar to this one.

The quality value is low for a reason.

jemten commented 1 year ago

Yeah, GQ=9 is a clear indicator that this is a messy region. How do you want to proceed with this? I don't think it's very feasible to save the local realignment files for display in Scout.
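For reference, the VCF specification defines GQ as a Phred-scaled probability that the genotype call is wrong (conditioned on the site being variant). Converting GQ=9 back to a probability:

```latex
\mathrm{GQ} = -10\log_{10} P(\text{genotype call is wrong})
\quad\Longrightarrow\quad
P = 10^{-\mathrm{GQ}/10},
\qquad
\mathrm{GQ} = 9:\; P = 10^{-0.9} \approx 0.126
```

That is roughly a one-in-eight chance that the genotype is wrong, compared with 0.01 at GQ=20.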

dnil commented 1 year ago

Agreed!

I'm personally good with this explanation. The counts match the realigned variants, not the original bwa alignment. It's not a region where I would feel very confident about short-read calls regarding which repeat copy carries which variant, and the quality reflects this. What do you say, @KickiLagerstedt?

I definitely think we want realignment with DeepVariant, so turning it off just to simplify the view of some actually very difficult regions is hardly on the table. Is there some kind of indicator in the VCF from DeepVariant showing that it used a realignment to make this particular call, beyond the low GQ?

jemten commented 1 year ago

I can't see any indication of that in the VCF. I don't think there is much we can do about this on the MIP side.

KickiLagerstedt commented 1 year ago

OK - Helena and I accept this!

dnil commented 1 year ago

OK, closing this. Thank you @jemten!