broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Ported from Classic-GATK: Unclipping of high quality soft-clips results in difficult to explain GVCF non-variant likelihoods #269

Open vdauwera opened 9 years ago

vdauwera commented 9 years ago

Originally from @vruano

Follow up from story https://www.pivotaltracker.com/story/show/68368324

Currently HC unclips high-quality soft clips in order to aid the discovery of variation through local de novo assembly and the calculation of read-vs-haplotype likelihoods.

As a side effect, processes that default to the original alignment (e.g. to calculate HOM-REF confidence based on the pileup) treat the unclipped ends as if they were bona fide alignments to that section of the reference.
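
To make the side effect concrete, here is a minimal sketch of how reverting soft clips widens the reference interval a read appears to cover. It is not GATK code: the function names are hypothetical, coordinates are assumed to be 0-based and half-open, and only the CIGAR operators from the SAM specification are taken as given.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM specification.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '10S80M10S' into (length, operator) pairs."""
    return [(int(length), op) for length, op in CIGAR_ELEMENT.findall(cigar)]


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clip lengths, skipping over hard clips."""

    def clip_at_front(elements):
        total = 0
        for length, op in elements:
            if op == "H":
                continue  # hard-clipped bases are not present in the read
            if op != "S":
                break  # reached the aligned part of the read
            total += length
        return total

    elements = parse_cigar(cigar)
    return clip_at_front(elements), clip_at_front(reversed(elements))


def aligned_span(start, cigar):
    """Reference interval [begin, end) covered by the aligned part of the read."""
    ref_length = sum(length for length, op in parse_cigar(cigar) if op in "MDN=X")
    return start, start + ref_length


def unclipped_span(start, cigar):
    """Reference interval the read seems to cover once soft clips are reverted."""
    lead, trail = soft_clip_lengths(cigar)
    begin, end = aligned_span(start, cigar)
    return begin - lead, end + trail
```

For a read starting at 100 with CIGAR `10S80M10S`, `aligned_span` gives `(100, 180)` while `unclipped_span` gives `(90, 190)`. Any pileup-based calculation that uses the wider interval sees 20 bases at positions where the aligner never placed them.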

On occasion this seems to be beneficial, as it casts genuine doubt on overlapping HOM-REF calls. At other times, however, it has the opposite effect: it casts doubt on genuine HOM-REF calls just because they are close to a medium-to-large insertion that explains the soft clips well.

The task is to tackle the issue by making soft clips count against HOM-REF only when they are not well explained by real variation. This may require making the RCM fully use the graph to calculate the HOM-REF likelihoods, as initially intended (this fell through due to a change in priorities), instead of defaulting to the original alignment (with unclipped soft clips).
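
The rule proposed here can be illustrated with a toy sketch. This is only an assumption-laden stand-in: the issue suggests using the assembly graph, whereas the code below substitutes a crude direct sequence comparison between each soft clip and a list of candidate insertions. The `SoftClip` and `Insertion` types, the `max_mismatches` threshold, and the coordinate convention are all hypothetical and do not correspond to anything in GATK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftClip:
    boundary: int   # reference position where the aligned part of the read meets the clip
    bases: str      # the soft-clipped bases, in read order
    trailing: bool  # True if the clip is at the end of the read, False if at the start


@dataclass(frozen=True)
class Insertion:
    position: int   # reference position at which the extra bases are inserted
    bases: str      # the inserted sequence


def is_explained(clip, insertions, max_mismatches=1):
    """True if some insertion at the clip boundary matches the clipped bases."""
    for ins in insertions:
        if ins.position != clip.boundary:
            continue
        overlap = min(len(clip.bases), len(ins.bases))
        if overlap == 0:
            continue
        if clip.trailing:
            # A read ending inside the insertion carries the insertion's first bases.
            clip_part, ins_part = clip.bases[:overlap], ins.bases[:overlap]
        else:
            # A read starting inside the insertion carries the insertion's last bases.
            clip_part, ins_part = clip.bases[-overlap:], ins.bases[-overlap:]
        mismatches = sum(a != b for a, b in zip(clip_part, ins_part))
        if mismatches <= max_mismatches:
            return True
    return False


def bases_against_hom_ref(clips, insertions):
    """Count soft-clipped bases that remain as evidence against HOM-REF."""
    return sum(len(c.bases) for c in clips if not is_explained(c, insertions))


clips = [
    SoftClip(boundary=180, bases="ACGTAC", trailing=True),  # matches the insertion
    SoftClip(boundary=100, bases="TTTT", trailing=False),   # no insertion nearby
]
insertions = [Insertion(position=180, bases="ACGTACGGA")]
remaining = bases_against_hom_ref(clips, insertions)
```

With these inputs only the four bases of the second clip still count against HOM-REF; the first clip is attributed to the insertion. A graph-based calculation would presumably handle clips that run through an insertion and back into the reference, which this comparison ignores.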

akiezun commented 9 years ago

@vdauwera is this a bug or enhancement?

vdauwera commented 8 years ago

Bug. It creates inaccurate ref calls. Test case is available in https://github.com/broadinstitute/gsa-unstable/issues/1271

pgrosu commented 8 years ago

Hi Geraldine,

Sorry to bother, but when I try to follow this link, I get a 404 error:

https://github.com/broadinstitute/gsa-unstable/issues/1271

Thanks, Paul

vdauwera commented 8 years ago

Hi Paul, that means you don't have access to our internal repositories. Let me see if I can get you access.

pgrosu commented 8 years ago

Thank you :)

vdauwera commented 8 years ago

I added you as collaborator, you should have access now.

pgrosu commented 8 years ago

Thank you Geraldine.

vdauwera commented 7 years ago

Hey all, @vruano / @davidbenjamin, is this on your radar at all? I heard rumblings about a rewrite of the assembly machinery; would that address this?

davidbenjamin commented 7 years ago

@vdauwera not on my radar.

vruano commented 7 years ago

@vdauwera not on my radar either. The rewrite of the assembly may fix some of these cases, where there is actually some variation that we fail to detect (a false negative) that would explain those soft clips. However, I don't think it would fix all the cases.

vdauwera commented 7 years ago

Ok, thanks for letting me know. We've been getting user complaints about this, FYI.

chlangley commented 7 years ago

Hello:

Thanks for info.

I don’t suppose it is my role to militate/plead for a solid fix of this omission.

But I must say that it would be appreciated and in its own way advance science.

Thanks for any consideration.

Cheers, Chuck

vdauwera commented 7 years ago

Hi Chuck, as always it's a prioritization problem. Our internal stakeholders haven't indicated that this is a significant problem for them (right @ldgauthier ?) so it's difficult for us to justify putting resources into it ahead of other work. But if we start getting a lot of demand from external users to address this (especially if there is a well-documented impact on a research use case), then we could potentially reevaluate its priority.

chlangley commented 7 years ago

Thanks for getting this cleared up.
OK, what next? I'll check with colleagues who may be aware of this 'feature'. Perhaps the case can be made more clearly by a group of users, including visible labs working on human evolutionary genomics.

I don't know the CA genomics community well, but my shallow polling suggests most are happily unaware that SNPs near indels will often be assigned lower quality than they might.

ldgauthier commented 7 years ago

This is probably affecting some of the GWAS studies but in subtle ways that haven't popped up yet. I'm cc'ing Andrea in the hopes that he has some time to think about the issue. I'd need some uninterrupted time to work out the details and that's hard to come by at the moment.

ldgauthier commented 7 years ago

Hi Laura, hope you are enjoying your maternity leave! Unfortunately I will not have time to look into this, since I’m writing up a paper. Cheers,

vdauwera commented 7 years ago

@chlangley One thing we could potentially do to attract attention to this issue and solicit feedback from the community would be to feature it on the GATK blog. If you were to write a concise case study detailing the impact of the problem on your results, others may be motivated to look at their own results, and if it causes problems there, add their voices to yours. We're willing to bring this to public attention, we just don't have the bandwidth to do the legwork.

chlangley commented 7 years ago

Yes. That has potential. Let me consider your suggestion.

chlangley commented 7 years ago

Hello Geraldine:

I started to work on this a bit and found myself blocked.

At this point I have a simple question: is the GATK blog separate from the forum? When I am on the blog page I can’t seem to find a button to submit a new post. Am I missing something, or is the route to blog posting only via the forum?

Sorry to bother you with such a mundane question.

Cheers, Chuck

vdauwera commented 7 years ago

Hi Chuck, the GATK blog is set up to only accept posts from admins or moderators on the forum (or my team). If you're willing to write something up, we would do it as a guest post, where I would post the text on your behalf (with clear attribution to you). If you'd like to share a draft with us the easiest way to do it is through a google doc.

chlangley commented 7 years ago

Sounds good.

Thanks, Chuck
