broadinstitute / gatk-protected

Obsolete/Legacy GATK repository -- go to https://github.com/broadinstitute/gatk instead
BSD 3-Clause "New" or "Revised" License
Generative model of coverage for germline #402

Closed davidbenjamin closed 8 years ago

davidbenjamin commented 8 years ago

Generative models with the raw data as an observed node have many advantages over approaches in which the raw data is pre-processed. We would like a generative model relating observed read counts to hidden copy number state.

This will be easiest to try first on our germline code, for two significant reasons:

If this succeeds, it would then not be so hard to surmount those obstacles for somatic calling, but there is no good reason not to do the simpler thing first.
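To make the idea of "read counts as an observed node, copy number as a hidden node" concrete, here is a minimal sketch, not the model this issue ended up with. It assumes a Poisson read-count likelihood whose mean scales linearly with copy number relative to a diploid baseline. The function names, the `error_rate` floor, and the prior values are all hypothetical choices made for illustration.

```python
import math


def log_poisson(k, lam):
    """Log of the Poisson probability mass P(k | lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def copy_number_posterior(count, expected_diploid_count, prior, error_rate=0.01):
    """Posterior over hidden copy number states for a single target.

    Assumed (hypothetical) generative model:
        copy_number ~ prior
        count | copy_number ~ Poisson(expected_diploid_count * copy_number / 2)
    `error_rate` keeps the Poisson mean positive for copy number 0,
    so a few stray reads do not give zero likelihood.
    """
    log_joint = {}
    for cn, p in prior.items():
        lam = expected_diploid_count * max(cn / 2.0, error_rate)
        log_joint[cn] = math.log(p) + log_poisson(count, lam)

    # Normalize in log space for numerical stability.
    log_max = max(log_joint.values())
    weights = {cn: math.exp(v - log_max) for cn, v in log_joint.items()}
    total = sum(weights.values())
    return {cn: w / total for cn, w in weights.items()}


# Illustrative prior that favors the diploid state.
prior = {0: 0.01, 1: 0.04, 2: 0.90, 3: 0.04, 4: 0.01}
posterior = copy_number_posterior(count=52, expected_diploid_count=100.0, prior=prior)
most_likely = max(posterior, key=posterior.get)
print(most_likely, round(posterior[most_likely], 4))
```

A real model would also need per-target and per-sample bias terms, which is where the PCA-like component discussed in this thread would come in.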

davidbenjamin commented 8 years ago

Note: this will close issue #371 (fast PCA) because probabilistic PCA models are best learned via a fast EM algorithm.
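For reference, here is a sketch of what EM for probabilistic PCA looks like in the simplest case of a single latent factor, using the standard updates for the model x = mu + w z + noise. It is written in plain Python for readability and is not the GATK implementation. The function name `ppca_em_1d` and the synthetic data are assumptions made for illustration.

```python
import random


def ppca_em_1d(data, iterations=200, seed=0):
    """EM for probabilistic PCA with one latent factor.

    Model: x = mu + w * z + noise, z ~ N(0, 1), noise ~ N(0, sigma2 * I).
    Returns (mu, w, sigma2).
    """
    n, d = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mu[j] for j in range(d)] for row in data]
    total_sq = sum(v * v for row in centered for v in row)

    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    sigma2 = 1.0

    for _ in range(iterations):
        # E-step: posterior moments of z for each sample.
        m = sum(v * v for v in w) + sigma2  # scalar M = w^T w + sigma2
        ez = [sum(wj * xj for wj, xj in zip(w, row)) / m for row in centered]
        ezz = [sigma2 / m + e * e for e in ez]

        # M-step: update the loading vector, then the noise variance.
        denom = sum(ezz)
        w = [sum(e * row[j] for e, row in zip(ez, centered)) / denom
             for j in range(d)]
        wtw = sum(v * v for v in w)
        cross = sum(e * sum(wj * xj for wj, xj in zip(w, row))
                    for e, row in zip(ez, centered))
        sigma2 = (total_sq - 2.0 * cross + denom * wtw) / (n * d)

    return mu, w, sigma2


# Synthetic data: one true direction plus isotropic noise (illustrative values).
rng = random.Random(1)
true_w = [2.0, -1.0, 0.5, 1.0, -2.0]
noise_sd = 0.3
data = []
for _ in range(500):
    z = rng.gauss(0.0, 1.0)
    data.append([wj * z + rng.gauss(0.0, noise_sd) for wj in true_w])

mu, w, sigma2 = ppca_em_1d(data)
print([round(v, 2) for v in w], round(sigma2, 3))
```

Each iteration costs O(n d) per latent factor and never forms or decomposes the full d x d covariance matrix, which is the sense in which the EM route can replace a separate fast PCA step.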

davidbenjamin commented 8 years ago

Note: this is able to address issue #381 (deal with copy number events in the panel of normals / cohort) because if copy ratio / copy number is part of the generative model, then learning the parameters of the model will take that into account.

davidbenjamin commented 8 years ago

Closed by PR #416. Now there are several tickets for the implementation.