alanrogers / covld

Estimate linkage disequilibrium between unphased loci

Can it handle missing data? #1

Open travc opened 9 years ago

travc commented 9 years ago

Can your code handle missing data (e.g., a sample with no call at a locus)? It isn't in the example and I don't notice it anyplace obvious in the code. If the answer is "no", then that is a pretty big limitation.

alanrogers commented 9 years ago

There is no provision for missing data. It would not be hard to add, however, using the EM algorithm. I would be happy to help with that, if you are interested.

travc commented 9 years ago

Thanks for the reply... Allowing missing data would be very useful (I'd argue that it should be a requirement for modern tools). Requiring all samples to be called at all sites ends up greatly reducing the number of sites which can be used when the sample sizes start to get reasonably large.

I've got a grant application deadline at the moment, but I'll take a look at the code when I get a chance.

alanrogers commented 9 years ago

Same here. I'm shooting for the July 5 NIH deadline and time is tight until then. Afterwards, I'd be happy to work on this.

With large samples, most of the time is spent in count_genotypes, which fills an array of 3 genotype counts. It would be easy to add a 4th cell to that array, representing missing values. Shouldn't affect execution time much. If the 4th cell is empty, we could call the existing functions. Otherwise, the call would go to some slower function that accommodates missing values.
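A minimal sketch of that idea, written in Python for illustration rather than taken from the covld source. The genotype coding (0, 1, 2, with anything else such as `None` treated as a missing call), the signature of `count_genotypes`, and the names `MISSING` and `pick_path` are all assumptions, and the slower estimator for missing values is left unimplemented.

```python
# Hypothetical index of the extra cell that tallies missing calls.
MISSING = 3


def count_genotypes(genotypes):
    """Return [n0, n1, n2, n_missing] for one locus.

    Assumes genotypes are coded 0, 1, or 2; any other value
    (None, NaN, -1, ...) is counted as missing.
    """
    counts = [0, 0, 0, 0]
    for g in genotypes:
        if g in (0, 1, 2):
            counts[int(g)] += 1
        else:
            counts[MISSING] += 1
    return counts


def pick_path(counts_x, counts_y):
    """Choose the code path for a pair of loci.

    Returns "fast" when neither locus has missing calls, so the
    existing functions could be used unchanged; otherwise "slow",
    standing in for a function that accommodates missing values.
    """
    if counts_x[MISSING] == 0 and counts_y[MISSING] == 0:
        return "fast"
    return "slow"


if __name__ == "__main__":
    x = count_genotypes([0, 1, 2, 2, 1])
    y = count_genotypes([0, None, 2, 1, 1])
    print(x, y, pick_path(x, y))
```

The extra cell costs one additional branch per sample during counting, and the check in `pick_path` is a constant-time comparison per pair of loci, so complete data would still follow the existing route.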
