ericsobel closed this issue 8 years ago.
A good point. I’ll fix it.
On Jul 8, 2016, at 1:03 PM, Eric Sobel notifications@github.com wrote:
A trivial edge case issue: In either summarize() function if m = 0 (i.e., the SnpArray is empty), then each calculation of maf includes a divide by zero. I suggest simply making those statements conditional on m > 0.
View it on GitHub: https://github.com/OpenMendel/SnpArrays.jl/issues/4
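Here is a minimal Julia sketch of the guard suggested above. It assumes Julia 1.x and a hypothetical plain-matrix encoding in which each entry counts copies of allele 2 (0, 1 or 2) and `missing` marks a missing genotype; this is not SnpArrays' own storage format. `column_maf_guarded` is a hypothetical name, not part of the package. It computes maf only when a column has at least one observed genotype and otherwise leaves it at 0.0, instead of dividing by zero.

```julia
# Hypothetical helper, not the SnpArrays API. Genotypes are a plain matrix whose
# entries count copies of allele 2 (0, 1, 2), with `missing` for a missing call.
function column_maf_guarded(g::AbstractMatrix{<:Union{Missing,Integer}})
    maf = zeros(Float64, size(g, 2))   # default 0.0 when no alleles are observed
    for j in axes(g, 2)
        nobs = 0                       # number of non-missing genotypes
        a2 = 0                         # total count of allele 2
        for x in view(g, :, j)
            ismissing(x) && continue
            nobs += 1
            a2 += x
        end
        if nobs > 0                    # guard: only divide when something was observed
            f = a2 / (2 * nobs)        # frequency of allele 2
            maf[j] = min(f, 1 - f)     # minor allele frequency
        end
    end
    return maf
end
```

With this guard, an empty input such as `zeros(Int, 0, 5)` yields five 0.0 values rather than NaN. Whether 0.0 or NaN is the better answer is the question the rest of the thread discusses.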
On second thought, the cases m = 0 and a column with all genotypes missing do produce NaN for maf. Try:
s = SnpArray(0, 5)
summarize(s)
gives
([NaN,NaN,NaN,NaN,NaN],Bool[false,false,false,false,false],[0,0,0,0,0],Int64[])
This seems a sensible answer to me: maf cannot be calculated in these cases. Would it be better to keep the current code?
I see your point. Of course, if there are no genotypes, then which allele is the minor one is also unknown (rather than the "allele2" implied by maf == NaN). I don't have a strong feeling about it, but I'd think that with no genotypes the minor (and the major) allele frequency should be 0.0, since the allele count is zero, and the minor_allele boolean could be either true or false. Edge cases can certainly be ambiguous. If you want to leave the code as is, I'm OK with that. (I've now rewritten Ken's code where he used summarize with a possibly empty SnpArray.)
I would follow the convention for regular arrays:
a = randn(0, 5)
mean(a, 1)
produces
NaN NaN NaN NaN NaN
That means no change to the current summarize functions, which output NaN for maf if the input SnpArray has 0 rows or some columns have all genotypes missing.
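To illustrate the convention being kept, here is a minimal sketch, assuming Julia 1.x (where `mean` lives in the Statistics standard library and takes a `dims` keyword) and a hypothetical plain-matrix encoding: entries count copies of allele 2, and `missing` marks a missing genotype. `column_maf_nan` is a hypothetical name, not the package's code. Without a guard, the division is 0 / 0, which Julia evaluates to NaN, the same value `mean` gives for the columns of an empty matrix.

```julia
using Statistics  # `mean` is in the Statistics standard library in Julia 1.x

# Hypothetical helper, not the SnpArrays API. Entries count copies of
# allele 2 (0, 1, 2), with `missing` for a missing genotype.
function column_maf_nan(g::AbstractMatrix{<:Union{Missing,Integer}})
    maf = Vector{Float64}(undef, size(g, 2))
    for j in axes(g, 2)
        nobs = 0                      # number of non-missing genotypes
        a2 = 0                        # total count of allele 2
        for x in view(g, :, j)
            ismissing(x) && continue
            nobs += 1
            a2 += x
        end
        f = a2 / (2 * nobs)           # 0 / 0 evaluates to NaN when nothing is observed
        maf[j] = min(f, 1 - f)        # NaN propagates through min
    end
    return maf
end

# Column means of an empty matrix follow the same NaN convention.
col_means = mean(zeros(0, 5); dims = 1)
```

Both `column_maf_nan(zeros(Int, 0, 5))` and `col_means` contain only NaN values, so a caller can detect "not computable" with `isnan` rather than mistaking it for a genuine frequency of 0.0.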