UW-GAC / GENESIS

GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness
https://bioconductor.org/packages/GENESIS

Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed #66

Closed: vkp3 closed this issue 3 years ago

vkp3 commented 3 years ago

Hello,

I have repeatedly hit the error below when trying to run GENESIS on a particular chromosome. It points to a problem with an if statement, and the accompanying warning says that NAs were introduced.

Everything else seems to run fine and I have R compiled with Intel MKL libraries for matrix calculations, but for certain chromosomes / tests, I get the following error. It has been difficult to pinpoint the root of the error, but I am wondering if the authors or others could shed light on this error and how it may be avoided. I also wonder if the warning and the error are related.

...
Iteration 704 of 1086 completed
Iteration 715 of 1086 completed

Error in if (any(i < 0L)) { : missing value where TRUE/FALSE needed
Calls: assocTestAggregate ... .local -> .meanImpute -> [<- -> [<- -> [<- -> [<- -> int2i

In addition: Warning message:
In int2i(as.integer(i), n) : NAs introduced by coercion to integer range
Execution halted

Thank you, Vamsee

smgogarten commented 3 years ago

I had not heard of this error before, but it's been reported elsewhere and diagnosed as a problem with the matrix being too large. Perhaps one of your aggregate units contains too many variants; can you try splitting up the largest ones and see if that fixes the problem?

vkp3 commented 3 years ago

Thank you for your response and for the link. Your suggestion that the number of variants might be too large seems accurate. My aggregate unit is a gene, and I am testing all coding sequence variants with MAF < 0.01 (no additional filters) from UK Biobank exome sequencing data (N ≈ 180,000) in each gene-based aggregate test. So I suppose the [variants x samples] matrix for one of the genes might be too large for the matrix calculations.

Could you elaborate on how I would split the largest ones? Do you mean splitting the gene into two aggregate units and running them separately? If so, 1) is there a way to combine the results after running each split, and 2) is there some way the code could skip over any units that fail, so that the other genes can finish? I also wonder whether these issues came up with the TOPMed data, if its sample sizes are comparable to those of the UK Biobank.

GENESIS has been a joy to use otherwise, so kudos to your quick response.

smgogarten commented 3 years ago

We were able to reproduce this error for a matrix with >2^31 elements, and added a fix in version 2.23.3 of GENESIS.

vkp3 commented 3 years ago

Thanks for the bug fix. I am unfortunately still experiencing an error that seems to be due to the same issue.

For a given gene (TTN), I am attempting to include 13,349 variants across 180,256 individuals. I created a SeqVarRangeIterator with a single range, defined by:

> iterator <- SeqVarRangeIterator(seqData, variantRanges=GRanges(seqnames=c(2),ranges=IRanges(start=178525989, end=178830802), strand='-'), verbose=T)
# of selected variants: 13,349
> assoc <- assocTestAggregate(iterator, nullmod.fe, test='SMMAT',genome.build='hg38',weight.beta=weight.beta,verbose=T)
# of selected samples: 180,256
Error in .local(x, y, ...) : negative length vectors are not allowed

As you can see, I get the following error:

Error in .local(x, y, ...) : negative length vectors are not allowed

which would indicate (based on reports from others who have run into similar errors) that this is due to the data frame size being > 2^31 - 1 (https://stackoverflow.com/questions/42479854/merge-error-negative-length-vectors-are-not-allowed)

Could you please confirm you can reproduce this error?

Thanks again
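
As a quick sanity check on the sizes quoted in the comment above, the sketch below multiplies the reported variant and sample counts and compares the result with R's largest integer value. It uses base R only; the variable names are illustrative, and the assumption that the full [variants x samples] matrix is what overflows comes from the discussion, not from the GENESIS source.

```r
# Dimensions quoted in the comment above.
n_variants <- 13349
n_samples  <- 180256

# Numeric (double) literals are used on purpose: multiplying two R integers
# this large would itself overflow and return NA with a warning.
n_elements <- n_variants * n_samples    # 2406237344

# Largest value an R integer can hold, i.e. 2^31 - 1.
int_max <- .Machine$integer.max         # 2147483647

# TRUE means the element count cannot be indexed with a 32-bit integer.
exceeds_limit <- n_elements > int_max
print(exceeds_limit)
```

With these numbers the element count is about 12% above the integer limit, which is consistent with an integer-indexing overflow for this one gene.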

smgogarten commented 3 years ago

I was able to reproduce this and am still looking into the cause and possible fixes.

smgogarten commented 3 years ago

@mconomos the error is coming from this line: https://github.com/UW-GAC/GENESIS/blob/master/R/nullModelTestPrep.R#L53

tcrossprod(x, y) where x and y are both dgeMatrix objects with 1 column each, and nrow(x) * nrow(y) > 2^31
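
To illustrate why that call grows so quickly, here is a minimal sketch using small base R matrices as stand-ins for the dgeMatrix objects (an assumption made for illustration; the Matrix package and the actual GENESIS inputs are not used). For two one-column matrices, tcrossprod(x, y) is the outer product x %*% t(y), so the result has nrow(x) * nrow(y) elements. The helper exceeds_int_range is hypothetical and not part of GENESIS.

```r
# Two one-column matrices, standing in for the dgeMatrix objects.
x <- matrix(1:3, ncol = 1)
y <- matrix(1:4, ncol = 1)

# tcrossprod(x, y) computes x %*% t(y): an nrow(x) by nrow(y) matrix.
outer_xy <- tcrossprod(x, y)
result_dim <- dim(outer_xy)             # 3 rows, 4 columns

# Hypothetical helper: would an nx-by-ny result exceed R's integer range?
# Inputs are converted to double so the product itself cannot overflow.
exceeds_int_range <- function(nx, ny) {
  as.numeric(nx) * as.numeric(ny) > .Machine$integer.max
}

# With equal row counts, the limit is crossed between 46340 and 46341 rows.
below_limit <- exceeds_int_range(46340, 46340)   # FALSE
above_limit <- exceeds_int_range(46341, 46341)   # TRUE
```

Because the result size is the product of the two row counts, inputs of only a few tens of thousands of rows each are enough to cross the threshold.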

smgogarten commented 3 years ago

Fixed in 5d6a7ef and 49a2215