StoreyLab / terastructure

TeraStructure is a new algorithm to fit Bayesian models of genetic variation in human populations on tera-sample-sized data sets (10^12 observed genotypes, i.e., 1M individuals at 1M SNPs). This package provides a scalable, multi-threaded C++ implementation that can be run on a single computer.
GNU General Public License v3.0

GSL overflow computing heldout likelihood #14

Open kdm9 opened 7 years ago

kdm9 commented 7 years ago

Hi,

As the log below shows, terastructure fails on a dataset I'm trying to run it on. While computing the initial held-out likelihood, a GSL gamma function overflows, I think? The error occurs with this data at every reasonable k (2 <= k <= 12), with rfreq of 1, 2 and 3 million SNPs, and with any seed. I've confirmed that the shape of the dataset is correct.

$ terastructure \
>     -label rfreq1M-012 \
>     -rfreq 1000000 \
>     -file plink/nooutlier.012 \
>     -n 123 \
>     -seed 1234 \
>     -l 8579612 \
>     -k 5 \
>     -nthreads 16
+ rfreq = 1000000
+ using file plink/nooutlier.012
+ n = 123
+ L = 8579612
+ K = 5
+ Creating directory n123-k5-l8579612-rfreq1M-012
+ Writing log to n123-k5-l8579612-rfreq1M-012/infer.log
+ done initializing env
+ .012 detected+ reading (123,8579612) snps from plink/nooutlier.012
8570000 locations read+ initialization begin
+ computing initial heldout likelihood
gsl: gamma.c:1489: ERROR: overflow
done:0.00%Default GSL error handler invoked.
Aborted
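
The overflow reported from gamma.c would be consistent with a gamma (rather than log-gamma) special function being evaluated at a large argument somewhere in the held-out-likelihood computation; gsl_sf_gamma() overflows once its argument exceeds roughly 171, which is easy to reach with counts over millions of SNPs. This is only a guess at the cause, not a reading of the terastructure source. A minimal standalone C++ sketch of the behaviour (none of these names come from terastructure) is:

// sketch: why gsl_sf_gamma aborts here while a log-gamma formulation would not
#include <cstdio>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_sf_gamma.h>

int main() {
  gsl_set_error_handler_off();             // report errors via status codes instead of abort()

  gsl_sf_result r;
  int status = gsl_sf_gamma_e(200.0, &r);  // Gamma(200) > DBL_MAX -> GSL_EOVRFLW
  std::printf("gamma(200): status = %s\n", gsl_strerror(status));

  double lg = gsl_sf_lngamma(200.0);       // log Gamma(200) ~ 857.9, perfectly representable
  std::printf("lngamma(200) = %f\n", lg);
  return 0;
}

Compiled with g++ demo.cc -lgsl -lgslcblas, this prints an overflow status for the first call and a finite value for the second; the usual workaround in likelihood code is to accumulate log-gamma terms rather than gamma terms, and without the default error handler GSL returns an error code instead of aborting the whole run.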

EDIT:

This is with GSL 2.3, using the Intel compiler suite on 64-bit CentOS 6. As an aside, I've managed to get this dataset to "work" using plink BED files, but it gives nonsense results: for all k > 2, every sample has a theta of 0.999995 for one particular population. This suggests to me that read_bed and read_012 are parsing the data differently (the dataset is identical).
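
Before suspecting the reader code, it may be worth sanity-checking the .012 matrix itself. Below is a hypothetical standalone checker, not based on terastructure's read_012 (whose exact format expectations I haven't verified); it assumes whitespace-separated genotypes coded 0/1/2 with -1 for missing, and simply reports the matrix shape and any out-of-range values:

// hypothetical .012 sanity check: shape and value range only
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char **argv) {
  if (argc < 2) { std::cerr << "usage: check012 <file.012>\n"; return 1; }
  std::ifstream in(argv[1]);
  if (!in) { std::cerr << "cannot open " << argv[1] << "\n"; return 1; }

  std::string line;
  long rows = 0, bad = 0, first_cols = -1;
  while (std::getline(in, line)) {
    std::istringstream ss(line);
    long cols = 0;
    int v;
    while (ss >> v) {
      ++cols;
      if (v < -1 || v > 2) ++bad;          // anything outside {-1,0,1,2} is suspect
    }
    if (rows == 0) first_cols = cols;
    else if (cols != first_cols)
      std::cerr << "row " << rows << ": " << cols
                << " values (first row had " << first_cols << ")\n";
    ++rows;
  }
  std::cout << rows << " rows, " << first_cols << " values per row, "
            << bad << " out-of-range values\n";
  return 0;
}

Whether the format carries a leading per-row sample index (as some tools emit) would show up here as one extra column per row, which could explain read_bed and read_012 disagreeing on the same dataset.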

ArpiS commented 1 year ago

Hi, I am getting this same error and am wondering whether you were able to resolve the issue. Any advice or fixes would be greatly appreciated.