TeraStructure is a new algorithm to fit Bayesian models of genetic variation in human populations on tera-sample-sized data sets (10^12 observed genotypes, i.e., 1M individuals at 1M SNPs). This package provides a scalable, multi-threaded C++ implementation that can be run on a single computer.
As the log below shows, TeraStructure is failing on a dataset I'm trying to run it on. While it computes the held-out likelihood, the gamma function appears to overflow. The failure occurs with this data at every reasonable k (`2 <= k <= 12`), with `rfreq` set to 1, 2, and 3 million SNPs, and with any seed. I've confirmed that the shape of the dataset is correct.
This is with GSL 2.3, built with the Intel compiler suite on 64-bit CentOS 6.
As an aside, I've managed to get this dataset to "work" using PLINK BED files, but the results are nonsense: for every k > 2, all samples have a theta of 0.999995 for one particular population. Because the dataset is identical in both formats, this suggests to me that `read_bed` and `read_012` are reading the data differently.