QuantGen / BGData

A Suite of Packages for Analysis of Big Genomic Data

Problem with j in getG #55

Open (opened by kennaas 3 years ago)

kennaas commented 3 years ago

Hi,

I'm using a genomic data set with 180K SNPs to create G matrices. I want to exclude around 3K of these SNPs from the calculation, so I used the j argument to include all columns except these 3K. This gave a wildly different result than not excluding any columns, with entries generally much larger than what we should be seeing.

The problem persisted even when I excluded just a single, randomly selected SNP column (again, out of 180K SNPs, so this should not have a large effect, right?). This also shows that the problem is not specific to the columns I wanted to exclude, but occurs whenever columns are excluded with j.
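To make that expectation concrete, here is a minimal base-R sketch on simulated genotypes. It is not BGData's implementation: `computeG` is a hypothetical helper that drops near-constant columns, standardizes the rest, and divides the cross-product by the number of retained columns (BGData's own centering and scaling options may differ). Dropping one column out of p changes each entry of G by an amount on the order of 1/p.

```r
# Hypothetical helper (not BGData's getG): G from the selected columns j.
computeG <- function(X, j = seq_len(ncol(X)), minVar = 1e-5) {
  W <- X[, j, drop = FALSE]
  keep <- apply(W, 2, var) >= minVar     # drop near-constant columns
  W <- scale(W[, keep, drop = FALSE])    # center and scale each column
  tcrossprod(W) / ncol(W)                # n x n relationship matrix
}

set.seed(42)
n <- 50
p <- 5000
maf <- runif(p, 0.1, 0.5)
X <- sapply(maf, function(q) rbinom(n, 2, q))  # n x p matrix of 0/1/2 codes

G1 <- computeG(X)                                 # all columns
drop <- sample(p, 1)                              # one random SNP column
G2 <- computeG(X, j = setdiff(seq_len(p), drop))  # all columns but one

max(abs(G1 - G2))  # small, on the order of 1/p
```

With standardized columns, the diagonal of G averages exactly (n - 1)/n, so a G matrix whose entries are "generally much larger" after removing a single column points to a problem in how the columns are selected rather than to the excluded SNP itself.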

Luckily, I was able to work around this issue by creating a new BGData object that did not include the unwanted SNP columns in the first place, and then calling getG without j. This gave the expected result, so there seems to be a problem with using j.

Thank you for the package, it is very helpful.

agrueneberg commented 3 years ago

Hi,

I cannot reproduce the problem on my machine. Could you show me what you did, using the example BGData object that is bundled with the package?

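library(BGData)
DATA <- BGData:::loadExample()  # example BGData object bundled with the package
X <- geno(DATA)                 # genotype matrix of the example object
G1 <- getG(X)                   # G matrix computed from all columns
G2 <- getG(X, j = [...])        # replace [...] with the columns you kept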

Alternatively, could you make your code and dataset available to me?

Thank you, Alex

kennaas commented 3 years ago

I am also unable to reproduce the error on the example data. As for sharing the dataset, I will have to get back to you. If I cannot release it now, it should be available within a few months.

An off-topic PS: it might be nice to add a check on the value of chunkSize. I had accidentally set it to 0 when working with the example data, which resulted in a confusing error.
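A check along the lines suggested above could look like the following base-R sketch. `checkChunkSize` is a hypothetical helper, not an existing BGData function; it rejects anything other than a single, finite, positive whole number with a message that names the argument, so a value of 0 fails immediately instead of producing a confusing error later in the computation.

```r
# Hypothetical validator (not part of BGData): fail early on a bad chunkSize.
checkChunkSize <- function(chunkSize) {
  if (!is.numeric(chunkSize) || length(chunkSize) != 1L ||
      !is.finite(chunkSize) || chunkSize < 1 ||
      chunkSize != round(chunkSize)) {
    stop("'chunkSize' must be a single positive whole number, got: ",
         deparse(chunkSize), call. = FALSE)
  }
  as.integer(chunkSize)  # normalize to integer for downstream use
}

checkChunkSize(5000)     # returns 5000L
try(checkChunkSize(0))   # reports an error that names 'chunkSize'
```

Calling such a check at the top of a function that takes chunkSize would also catch NA, Inf, fractional values, and vectors.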