OpenMendel / SnpArrays.jl

Compressed storage for SNP data
https://openmendel.github.io/SnpArrays.jl/latest
Other
44 stars 9 forks source link

different ways of constructing GRMs #120

Closed mmkim1210 closed 2 years ago

mmkim1210 commented 2 years ago

In the following example, shouldn't the two ways of constructing the GRM yield the same result? If that is the case, it seems like the numerical discrepancy is a tad bit large.

using SnpArrays, LinearAlgebra
# construct a classic GRM using all SNPs
kgp = SnpData(joinpath(pwd(), "chr1")) # 355290 SNPs
Φ = grm(kgp.snparray, method = :GRM)
Φ .*= 2
# construct a GRM in another way
storage = convert(Matrix{Float64}, kgp.snparray, impute = true, center = true, scale = true)
storage = storage * transpose(storage)
storage ./= (2 * size(kgp)[2])
storage .*= 2
# compare the two
norm(Φ - storage) # 0.15744930227624643
sum(Φ .≈ storage) # 2

chr1.zip

Hua-Zhou commented 2 years ago

@mmkim1210 Thanks for providing a test data.

By default, grm function excludes SNPs with minor allele frequencies (MAFs) < 0.01, because rare variants cause unstable estimates of kinship. This is controlled by the minmaf keyword argument.

In this example data, there are 233 SNPs with MAF < 0.01. They are excluded by default.

# how many SNPs have maf < 0.01?
kgp_maf = maf(kgp.snparray)
count(x -> x < 0.01, kgp_maf) # 233

If we run

# compute GRM using all SNPs
Φ = 2grm(kgp.snparray, method = :GRM, minmaf = 0.0)

then we have

norm(Φ - storage)

equal to 0.0.