pjotrp opened this issue 4 years ago
These are the existing switches:
GEMMA 0.98.3 (2020-09-29) by Xiang Zhou and team (C) 2012-2020
SNP QC OPTIONS
-miss [num] specify missingness threshold (default 0.05)
-maf [num] specify minor allele frequency threshold (default 0.01)
-hwe [num] specify HWE test p value threshold (default 0; no test)
-r2 [num] specify r-squared threshold (default 0.9999)
-notsnp minor allele frequency cutoff is not used
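For readers new to these switches, here is a small Python sketch of how a missingness and minor allele frequency filter of this kind could work on a single SNP row. It is a hypothetical illustration, not GEMMA's source. The function name `passes_miss_maf`, the coding of genotypes as dosages between 0 and 2, and the use of `None` for a missing call are all assumptions. GEMMA's exact boundary handling may also differ.

```python
from typing import Optional, Sequence


def passes_miss_maf(
    genotypes: Sequence[Optional[float]],
    miss: float = 0.05,
    maf: float = 0.01,
    notsnp: bool = False,
) -> bool:
    """Return True if one SNP row survives a missingness and MAF filter.

    Hypothetical illustration, not GEMMA's code: genotypes are dosages
    between 0 and 2 (one per individual) and None marks a missing call.
    """
    n = len(genotypes)
    observed = [g for g in genotypes if g is not None]
    # Missingness: drop the SNP if its missing fraction exceeds `miss`.
    if n == 0 or (n - len(observed)) / n > miss:
        return False
    # A row with no observed calls cannot be assessed at all.
    if not observed:
        return False
    # Analogue of -notsnp: skip the minor allele frequency cutoff.
    if notsnp:
        return True
    # Allele frequency from dosages (two alleles per individual).
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq) >= maf
```

With the default threshold of 0.05, a row with 20% missing calls is rejected. Raising `miss` to 1.0 keeps the row, which mirrors the `-miss 1.0` setting used below to switch the filter off.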
GEMMA also has a simple polymorphism filter that removes genotype rows in which all values are identical, i.e. every individual carries the same genotype. There is no reason to make that optional.
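A polymorphism check of this kind can be sketched in a few lines of Python. The function name `is_polymorphic` and the choice to ignore missing calls (`None`) when comparing values are assumptions for illustration; this is not GEMMA's implementation.

```python
from typing import Optional, Sequence


def is_polymorphic(genotypes: Sequence[Optional[float]]) -> bool:
    """Return True if the observed (non-missing) genotypes are not all identical.

    Hypothetical sketch: a row that carries one single genotype value across
    all individuals has no variation to test, so it would be dropped.
    """
    observed = {g for g in genotypes if g is not None}
    return len(observed) > 1
```

A row such as `[0, 0, 0, None]` counts as monomorphic here, because the only observed value is 0.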
I added some documentation in the commit above. In essence, the filters above can be disabled with
gemma -r2 1.0 -hwe 0 -miss 1.0 -notsnp ...
When computing the GRM, GEMMA also transforms the genotypes (after applying the filters above) before computing the matrix:
Since you can generate your own GRM and load it into GEMMA, there is probably no point in making these transformations optional.
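To illustrate what centering does to the genotypes, here is a pure-Python sketch of a centered relatedness matrix, K = (1/p) Σ_i (x_i − x̄_i·1)(x_i − x̄_i·1)^T, where x_i holds the n genotypes of SNP i and p is the number of SNPs. This is the usual textbook form of a centered GRM, not a copy of GEMMA's code. The function name `centered_grm` and the mean imputation of missing calls (so they contribute 0 after centering) are assumptions.

```python
from typing import List, Optional, Sequence


def centered_grm(snps: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Centered relatedness matrix K = (1/p) * sum_i c_i c_i^T.

    `snps` holds p rows (one per SNP) of n genotype dosages. Each row is
    centered on its own mean; a missing call (None) is assumed to be
    imputed with that mean, so it becomes 0 after centering. Every row
    needs at least one observed call.
    """
    p = len(snps)
    n = len(snps[0])
    K = [[0.0] * n for _ in range(n)]
    for row in snps:
        observed = [g for g in row if g is not None]
        mean = sum(observed) / len(observed)
        centered = [(g - mean) if g is not None else 0.0 for g in row]
        # Accumulate the outer product of the centered genotype vector.
        for a in range(n):
            for b in range(n):
                K[a][b] += centered[a] * centered[b]
    # Average over SNPs.
    return [[v / p for v in r] for r in K]
```

Because every centered vector sums to zero, each row of the resulting K also sums to zero. A precomputed matrix like this (or any other GRM) is what you would supply with `-k` instead of letting GEMMA compute it.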
Hi, after specifying the flags for filtering I still lose SNPs: number of total SNPs/var = 53211, number of analyzed SNPs/var = 47054.
This is the command line:
gemma -g imputed_genotypes.mgf.gz -lmm 1 -k kinship.cXX.txt.gz -maf 0 -hwe 0 -miss 0 -r2 1 -notsnp -p phenotype.phe -outdir gemma_test -o test
Below is the log.
Thank you,
Mauro
GEMMA Version = 0.98.1 (2018-12-10)
Build profile =
GCC version = 8.2.0
GSL Version = 2.5
Eigen Version = 3.3.5
OpenBlas = OpenBLAS 0.3.2 - DYNAMIC_ARCH NO_AFFINITY Sandybridge MAX_THREADS=6
arch = Sandybridge
threads = 6
parallel type = threaded
Command Line Input = gemma -g imputed_genotypes.mgf.gz -lmm 1 -k kinship.cXX.txt.gz -maf 0 -hwe 0 -miss 0 -r2 1 -notsnp -p phenotype.phe -outdir gemma_test -o test
Date = Thu Apr 22 11:50:18 2021
Summary Statistics:
number of total individuals = 7898
number of analyzed individuals = 5537
number of covariates = 1
number of phenotypes = 1
number of total SNPs/var = 53211
number of analyzed SNPs/var = 47054
REMLE log-likelihood in the null model = -7658.62
MLE log-likelihood in the null model = -7659.27
pve estimate in the null model = 0.412219
se(pve) in the null model = 0.0256184
vg estimate in the null model = 1.32772
ve estimate in the null model = 0.587324
beta estimate in the null model = -0.000792455
se(beta) = 0.0102992
Computation Time:
total computation time = 22.0958 min
computation time break down:
time on eigen-decomposition = 3.49347 min
time on calculating UtX = 5.42422 min
time on optimization = 6.84549 min
I don't have time to look into this now, but do note that GEMMA is fairly logical: when it drops genotypes, it does so for a reason.
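The thread does not settle where the 6157 SNPs went (53211 total vs 47054 analyzed). Two hypotheses, neither confirmed here. First, the command above uses `-miss 0`, whereas the disabling value suggested earlier is `-miss 1.0`. If the threshold is a maximum allowed missing fraction, 0 would remove every SNP with even one missing call. Second, only 5537 of 7898 individuals are analyzed. If the filters are evaluated on that subset, a SNP can become monomorphic there and fall to the polymorphism filter. The hypothetical Python tally below shows how one might check both ideas on one's own data. The names `first_failing_filter` and `tally`, the check order and the dosage coding are assumptions, not GEMMA's logic.

```python
from collections import Counter
from typing import List, Optional, Sequence


def first_failing_filter(
    genotypes: Sequence[Optional[float]], miss: float, maf: float
) -> str:
    """Name the first (hypothetical) filter that rejects a SNP row, or 'kept'."""
    n = len(genotypes)
    observed = [g for g in genotypes if g is not None]
    # Missing fraction above the threshold.
    if n == 0 or (n - len(observed)) / n > miss:
        return "miss"
    # Fewer than two distinct observed genotype values.
    if len(set(observed)) < 2:
        return "monomorphic"
    # Minor allele frequency below the threshold.
    freq = sum(observed) / (2 * len(observed))
    if min(freq, 1.0 - freq) < maf:
        return "maf"
    return "kept"


def tally(
    rows: Sequence[Sequence[Optional[float]]],
    analyzed: List[int],
    miss: float,
    maf: float,
) -> Counter:
    """Apply the filters to the analyzed individuals only and count outcomes."""
    subset = [[row[i] for i in analyzed] for row in rows]
    return Counter(first_failing_filter(r, miss, maf) for r in subset)


rows = [
    [0, 1, 2, 1],     # polymorphic, no missing calls
    [0, 1, None, 1],  # one missing call among the analyzed individuals
    [0, 0, 0, 2],     # varies only in the excluded individual
]
analyzed = [0, 1, 2]  # individual 3 excluded, e.g. missing phenotype
print(tally(rows, analyzed, miss=0.0, maf=0.0))
print(tally(rows, analyzed, miss=1.0, maf=0.0))
```

In this toy data, `miss=0.0` rejects the row with a single missing call while `miss=1.0` keeps it, and the third row is rejected as monomorphic either way because its only variation sits in the excluded individual.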
GEMMA has a number of filters, including maf, miss and r2, and also genotype conversions such as centering, all of which need an option to disable them.