corbinq / apex

Toolkit for QTL mapping and meta-analysis.
https://corbinq.github.io/apex/
16 stars 1 forks source link

Pval is -1 when running apex cis #15

Open Jia21 opened 2 years ago

Jia21 commented 2 years ago

Hi, I recently ran apex cis with LMM with GRM files, there were some results with Pval =-1, and all the other statistics were 0s. The command, results and log file are shown below. Could someone please help to explain?

The command I used was apex cis --vcf chr19.vcf.gz \ --grm grm_file \ --bed adjust_exp_pcs_pfs.bed.gz \ --window 1000000 \ --long \ --prefix chr19_cis

The results are shown below 19 7961303 GCCTTCCATCAGAAGGGTGTTGGCTCCCA G gene1 0 0 -1 19 7961303 GCCTTCCATCAGAAGGGTGTTGGCTCCCA G gene2 0 0 -1

The log file is shown as Set window size to 1 Mbp. Using 1 threads. chr19 present in both bcf and bed file. 346714 total variants on selected chromosomes.

            Found 1000 samples in bcf file ... 

            WARNING: No covariate file specified. That's usually a bad idea.
                Covariates can be specified using --cov FILE. Use --rankNormal
                to rank-normal (aka, inverse-normal) transform traits, and use
                --rankNormal-resid for trait residuals.
            Found 1000 samples in expression bed file ... 
            Found 1000 samples in common across all three files.

            Processed expression for 1238 genes across 1000 samples.
            Processed genotype data for 346714 variants.

            Found 1 related blocks with 1000 total individuals (largest block = 1000) ... 
            Reordering related GRM blocks ... Done.
            Decomposing block 0 (1000) out of 1 ... Done
            .Creating eigenvector matrix ... Done.
            reordering ... Done.
            Selected 1000 eigenvectors.  Scaling expression traits ... 
            Calculating partial rotations ...
            Done.
            Calculating genotype-covariate covariance ... 
            Calculating genotype residual variances ...Done.
            Calculating genotype variance anchor points (2/3) ... 
            Calculating genotype variance anchor points (3/3) ... 
            Done.

Since the expression has been corrected with covariates, covariates were not included in the command. Thank you!

Cheers, Elaine

hsun3163 commented 2 years ago

Hi Jia, I was having a similar error, except that all of my beta and se are -nan instead of 0, as shown below:

#chrom  pos     ref     alt     gene    beta    se      pval
12    10674     G       A       IQSEC3  -nan    -nan    -1
12    10700     G       A       IQSEC3  -nan    -nan    -1
12    10879     C       G       IQSEC3  -nan    -nan    -1
12    13783     CAG     C       IQSEC3  -nan    -nan    -1
12    14888     C       T       IQSEC3  -nan    -nan    -1
12    16268     G       A       IQSEC3  -nan    -nan    -1
12    18052     G       A       IQSEC3  -nan    -nan    -1

The log of my running is shown below:

Using 48 threads.
chr12 present in both bcf and bed file.
553011 total variants on selected chromosomes.

Found 1194 samples in bcf file ...
Found 415 samples in covariate file ...
Found 424 samples in expression bed file ...
Found 415 samples in common across all three files.

Processed data for 86 covariates across 415 samples.
Processed expression for 780 genes across 415 samples.
Processed genotype data for 553011 variants.

Using OLS (no GRM specified).
Started cis-QTL analysis ...
Scaling expression traits ...
Calculating genotype-covariate covariance...
Calculating genotype residual variances ...Done.
Calculating expression residuals...
Scaling expression residuals ...
Processed 120 cis-QTL blocks out of 120 total
hsun3163 commented 2 years ago

Another factor that leads to the all nan result is the inclusion of the covariate files. As it turn out, removing the covariates allow the program to produce correct-looking OLS results.

Following this path, the problem is further narrowed down to the factor file generated by APEX factor program. Using only the known covariates + PCs as cov file didn't lead to nan results.

Jia21 commented 2 years ago

Hi,

Thanks! The covariates were not included in the APEX cis since the gene expressions have been corrected using all known covariates + PCs + PEER factors, so I don't think the covariates are the reason leading to this issue. But any discussion is welcome, any other thoughts?

Thank you! Elaine

jeffchen2000 commented 1 year ago

hi Jia and all others I had the same procedures as what you did and I also have tried all suggestions mentioned here, none of them seems work, below are what I got, so can I simply ignore the p value -1?

11 1133595 A C MUC5AC -16.5165 -nan -1 11 1133761 G T MUC5AC 78.1805 -nan -1 11 1133976 C G MUC5AC 56.1279 -nan -1 11 1134260 C G MUC5AC 56.1279 -nan -1 11 1134507 A C MUC5AC 56.1279 -nan -1 11 1135591 T C MUC5AC 56.1279 -nan -1 11 1135753 G GA MUC5AC 31.9459 -nan -1 11 1135753 G GAA MUC5AC -1.52115 0.78328 0.0576674 11 1135813 G A MUC5AC 56.1279 -nan -1 11 1136047 C CAT MUC5AC -1.53111 1.23219 0.2197 11 1136067 TATACAC T MUC5AC -118.509 -nan -1 11 1136079 T C MUC5AC -114.411 -nan -1 11 1136105 T C MUC5AC 56.5308 -nan -1 11 1136143 CAT C MUC5AC 67.3717 -nan -1 11 1136180 ACATATATAATT A MUC5AC -118.923 -nan -1