Utilizing dosage input to encode a non-genetic 'genotype'?

vkp3 commented 4 years ago

Hello,

I posted this issue in @francois-a's package fastqtl, but since tensorqtl is similar, I'm writing the same question here:

I'm attempting to replicate the gtex-pipeline using scripts from the gtex-pipeline repository on GitHub.

However, my use case is a bit different from eQTL mapping. I would like to run a phenotype-QTL mapping: instead of many single-variant association tests, I want to test a general phenotype for association with the expression of many genes across tissues.

Most importantly, I would like to achieve fast performance on permutation testing for computing p-values of such a phenotype and its association with gene expression (across genes).

My intuition is to encode the phenotype as a dosage for a single genetic variant, across individuals, but I am unsure if this is supported by FastQTL and/or TensorQTL.

Is this possible? If so, could you help me encode this strategy for FastQTL or TensorQTL? I would like to take advantage of the programs' fast permutation testing.

Thank you

francois-a commented 4 years ago

Hi,

Yes, this is definitely possible. The phenotypes and genotypes can be any continuous value, and all you'll need to do is specify a 'variant' mapping (in variant_df) that links your 'genotype' values to a phenotype via the cis-window.

Here's an example of how you could do this:

import pandas as pd
import numpy as np
from tensorqtl import cis

np.random.seed(12345)

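n = 100  # samples
m = 20   # 'genotypes' (any continuous values)
# rows are pseudo-variants, columns are samples
genotype_df = pd.DataFrame(np.random.rand(m, n))
# a single phenotype (1 x n) with the same sample columns as genotype_df
phenotype_df = pd.Series(np.random.rand(n), name='phenotype_1').to_frame().T
# dummy position for the phenotype
phenotype_pos_df = pd.DataFrame({'chr':['chr1'], 'pos':[1]}, index=['phenotype_1'])
# dummy positions for the pseudo-variants, all inside the phenotype's cis-window
variant_df = pd.DataFrame({'chrom':['chr1']*m, 'pos':np.arange(m)}, index=genotype_df.index)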

# permutations
cis_df = cis.map_cis(genotype_df, variant_df, phenotype_df, phenotype_pos_df, window=1000000)

# nominal associations
cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df, 'test',
                output_dir='.', window=1000000)

If you don't include any covariates, as in this example, you'll need to use the latest commit (I'll make a new release soon).

Edit: fix for phenotype_pos_df change ('pos' instead of 'tss').
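
The example above tests many pseudo-variants against one phenotype. For the use case in the original question (one trait against many genes), the same inputs can be laid out the other way around. The following is only a sketch under stated assumptions: the trait values, sample IDs, gene IDs, and the dummy position `chr1:1` are placeholders, and whether the permutation step in `map_cis` behaves well with a single variant per window is not confirmed here. The tensorqtl calls are left as comments and simply mirror the example above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12345)

n_samples = 100   # individuals
n_genes = 50      # expression phenotypes to test against the trait
samples = [f"sample_{i}" for i in range(n_samples)]   # placeholder sample IDs
genes = [f"gene_{j}" for j in range(n_genes)]         # placeholder gene IDs

# Hypothetical non-genetic trait, one value per individual (random placeholder data).
trait = pd.Series(rng.random(n_samples), index=samples, name="trait_1")

# The trait plays the role of a single 'variant': one row, samples as columns.
genotype_df = trait.to_frame().T
variant_df = pd.DataFrame({"chrom": ["chr1"], "pos": [1]}, index=genotype_df.index)

# Gene expression plays the role of the phenotypes: genes x samples.
phenotype_df = pd.DataFrame(rng.random((n_genes, n_samples)),
                            index=genes, columns=samples)

# Every gene gets the same dummy position, so the pseudo-variant
# falls inside each gene's cis-window.
phenotype_pos_df = pd.DataFrame({"chr": ["chr1"] * n_genes,
                                 "pos": [1] * n_genes}, index=genes)

# Sample order must match between the two matrices.
assert genotype_df.columns.equals(phenotype_df.columns)

# With tensorqtl installed, the calls would mirror the example above:
# from tensorqtl import cis
# cis_df = cis.map_cis(genotype_df, variant_df, phenotype_df,
#                      phenotype_pos_df, window=1000000)
```

Using explicit sample IDs as column labels, rather than the default integer labels, makes a mismatch between the trait and the expression matrix easier to spot before any mapping is run.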

CuteGold0407 commented 10 months ago

Hi, this code results in the error KeyError: 'start'. I'm wondering if there is another way to "encode the phenotype as a dosage for a single genetic variant, across individuals", in order to map phenotype to phenotype in the new version? Thanks!

francois-a commented 10 months ago

Please try again with the updated code above, it should work now.
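
For anyone with position tables built for the earlier version of the example, the change noted in the edit above ('pos' instead of 'tss') can be applied to an existing table. This is a minimal sketch, assuming the KeyError came from a `phenotype_pos_df` that still used a 'tss' column; `normalize_phenotype_pos` is a hypothetical helper and not part of tensorqtl.

```python
import pandas as pd

def normalize_phenotype_pos(phenotype_pos_df):
    """Return a copy with a legacy 'tss' column renamed to 'pos' (hypothetical helper)."""
    df = phenotype_pos_df.copy()
    # Rename only when the new column name is absent and the old one is present.
    if "pos" not in df.columns and "tss" in df.columns:
        df = df.rename(columns={"tss": "pos"})
    # Check for the two columns used in the example above.
    missing = {"chr", "pos"} - set(df.columns)
    if missing:
        raise ValueError(f"phenotype_pos_df is missing columns: {sorted(missing)}")
    return df

# Old-style table, as the earlier version of the example would have built it.
old_style = pd.DataFrame({"chr": ["chr1"], "tss": [1]}, index=["phenotype_1"])
fixed = normalize_phenotype_pos(old_style)
print(list(fixed.columns))
```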

CuteGold0407 commented 10 months ago

Thanks! I fixed that; encoding the phenotype following the updated code solves the problem!

broadinstitute / tensorqtl

Utilizing dosage input to encode a non-genetic 'genotype'? #19