broadinstitute / tensorqtl

Ultrafast GPU-enabled QTL mapper
BSD 3-Clause "New" or "Revised" License
162 stars 52 forks source link

How to get all significant phenotype-variants pairs in cis-QTL permutations mapping mode? #151

Closed xiaohe0404 closed 1 month ago

xiaohe0404 commented 2 months ago

When I used cis.map_cis() function to do cis-QTL permutation analysis, I found only the most significant phenotype-variants pairs were output. I really wanna get all variants that significant associated with a phenotype after permutation, but I didn't find any parameter that related to this in cis.map_cis() function. I really wander did I miss something, or what can I do? I am really looking forward to your prompt help.

JamesTanShengYi commented 1 month ago

I want to ask the exact same question for the cluster version of the implementation. For my application (drosophila, were the environment is tightly controlled), my supervisor and collaborators find it acceptable to include more than just the top gene-SNP pair and we were thinking of using beta pval <0.05 as a significance cutoff.

We also have the space to store such large pvalue table outputs.

JamesTanShengYi commented 1 month ago

I want to ask the exact same question for the cluster version of the implementation. For my application (drosophila, were the environment is tightly controlled), my supervisor and collaborators find it acceptable to include more than just the top gene-SNP pair and we were thinking of using beta pval <0.05 as a significance cutoff.

We also have the space to store such large pvalue table outputs.

I found that tweaking cis.map_cis() within the cis.py script to output all SNPs and permutations rather than just the top SNP called 'ix', and then tweaking calculate_cis_permutations() to include for loop for each SNP can do the trick.

francois-a commented 1 month ago

There's a function specifically designed for this: post.get_significant_pairs. You first need to run both cis.map_cis and cis.map_nominal. Then:

chr_files = {os.path.basename(f).split('.')[2]:f
             for f in glob.glob('/.../{prefix}.cis_qtl_pairs.*.parquet')}  # outputs from map_nominal
cis_df = pd.read_csv('/.../{prefix}.cis_qtl.txt.gz, sep='\t', index_col=0)  # output from map_cis
signif_df = post.get_significant_pairs(cis_df, chr_files, fdr=0.05)