broadinstitute / tensorqtl

Ultrafast GPU-enabled QTL mapper
BSD 3-Clause "New" or "Revised" License

Documentation for interaction & conditional analyses #11

JonMarten closed 4 years ago

JonMarten commented 4 years ago

Hi there,

Back again with more questions. I'm trying to run an analysis with an interaction term but I'm not entirely clear on how this is meant to be run.

First I tried following the --interaction flag with the column name of the variable to be used as the interaction term, taken from my covariate file. This obviously failed.

I then tried passing the name of a new text file containing columns for sample ID and the interaction term. When run from the command line, this failed with an AssertionError:

[Feb 05 14:33:24] Running TensorQTL: cis-QTL mapping
  * using GPU (Tesla P100-PCIE-16GB)
  * reading phenotypes (/rds/project/jmmh2/rds-jmmh2-projects/interval_rna_seq/analysis/03_tensorqtl/phenotypes/INTERVAL_RNAseq_phase1_filteredSamplesGenes_TMMNormalised_FPKM_Counts_foranalysis.bed.gz)
  * reading covariates (/rds/project/jmmh2/rds-jmmh2-projects/interval_rna_seq/analysis/03_tensorqtl/covariates/INTERVAL_RNAseq_phase1_age_sex_rin_batch_PC10_PEER20.txt)
  * reading interaction term (/rds/project/jmmh2/rds-jmmh2-projects/interval_rna_seq/analysis/03_tensorqtl/covariates/INTERVAL_RNAseq_phase1_GxE_neutPCT.txt)
Traceback (most recent call last):
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/__main__.py", line 2, in <module>
    tensorqtl.main()
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/tensorqtl.py", line 70, in main
    assert covariates_df.index.isin(interaction_s.index).all()
AssertionError

As far as I can tell, this checks that every sample ID in the covariate dataframe's index is also present in the interaction term's index, and that check fails. The confusing thing is that when I read both of these files into Python interactively and run the same assert, it passes, so the IDs are definitely consistent between the files.
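For anyone debugging the same assertion, a minimal sketch of a diagnostic is below. It lists which covariate sample IDs are absent from the interaction index and prints both index dtypes. The helper name `report_index_mismatch` and the toy IDs are hypothetical, and the string-versus-integer mismatch it demonstrates is only one possible cause, not a confirmed explanation of the failure above.

```python
import pandas as pd


def report_index_mismatch(covariates_df, interaction_s):
    """Return the covariate sample IDs that are absent from the interaction index.

    Hypothetical helper: it repeats the isin() test from the assertion,
    but reports the offending IDs instead of only failing.
    """
    present = covariates_df.index.isin(interaction_s.index)
    missing = covariates_df.index[~present]
    print(f"covariates index dtype:  {covariates_df.index.dtype}")
    print(f"interaction index dtype: {interaction_s.index.dtype}")
    print(f"{len(missing)} covariate sample ID(s) missing from the interaction term")
    return list(missing)


# Toy data with made-up IDs: the same labels, stored as strings in one
# object and as integers in the other, so isin() finds no matches.
covariates_df = pd.DataFrame({"age": [40, 52, 61]}, index=["1001", "1002", "1003"])
interaction_s = pd.Series([0.55, 0.61, 0.48], index=[1001, 1002, 1003])

missing = report_index_mismatch(covariates_df, interaction_s)

# Casting the interaction index to strings makes the two indices comparable.
interaction_fixed = interaction_s.copy()
interaction_fixed.index = interaction_fixed.index.astype(str)
missing_after_cast = report_index_mismatch(covariates_df, interaction_fixed)
```

If the interactive check passes while the command-line run fails, comparing the dtypes and the reported IDs under both ways of reading the files may help narrow down where they diverge.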

I finally tried running the whole command interactively in Python with the following code:

cisnom_df = cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df, covariates_df, prefix="Test_gxe", interaction_s=interaction_df)

This gave a different error:

cis-QTL mapping: nominal associations for all variant-phenotype pairs
  * 2745 samples
  * 17674 phenotypes
  * 22 covariates
  * 6811432 variants
  * including interaction term
    * using 0.05 MAF threshold
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/cis.py", line 108, in map_nominal
    interaction_mask_t = torch.BoolTensor(interaction_s >= interaction_s.median()).to(device)
ValueError: could not determine the shape of object type 'DataFrame'

Do you have any idea what I might be doing wrong here?

francois-a commented 4 years ago

Hi,

When using the CLI, the interaction term needs to be provided as a file mapping sample ID to value (with an optional header); when running as a module, it needs to be a pandas Series.

Apologies for the lack of documentation for these functions. I've added descriptions to the README, in the section for running as a module, and will update the rest soon.
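A minimal sketch of the module-side fix described above, assuming a headerless, tab-delimited, two-column file (sample ID, value). The in-memory file contents and the Series name `interaction` are made up for illustration. The point is that reading such a file with pandas yields a one-column DataFrame, which has to be reduced to a Series before being passed as `interaction_s`.

```python
import io

import pandas as pd

# Hypothetical stand-in for the interaction file: sample ID <tab> value, no header.
interaction_txt = "S1\t0.55\nS2\t0.61\nS3\t0.48\n"

# Reading the file gives a DataFrame with a single value column.
interaction_df = pd.read_csv(
    io.StringIO(interaction_txt), sep="\t", index_col=0, header=None
)

# Select that column to obtain a pandas Series indexed by sample ID.
interaction_s = interaction_df.iloc[:, 0].rename("interaction")
interaction_s.index = interaction_s.index.astype(str)

# The comparison from the traceback is one-dimensional on a Series...
mask = interaction_s >= interaction_s.median()

# ...but stays two-dimensional on a DataFrame, which is the kind of object
# the ValueError in the question complained about.
mask_2d = interaction_df >= interaction_df.median()

print(type(interaction_s).__name__, mask.shape)
print(type(mask_2d).__name__, mask_2d.shape)

# The Series would then replace interaction_df in the call from the question:
# cis.map_nominal(..., interaction_s=interaction_s)
```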