Closed plantformatics closed 1 year ago
Hi, tensorQTL will filter out monomorphic variants if any are present, but in general the output should be identical to fastQTL. Are you able to provide a small test dataset that reproduces this behavior?
I could send a vcf and plink2 binaries for a single phenotype (along with a covariates file) - does this work?
Edit: I should also mention that the plink2 binaries were developed from the VCF which only had dosage information present.
@francois-a I sent a collaboration request for a repo containing a subset of data and readme with additional info. Since the main difference between fastQTL and tensorQTL input is the use of plink2 for genotype information, my guess is that the plink2 conversion may be what's responsible. I rarely use plink so I have limited understanding of how the conversion works.
Edit: What is the order of operations for MAF filtering and genotype imputation? When I disable MAF filter on tensorQTL (but still require maf_threshold 0.05 for fastQTL), num_var
now more comparable, but still not identical.
Thanks for sharing the test dataset! The issue is related to missing values — in the current tensorQTL implementation, it is assumed that genotype dosages don't contain missing values; only missing hard calls are imputed (to the mean). So it's not a PLINK2 issue per se, but rather that the missing values (which are read as -9) are not properly handled.
MAF filtering is performed after imputation.
Makes sense - thanks for taking a look. This is super helpful!
First off, I'd like to say thank you for developing this software, I've been blown away by the speed! I am trying to figure out why the number of surveyed variants in tensorQTL is drastically lower than fastQTL.
num_var
for any given phenotype is ~1/4 the value from fastQTL. I used identical parameters (Maf < 0.05) and loaded dosages in both fastQTL (vcf) and tensorQTL (Plink2 psam/pgen/pvar). Are there hard-coded variant filters being applied to tensorQTL that are absent in fastQTL? I get drastically fewer QTL being identified as a result and I'd like to understand why.