As is, the tool requires that variant IDs (in both the input --eqtl file and the --vcf file) be formatted as <chrom>_<pos>.... As far as I can tell, there are two reasons for this:
It allows the program to parse the variant's position from its ID.
It meets the formatting requirements used by lmfit in the fitting step.
I am proposing changes that make the tool agnostic to the format of the variant IDs (I can imagine some users have VCFs that use dbSNP rsIDs, for example). Briefly, the changes are as follows:
The --eqtl file must now include two additional columns, variant_chr and variant_pos, that describe the (1-based) position of each variant. This information is then used to fetch the genotypes from the tabix-indexed VCF.
Variants are assigned unique temporary IDs (a new variant_id_clean column) that meet the formatting requirements of lmfit and are used when fitting the model.
I've also updated the gene_id_clean functionality to match that of the new variant_id_clean column, so the tool no longer assumes any specific formatting of the input gene IDs.
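To illustrate the idea, here is a minimal sketch of how temporary IDs and tabix regions could be derived from the new columns. It is not the code in this PR: the helper names (assign_clean_ids, tabix_region), the var/gene prefixes, and the example rows (including the positions) are all hypothetical. It relies on lmfit requiring parameter names to be valid Python identifiers, and on tabix region strings being 1-based and inclusive.

```python
import re

# lmfit parameter names must be valid Python identifiers, so raw IDs such as
# "17_7676154_G_A" or "ENSG00000141510.11" cannot be used directly.
VALID_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def assign_clean_ids(raw_ids, prefix):
    """Map each distinct raw ID to a temporary, identifier-safe ID.

    IDs are numbered in order of first appearance, so the mapping is
    deterministic for a given input order.
    """
    mapping = {}
    for raw in raw_ids:
        if raw not in mapping:
            mapping[raw] = f"{prefix}_{len(mapping)}"
    return mapping


def tabix_region(chrom, pos):
    """Build a 1-based, inclusive region string covering a single position."""
    return f"{chrom}:{pos}-{pos}"


# Hypothetical rows from an --eqtl file with the two new columns.
eqtl_rows = [
    {"gene_id": "ENSG00000141510.11", "variant_id": "rs12345",
     "variant_chr": "chr17", "variant_pos": 7676154},
    {"gene_id": "ENSG00000141510.11", "variant_id": "17_7670000_G_A",
     "variant_chr": "chr17", "variant_pos": 7670000},
    {"gene_id": "ENSG00000012048.23", "variant_id": "rs12345",
     "variant_chr": "chr17", "variant_pos": 7676154},
]

variant_map = assign_clean_ids((r["variant_id"] for r in eqtl_rows), "var")
gene_map = assign_clean_ids((r["gene_id"] for r in eqtl_rows), "gene")

for row in eqtl_rows:
    row["variant_id_clean"] = variant_map[row["variant_id"]]
    row["gene_id_clean"] = gene_map[row["gene_id"]]
    row["region"] = tabix_region(row["variant_chr"], row["variant_pos"])
```

Because the clean IDs are generated rather than derived from the input, the original IDs can take any form; the mapping only needs to be kept so results can be reported under the original IDs afterwards.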