This is added as two separate pipelines: training_association_testing_regenie.pipeline and association_testing_pretrained_regenie.pipeline.
To support REGENIE, we create a BGEN file with a single "pseudovariant" per gene, with genotype probabilities [1 - d, 0, d], where d is the DeepRVAT gene impairment score (between 0 and 1). This way, the dosage used by REGENIE is 2 * d (between 0 and 2). This is a trick suggested by @joellembatchou.
Add support for both gene-specific and variant-specific annotations in DenseGTDataset, PaddedAnnotations, and SparseGenotype.
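To illustrate the pseudovariant encoding described above, here is a minimal sketch of how a gene impairment score d maps to the probability triple [1 - d, 0, d] and to an expected dosage of 2 * d. The function names and the example scores are hypothetical and are not taken from this branch. The sketch also assumes the dosage is the expected count of the second allele, P(het) + 2 * P(hom-alt). It does not write a BGEN file.

```python
import math
from typing import Tuple


def pseudovariant_probs(d: float) -> Tuple[float, float, float]:
    """Return (P(hom-ref), P(het), P(hom-alt)) for one gene's pseudovariant.

    d is the gene impairment score and is expected to lie in [0, 1].
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError(f"impairment score must be in [0, 1], got {d}")
    return (1.0 - d, 0.0, d)


def expected_dosage(probs: Tuple[float, float, float]) -> float:
    """Expected alt-allele count: 0 * P(hom-ref) + 1 * P(het) + 2 * P(hom-alt)."""
    _p_ref, p_het, p_alt = probs
    return p_het + 2.0 * p_alt


# Hypothetical impairment scores for three genes in one sample.
scores = {"GENE_A": 0.0, "GENE_B": 0.25, "GENE_C": 1.0}

for gene, d in scores.items():
    probs = pseudovariant_probs(d)
    # The three genotype probabilities always sum to 1.
    assert math.isclose(sum(probs), 1.0)
    # With P(het) fixed at 0, the dosage reduces to 2 * d.
    print(gene, probs, expected_dosage(probs))
```

Because the heterozygous probability is fixed at 0, the dosage is linear in d, so the full range of the impairment score maps onto the full dosage range [0, 2].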
Testing
I have already done fairly extensive testing. One known issue: the seed gene discovery pipeline currently fails. Either this branch is missing some changes from other branches, or the tests need to be updated.