What

Implement cross-validation-style training (pipelines/cv_training)
- train the model on n-1 folds of the samples
- compute the burdens on the held-out fold using this model
- repeat this for all n folds
- to support this, allow a sample file to be passed to dense_gt.py
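
For illustration, the fold logic above can be sketched as follows. This is a minimal sketch, not the code in pipelines/cv_training: the helper name `cv_folds`, the random shuffling, and the seed are assumptions made for the example.

```python
import numpy as np

def cv_folds(sample_ids, n_folds, seed=0):
    """Yield (train_ids, held_out_ids) for each of n_folds disjoint folds.

    Hypothetical helper: the actual pipeline may assign folds differently.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(sample_ids))
    folds = np.array_split(shuffled, n_folds)
    for i, held_out in enumerate(folds):
        # Train on the n-1 folds that are not held out in this round
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, held_out

samples = np.arange(10)
splits = list(cv_folds(samples, n_folds=5))
```

Each sample ends up in exactly one held-out fold, so burdens are always computed by a model that did not see that sample during training.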
Allow different orders of samples in phenotype_df and genotypes.h5 in dense_gt.py
- so far, samples in phenotype_df and genotypes.h5 had to be in the same order
- this is no longer required: an additional index map for genotypes.h5 retrieves samples in the order of self.samples
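
A rough sketch of the index-map idea, using in-memory arrays instead of an HDF5 file. The function name `build_index_map` and the toy data are hypothetical and only illustrate the lookup, not the code in dense_gt.py.

```python
import numpy as np

def build_index_map(genotype_sample_ids, requested_samples):
    """Return, for each requested sample, its row position in the genotype data."""
    position = {sid: i for i, sid in enumerate(genotype_sample_ids)}
    return np.array([position[sid] for sid in requested_samples])

genotype_ids = ["s3", "s1", "s2"]               # sample order in the genotype data
phenotype_ids = ["s1", "s2", "s3"]              # sample order in the phenotype table
genotypes = np.array([[3, 3], [1, 1], [2, 2]])  # rows follow genotype_ids

index_map = build_index_map(genotype_ids, phenotype_ids)
reordered = genotypes[index_map]                # rows now follow phenotype_ids
```

When reading directly from an h5py dataset, note that list-based selections need indices in increasing order, so the reordering may have to happen after loading the rows into memory.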
Average burdens from multiple repeats and run association testing afterwards (deeprvat/associate.py) (as opposed to running the association testing on each cv individually)
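
As a sketch of the averaging step, assuming burdens are stored as an array of shape (repeats, samples, genes); the layout and the toy values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical layout: axis 0 = repeats, axis 1 = samples, axis 2 = genes
burdens = np.array([
    [[0.0, 1.0], [2.0, 3.0]],   # repeat 0
    [[2.0, 3.0], [4.0, 5.0]],   # repeat 1
])

# Average over the repeat axis; association testing then runs once on the result
mean_burdens = burdens.mean(axis=0)
```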
Restructure the snakefiles, continuing the restructuring already started in the main branch
- require baseline result only for training phenotypes
Update evaluate.py
- no repeats required any more
- make Bonferroni correction the default multiple testing correction
- don't combine baseline discoveries with DeepRVAT discoveries
- also re-test the 'seed genes', since we no longer evaluate on the same sample-gene combinations that we trained on (thanks to the CV-based training procedure)
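
For reference, Bonferroni correction multiplies each p-value by the number of tests and caps the result at 1. The sketch below is a generic illustration with a hypothetical `bonferroni` helper and made-up p-values, not the code in evaluate.py.

```python
import numpy as np

def bonferroni(pvalues, alpha=0.05):
    """Return Bonferroni-adjusted p-values and a boolean significance mask."""
    pvalues = np.asarray(pvalues, dtype=float)
    # Multiply by the number of tests, capping adjusted p-values at 1
    adjusted = np.minimum(pvalues * pvalues.size, 1.0)
    return adjusted, adjusted < alpha

adjusted, significant = bonferroni([0.001, 0.02, 0.5, 0.04])
```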
Use additional covariates age2 and age*sex, and correct for statin usage
- updated the example data to have these fields
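
A minimal sketch of how the two derived covariates could be computed with pandas, assuming age2 denotes age squared. The column names `age`, `sex`, and `age_sex` and the toy values are assumptions, and the statin correction is not shown.

```python
import pandas as pd

# Toy phenotype table; column names are hypothetical
df = pd.DataFrame({"age": [40, 55, 62], "sex": [0, 1, 1]})

df["age2"] = df["age"] ** 2            # squared age term
df["age_sex"] = df["age"] * df["sex"]  # age-by-sex interaction term
```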
Update example data to have bit sample ids in genotypes.h5 and string sample ids in phenotype_df
Implement conditional analysis for common variants (pipelines/association_testing_control_for_common_variants.snakefile)

Testing

Quite extensively tested (many reasonable experiments done), but we still need to check why the GitHub tests are failing so far.

PMBio / deeprvat

Feature cv training #55
