This PR adds example data for the preprocessing pipeline and instructions for testing the pipeline. There is also a small patch to the preprocessing script to support relative paths when excluding variants.
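The patch to the preprocessing script itself is not shown here. Below is a minimal sketch of one common way to support relative paths for an exclusion file: absolute paths pass through unchanged, and relative paths are joined onto a base directory. The helper name `resolve_exclude_path` and the choice of the current working directory as the default base are assumptions for illustration, not the script's actual code.

```python
import os
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_exclude_path(path: PathLike, base_dir: Optional[PathLike] = None) -> Path:
    """Hypothetical helper: turn an exclusion-file path into an absolute path.

    Absolute paths are returned unchanged. Relative paths are interpreted
    relative to base_dir, or to the current working directory if base_dir
    is None. ".." components are collapsed with os.path.normpath.
    """
    p = Path(path)
    if p.is_absolute():
        return p
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return Path(os.path.normpath(base / p))
```

For example, `resolve_exclude_path("exclude/variants.tsv", "/data/workdir")` yields `/data/workdir/exclude/variants.tsv`. Keeping absolute paths untouched means existing configs that already use absolute paths keep working.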
The simulated example data is located in example/preprocess.
To test the example data, follow the instructions in the README:
Run the preprocess pipeline with example data
The VCF files in the example data folder were generated using fake-vcf (with some manual editing) and hence do not contain real data.
ls -l workdir/preprocessed
total 48
-rw-r--r-- 1 user staff 6404 Aug 2 14:06 genotypes.h5
-rw-r--r-- 1 user staff 6354 Aug 2 14:06 genotypes_chr21.h5
-rw-r--r-- 1 user staff 6354 Aug 2 14:06 genotypes_chr22.h5
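After a run, one quick sanity check is that every file from the listing above was produced. The sketch below is a hypothetical helper, not part of the pipeline: it assumes the output names follow the pattern in the listing (`genotypes.h5` plus one `genotypes_chr<N>.h5` per chromosome) and only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path
from typing import Iterable, List, Union


def expected_outputs(chromosomes: Iterable[int]) -> List[str]:
    """Hypothetical: file names expected in the preprocessing output folder,
    following the pattern seen in the example listing."""
    names = ["genotypes.h5"]
    names += [f"genotypes_chr{c}.h5" for c in chromosomes]
    return names


def missing_outputs(out_dir: Union[str, Path], chromosomes: Iterable[int]) -> List[str]:
    """Return the expected file names that are not present as regular files in out_dir."""
    out = Path(out_dir)
    return [name for name in expected_outputs(chromosomes) if not (out / name).is_file()]
```

With the example data, `missing_outputs("workdir/preprocessed", [21, 22])` should return an empty list once the pipeline has finished.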
A new job has been added to GitHub Actions that first runs a smoke test of the preprocessing pipeline and then runs the full preprocessing pipeline on the example data. The slowest part of running the example pipeline is downloading the FASTA file; in GitHub Actions, this step is cached.
You can view the actions here: https://github.com/PMBio/deeprvat/actions/runs/5741091552