UCSF-DSCOLAB / data_processing_pipelines

A repository to store the existing pipelines to process the various CoLabs datasets

post-merge filtering of variants in bulk pipeline #52

Open erflynn opened 10 months ago

erflynn commented 10 months ago

Add a module to filter variants post-merge in bulk pipeline.

I've typically used the following:

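# --max-missing 1.0 keeps only sites with no missing genotypes; may want to lower this.
# (A comment after a trailing backslash breaks the line continuation, so it lives up here.)
# Output is written to snps_filtered.recode.vcf.
vcftools --gzvcf "${MY_VCF}" \
    --max-missing 1.0 \
    --min-alleles 2 \
    --max-alleles 2 \
    --remove-indels \
    --out snps_filtered \
    --recode --recode-INFO-all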

We may also want to add a filter for common variants based on a reference. Historically, genomics did this with --maf 0.05, but I believe vcftools calculates that frequency from the samples in the VCF itself, so the result would be highly sample-size dependent and would not accomplish what we are hoping for.
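
To make the sample-size concern concrete, here is a rough sketch of the arithmetic. It assumes diploid samples and no missing genotypes (as with --max-missing 1.0), so the smallest non-zero minor allele frequency a site can have is 1 / (2N). The helper name min_nonzero_maf is hypothetical and only for illustration.

```shell
# Smallest non-zero MAF observable in a cohort of N diploid samples with
# complete genotypes: one minor allele among 2N called alleles.
min_nonzero_maf() {
    awk -v n="$1" 'BEGIN { printf "%.4f\n", 1 / (2 * n) }'
}

# Print the floor for a few example cohort sizes.
for n in 5 10 50 500; do
    printf 'N=%s\tmin non-zero MAF=%s\n' "$n" "$(min_nonzero_maf "$n")"
done
```

Under these assumptions, any cohort of 10 or fewer samples has a floor at or above 0.05, so --maf 0.05 would only drop monomorphic sites there, while in a cohort of 500 the same threshold removes a large share of rare variants. That is why a frequency taken from an external reference may be a better fit than one computed from the dataset.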

erflynn commented 10 months ago

tagging @dtm2451 @tastam because we are discussing