browning-lab / flare

The flare program performs local ancestry inference

feat: pull samples from ref-panel #2

Closed by troycomi 2 years ago

troycomi commented 2 years ago

rfmix only treats samples as references if they are defined in the reference panel. This is useful for reusing the same reference vcf across different runs.

It appears flare does not accept having any gt samples in the ref vcf or having a sample in the ref vcf that is not in the ref-panel. Could you allow unused samples in the ref vcf (including those also present in gt)? It saves a lot of time and temporary files for some workflows.

Another related feature would be to define a list of samples for gt instead of a vcf. You could have a main vcf file with all samples then just pass different text files for different analyses.

browning-lab commented 2 years ago

Thank you for these great suggestions. I just posted version 0.2, which allows you to specify a subset of reference samples and subset of study samples to be analyzed. Version 0.2 has the following two changes:

1) Only reference samples in the ref-panel file are included in the analysis. Any samples in the reference file that are not in the ref-panel file will be ignored.
2) The optional "excludesample" parameter has been replaced with the optional "gt-samples" parameter. The gt-samples parameter allows you to specify the list of admixed study samples that are to be analyzed.
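The two selection rules above can be modeled with a short Python sketch. This is only an illustration of the described behavior, not flare's actual code: the function name `select_samples`, the sample IDs, and the panel labels are hypothetical, and the sketch assumes that every sample in the gt file is analyzed when no gt-samples list is given.

```python
def select_samples(ref_vcf_samples, ref_panel, gt_vcf_samples, gt_samples=None):
    """Model of the version 0.2 selection rules (illustrative, not flare's code)."""
    # Rule 1: only reference samples listed in the ref-panel file are used;
    # any other sample in the reference VCF is ignored.
    used_ref = [s for s in ref_vcf_samples if s in ref_panel]

    # Rule 2: if a gt-samples list is supplied, only those study samples are
    # analyzed. Assumption: without a list, all samples in the gt file are used.
    if gt_samples is None:
        used_gt = list(gt_vcf_samples)
    else:
        wanted = set(gt_samples)
        used_gt = [s for s in gt_vcf_samples if s in wanted]
    return used_ref, used_gt


# Hypothetical combined VCF holding reference, unused, and study samples.
combined = ["ref1", "ref2", "extra", "adm1", "adm2"]
panel = {"ref1": "EUR", "ref2": "AFR"}  # sample ID -> reference panel label

used_ref, used_gt = select_samples(combined, panel, combined, ["adm1", "adm2"])
print(used_ref)
print(used_gt)
```

In this model the sample "extra" is present in the VCF but is neither a reference nor a study sample, which is the case the original request asked flare to tolerate.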

If your reference and admixed study samples are combined in the same VCF file, you do not need to separate the reference and admixed study samples into two VCF files if you are using version 0.2. Instead, you can

a) create a file with the list of study samples (one sample per line),
b) set the ref and gt parameters equal to the (same) combined VCF file, and
c) set the gt-samples parameter equal to the file with the list of study samples.
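Steps a) to c) could be scripted roughly as follows. This is a minimal sketch that only writes the sample list and assembles the command line; it does not run flare. The file names (`combined.vcf.gz`, `ref.panel`, `plink.map`, `flare.jar`), the output prefix, and the helper name `build_flare_command` are placeholders, and the `map` and `out` arguments are assumed from flare's usual invocation rather than taken from this thread.

```python
from pathlib import Path
import tempfile


def build_flare_command(combined_vcf, ref_panel, study_samples, workdir,
                        jar="flare.jar", genetic_map="plink.map",
                        out_prefix="flare_out"):
    """Write the study-sample list and return (command, sample_file)."""
    # Step a) one study sample ID per line.
    sample_file = Path(workdir) / "study_samples.txt"
    sample_file.write_text("\n".join(study_samples) + "\n")

    command = [
        "java", "-jar", jar,
        # Step b) ref and gt both point at the same combined VCF.
        f"ref={combined_vcf}",
        f"ref-panel={ref_panel}",
        f"gt={combined_vcf}",
        # Step c) gt-samples points at the list written in step a).
        f"gt-samples={sample_file}",
        # Assumed additional arguments (not mentioned in this thread).
        f"map={genetic_map}",
        f"out={out_prefix}",
    ]
    return command, sample_file


with tempfile.TemporaryDirectory() as workdir:
    cmd, sample_file = build_flare_command(
        "combined.vcf.gz", "ref.panel", ["adm1", "adm2"], workdir
    )
    listed = sample_file.read_text().split()

print(" ".join(cmd))
```

For a different analysis, only the list of study samples changes; the combined VCF stays the same, so no temporary per-run VCF is needed.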

troycomi commented 2 years ago

Great change! It cut our analysis time effectively in half, since we no longer need to make temporary vcfs.

With a few samples I'm getting identical outputs. Thanks again.