diskin-lab-chop / AutoGVP


retain all INFO fields when parsing #93

Closed rjcorb closed 1 year ago

rjcorb commented 1 year ago

Purpose/implementation Section

What feature is being added or bug is being addressed?

This PR updates parse_vcf.sh so that all INFO fields are retained as columns in the output. Previously, only the INFO fields present in the first row of the VCF were extracted, so any field with no value in that row was missed.

What was your approach?

Extracted all INFO field names across all rows and retained only the unique values, which are then used as input for bcftools query.
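The approach above can be sketched roughly as follows. This is a minimal illustration, not necessarily how parse_vcf.sh implements it: the function names list_info_keys and build_query_format are hypothetical, it assumes an uncompressed VCF with INFO in column 8, and the format string contains only INFO fields (any leading columns such as %CHROM or %POS are left out for brevity).

```shell
#!/usr/bin/env bash

# Hypothetical helper: list the unique INFO keys seen in ANY record of a VCF,
# rather than only those present in the first record.
list_info_keys() {
  # Drop header lines, keep column 8 (INFO), put one KEY=VALUE per line,
  # strip the "=VALUE" part, discard empty and "." entries, then deduplicate.
  grep -v '^#' "$1" \
    | cut -f8 \
    | tr ';' '\n' \
    | sed 's/=.*//' \
    | grep -v -e '^$' -e '^\.$' \
    | sort -u
}

# Hypothetical helper: read INFO keys on stdin and emit a format string
# suitable for `bcftools query -f`. The \t and \n stay as literal
# backslash sequences, since bcftools expands them itself.
build_query_format() {
  local fmt='' key
  while read -r key; do
    if [ -n "$fmt" ]; then fmt="${fmt}"'\t'; fi
    fmt="${fmt}%INFO/${key}"
  done
  printf '%s\\n' "$fmt"
}
```

Assuming bcftools is installed, the two helpers could be combined as `bcftools query -f "$(list_info_keys test-parsing.vcf | build_query_format)" test-parsing.vcf`.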

What GitHub issue does your pull request address?

#89

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Run parse script on test file as follows:

bash parse_vcf.sh test-parsing.vcf

Which areas should receive a particularly close look?

Confirm that there are 53 columns in the TSV output, which corresponds to the number of unique INFO fields.
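One way to check the column count is to count the tab-separated fields in the first line of the output. The helper name count_columns below is hypothetical, and the output file name depends on how parse_vcf.sh names its result.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the number of tab-separated columns
# in the first line of a TSV file.
count_columns() {
  head -n 1 "$1" | awk -F'\t' '{ print NF }'
}
```

Running `count_columns` on the TSV produced from test-parsing.vcf should then report 53.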

Is there anything that you want to discuss further?

No

Documentation Checklist