rjcorb closed 1 year ago
I tried to run the script but I am getting the following error:
Could not parse format string: %CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%\n
It is pointing to line 24. The column headers do match, so I am not sure what's going on...
Should run now
Yes, that worked!
Purpose/implementation Section
What feature is being added or bug is being addressed?
This PR adds parse_vcf.sh, a bash script that parses all columns and INFO fields from a VCF file into tab-separated columns for downstream workflow output generation.
What was your approach?
parse_vcf.sh first extracts all INFO subfields and formats them into a single character string to be used as input for bcftools query. bcftools query is then used to parse all VCF columns and INFO subfields from the VCF file, and the output is written as *.parsed.vcf.
What GitHub issue does your pull request address?
73
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Please run script using the test input vcf file as follows:
bash parse_vcf.sh input/test-parsing.vcf
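For reviewers who want a picture of the approach before running the script, here is a minimal sketch of the two steps described above: collect the INFO subfield IDs from the VCF header, then join them into a format string for bcftools query. This is not the code in parse_vcf.sh; the function names list_info_ids and build_format are hypothetical, and the sketch assumes the subfields are declared as ##INFO=<ID=...> header lines. It skips empty IDs so the string never ends in a bare %, which is the shape of the string in the parse error reported earlier in this thread (whatever the cause was there).

```shell
#!/usr/bin/env bash
# Sketch only, not the contents of parse_vcf.sh.
# list_info_ids and build_format are hypothetical helper names.

# Print the INFO subfield IDs declared in the VCF header, one per line.
list_info_ids() {
  grep '^##INFO=<ID=' "$1" | sed -E 's/^##INFO=<ID=([^,>]+).*/\1/'
}

# Print a bcftools query format string: the fixed VCF columns first,
# then one %INFO/<ID> tag per declared subfield.
build_format() {
  local fmt='%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER'
  local id
  while IFS= read -r id; do
    # Skip blank IDs so no empty "%" tag ends up in the string.
    [ -n "$id" ] && fmt="${fmt}\\t%INFO/${id}"
  done < <(list_info_ids "$1")
  # Emit the string followed by a literal backslash-n for bcftools.
  printf '%s\\n\n' "$fmt"
}

# Assumed usage (requires bcftools; -H prints a header line):
#   bcftools query -H -f "$(build_format input/test-parsing.vcf)" \
#     input/test-parsing.vcf > test-parsing.parsed.vcf
```

Building the string from the header rather than hard-coding it means a VCF with a different set of INFO subfields needs no change to the script.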
Which areas should receive a particularly close look?
Ensure script runs and that output looks as expected (tab-delimited, one column per INFO subfield)
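One quick way to check the tab-delimited layout is to confirm that every line has the same number of tab-separated fields as the first line. The helper below is a sketch for reviewers, not part of this PR; check_columns is a hypothetical name, and the output file name in the usage comment is assumed from the *.parsed.vcf pattern.

```shell
#!/usr/bin/env bash
# Sketch only: check_columns is a hypothetical helper, not part of the PR.
# Exits 0 if every line has as many tab-separated fields as line 1,
# otherwise prints the offending lines and exits 1.
check_columns() {
  awk -F'\t' '
    NR == 1 { n = NF; next }   # remember the column count of the first line
    NF != n {
      printf "line %d: %d columns, expected %d\n", NR, NF, n
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Assumed usage:
#   check_columns test-parsing.parsed.vcf && echo "column counts are consistent"
```

This only checks that the column count is consistent; whether there is exactly one column per INFO subfield still needs a look at the header line.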
Is there anything that you want to discuss further?
The file column names currently start with [<column no.>]. I think this would be easier to remove in a subsequent R script in which this file will be merged with AutoGVP Rscript output.
Documentation Checklist