Open StuntsPT opened 8 years ago
This script seems to work well! Maybe it could be modified to better deal with variation in VCF format? Rather than hard-coding the line that contains the VCF column names, just detect that line for the downstream conversion?
That should be an easy improvement to make (I'm thinking of grep). Good idea. I'll implement it as soon as I have a moment.
The reason I was using head instead of grep was simply performance. "Grepping" over huge VCF files will take a while, but I learned something new in the process: grep -m X will stop after the "Xth" matching line, which effectively solves the performance problem, no matter how large the VCF is.
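For anyone reading along, here is a minimal sketch of the idea, not the actual change to the script. It assumes the column-name line is the one starting with #CHROM, as in standard VCF files, and that your grep supports -m (GNU and BSD grep do; it is not in POSIX). The function names are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helpers, not part of the conversion script itself.

# Print the 1-based line number of the VCF column-name line.
# -m 1 makes grep stop reading at the first match, so the body of
# the file is never scanned; -n prefixes the match with its line number.
vcf_header_line() {
    grep -m 1 -n '^#CHROM' "$1" | cut -d: -f1
}

# Print the column-name line itself (tab-separated fields).
vcf_column_names() {
    grep -m 1 '^#CHROM' "$1"
}
```

Calling vcf_header_line on a VCF path gives the line number that used to be hard-coded, so the downstream conversion no longer depends on how many ## metadata lines precede the column names.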
Thanks for calling it out.
This script makes it a lot easier to get those input files. It converts a VCF straight into something that SpaceMix can read. I hope it helps.