gbradburd / SpaceMix


Added a script to convert VCF to SpaceMix inputs. #4

Open StuntsPT opened 8 years ago

StuntsPT commented 8 years ago

This script makes it a lot easier to get those input files. It converts a VCF file straight into something that SpaceMix can read. I hope it helps.

peterdfields commented 8 years ago

This script seems to work well! Maybe it could be modified to better handle variation in VCF format? Rather than hard-coding the line that contains the VCF column names, could it just detect that line and use it for the downstream conversion?

StuntsPT commented 8 years ago

That should be an easy improvement to make (I'm thinking of grep). Good idea. I'll implement it as soon as I have a moment.

StuntsPT commented 8 years ago

The reason I was using head instead of grep was simply performance. "Grepping" over huge VCF files will take a while, but I learned something new in the process: grep -m X will stop after the "Xth" match, which effectively solves the performance problem, no matter how large the VCF is. Thanks for calling it out.
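
For anyone following along, here is a minimal sketch of the idea, not the code from this pull request. It assumes a standard VCF, where the column-name line starts with #CHROM and sample names begin at column 10, and a grep that supports -m (GNU and BSD grep do; it is not required by POSIX). The file contents and variable names are made up for illustration.

```shell
# Build a tiny example VCF (hypothetical data, for illustration only).
vcf_file="$(mktemp)"
printf '##fileformat=VCFv4.2\n##source=example\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\nchr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n' > "$vcf_file"

# -m 1 makes grep stop reading after the first match, so the rest of a
# large file is never scanned. -n prefixes the match with its line number.
header_line="$(grep -m 1 -n '^#CHROM' "$vcf_file")"

# Strip everything from the first ':' onward to keep only the line number.
header_line_no="${header_line%%:*}"

# Sample names are the tab-separated fields from column 10 onward.
sample_names="$(grep -m 1 '^#CHROM' "$vcf_file" | cut -f 10-)"

echo "Column names are on line $header_line_no"
echo "Samples: $sample_names"

# Remove the temporary example file.
rm -f "$vcf_file"
```

Because the match is anchored on ^#CHROM rather than on a fixed line number, the number of ## metadata lines above it no longer matters.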