Arc39 / vcf_to_ga

converts VCF into GA4GH protobuf messages
Apache License 2.0

Time measurements #4

Open david4096 opened 8 years ago

david4096 commented 8 years ago

Being able to predict how long a file will take to process would be helpful. A reliable estimate may only come after some testing, but measuring the width (number of samples) and height (number of variants) of a file should give us a good idea of how long it will take.

Take a single record from the variant set and try to make adjustments to the predicted time based on the number of samples to be processed.

Use the pysam API to get the total number of rows to give estimates. It would be great to have some control over "how much time per GB".
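As a rough illustration of the width/height measurement, here is a minimal sketch that uses only the Python standard library rather than pysam. It scans a gzip-compressed VCF once to count variant rows and sample columns, and scales a user-supplied "seconds per GB" rate by file size. The function names `vcf_dimensions` and `estimate_seconds_from_size` and the `seconds_per_gb` parameter are hypothetical, not part of vcf_to_ga.

```python
import gzip
import os


def vcf_dimensions(path):
    """Return (n_variants, n_samples) by scanning a gzip-compressed VCF once."""
    n_variants = 0
    n_samples = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines carry no records
            if line.startswith("#CHROM"):
                # Columns after the 9 fixed ones (CHROM..FORMAT) are sample names.
                columns = line.rstrip("\n").split("\t")
                n_samples = max(len(columns) - 9, 0)
                continue
            if line.strip():
                n_variants += 1  # every remaining non-empty line is one variant
    return n_variants, n_samples


def estimate_seconds_from_size(path, seconds_per_gb):
    """Scale an assumed 'seconds per GB' rate by the file's size on disk."""
    return os.path.getsize(path) / 1e9 * seconds_per_gb
```

Counting rows this way reads the whole file, so the count itself takes time proportional to file size; the size-based estimate avoids that at the cost of ignoring how many samples each row carries.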

Arc39 commented 8 years ago

Don't see a way in pysam to get number of rows. Here's something temporary to get an idea.
For sample.vcf.gz (2052 variants, 3 samples = 6156 calls):
output to pb: 9 seconds
output to txt: 9 seconds
output to DB (alongside either pb or txt): 27 seconds
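To show how these timings could feed an estimate, here is a minimal sketch that turns them into a calls-per-second rate and extrapolates linearly to other file dimensions. It assumes processing time scales with variants × samples, and it treats the 27 seconds as the total wall time of a run that also writes to the DB; neither assumption has been verified. The name `estimate_seconds` is hypothetical.

```python
# Timings reported for sample.vcf.gz (2052 variants, 3 samples = 6156 calls).
MEASURED_CALLS = 2052 * 3
MEASURED_SECONDS = {"pb": 9.0, "txt": 9.0, "db": 27.0}


def estimate_seconds(n_variants, n_samples, output="pb"):
    """Linearly extrapolate run time from the single measured sample file."""
    calls = n_variants * n_samples
    # Assumed throughput: calls processed per second for this output type.
    calls_per_second = MEASURED_CALLS / MEASURED_SECONDS[output]
    return calls / calls_per_second
```

With these numbers the rate works out to 684 calls per second for pb or txt and 228 calls per second with the DB. A single small file is a thin basis for extrapolation, so more measurements across file sizes would be needed before trusting the prediction.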