david4096 opened 8 years ago
I don't see a way in pysam to get the number of rows (variant records). In the meantime, here are some rough timings to give an idea.
For sample.vcf.gz (2052 variants × 3 samples = 6156 calls):
Output to pb: 9 seconds
Output to txt: 9 seconds
Output to DB (alongside either pb or txt): 27 seconds
Being able to predict how long a file will take to process would be helpful. Exact figures may have to come from testing; however, measuring the width (number of samples) and height (number of variants) of a file should give a good first estimate of its processing time.
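A minimal sketch of that width × height idea, assuming run time grows linearly with the number of calls (variants × samples). It calibrates a per-call cost from the sample.vcf.gz timings above; `estimate_seconds` and `seconds_per_call` are hypothetical helper names, not part of pysam or this project.

```python
# Timings reported above for sample.vcf.gz (2052 variants x 3 samples).
CALIBRATION_CALLS = 2052 * 3  # 6156 calls
CALIBRATION_SECONDS = {
    "pb": 9.0,
    "txt": 9.0,
    "db": 27.0,  # DB output written alongside pb or txt
}


def seconds_per_call(output: str) -> float:
    """Average wall-clock seconds per call for one output mode."""
    return CALIBRATION_SECONDS[output] / CALIBRATION_CALLS


def estimate_seconds(n_variants: int, n_samples: int, output: str = "pb") -> float:
    """Assumes run time grows linearly with variants x samples (calls)."""
    return n_variants * n_samples * seconds_per_call(output)
```

For example, `estimate_seconds(10_000, 100)` gives roughly 1,460 seconds (about 24 minutes) for pb output. With a single calibration file, fixed startup overhead is folded into the per-call cost, so estimates for very small files will likely be too low; timing a second, larger file would let the two be separated.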
One approach: take a single record from the variant set and adjust the predicted time according to the number of samples to be processed.
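One way this could look, as a sketch: time the processing of the first few records and scale by the total row count. Because each record already carries every sample's call, the per-record time reflects the file's width automatically. `extrapolate_seconds`, `process`, and the injectable `clock` are hypothetical; `records` could be any iterable of records (for instance a `pysam.VariantFile`), and `process` stands in for whatever conversion step is being measured.

```python
import itertools
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def extrapolate_seconds(
    records: Iterable[T],
    process: Callable[[T], object],
    total_rows: int,
    sample_size: int = 1,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Time `process` on the first `sample_size` records and scale to `total_rows`."""
    sample = list(itertools.islice(records, sample_size))
    if not sample:
        raise ValueError("no records to sample")
    start = clock()
    for rec in sample:
        process(rec)  # hypothetical per-record conversion step
    elapsed = clock() - start
    # Per-record cost (which already includes all samples) times row count.
    return elapsed / len(sample) * total_rows
```

Timing a single record is noisy and may be dominated by timer resolution or first-call warm-up, so a `sample_size` of a few hundred records is probably a safer default.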
Use the pysam API to get the total number of rows for the estimate. It would also be useful to be able to express throughput as "how much time per GB".
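Until a direct row count is found, the rows and samples can be counted with a single pass over the file. The sketch below swaps pysam for the standard library (`gzip` reads bgzipped VCFs because bgzip output is gzip-compatible). It relies on the VCF layout: the `#CHROM` header line has 9 fixed columns (through FORMAT) followed by one column per sample. `count_rows_and_samples`, `count_vcf`, and `seconds_per_gb` are hypothetical names.

```python
import gzip
import os
from typing import Iterable, Tuple


def count_rows_and_samples(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (data rows, samples) from VCF text lines."""
    rows = 0
    samples = 0
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns after CHROM..FORMAT (9 fixed) are sample names.
            samples = max(len(line.rstrip("\n").split("\t")) - 9, 0)
        elif line.startswith("#"):
            continue  # meta-information lines (##...)
        elif line.strip():
            rows += 1
    return rows, samples


def count_vcf(path: str) -> Tuple[int, int]:
    """Full pass over a plain or bgzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return count_rows_and_samples(fh)


def seconds_per_gb(path: str, elapsed_seconds: float) -> float:
    """Processing time per GB of on-disk (possibly compressed) file size."""
    return elapsed_seconds / (os.path.getsize(path) / 1e9)
```

With pysam the same counts could presumably be taken as `len(vf.header.samples)` and `sum(1 for _ in vf)` on a `pysam.VariantFile`; either way counting rows costs a full read of the file. Compressed size is only a rough proxy for work, since compression ratios differ between files, so a "time per GB" figure may transfer less reliably than a per-call one.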