Adding the "is bcf simple" checks slowed down BCF verification on Bloom from ~0.24 seconds (a few days ago, before I added all those checks) to ~15.32 seconds (edit: ok, around 12-16 seconds, maybe).
In my view, this tradeoff is 100% worth it -- better slow and correct than fast and wrong. But it'd be nice to speed things up.
Some ideas:
If the input BCF was produced by strainFlye, just take a leap of faith and assume it's OK (only use strict validation on outside inputs)
Depending on how many contigs there are in the dataset, parallelize the checks across contigs
... and there are probs other ideas that would also work.
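A rough sketch of how the two ideas above could fit together, not the actual strainFlye code. Everything named here is hypothetical: `is_trusted_source` assumes the producing tool names itself in a `##source=` header line (I haven't confirmed strainFlye writes one), `check_contig` is a stand-in for the real "is bcf simple" checks, and records are plain `(contig, pos, ref, alts)` tuples rather than whatever the BCF reader actually returns.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

NUCLEOTIDES = frozenset("ACGT")


def is_trusted_source(header_lines, trusted_name="strainFlye"):
    """Return True if a ##source header line names the trusted tool.

    Assumption: the producing tool records itself in a ##source line.
    """
    for line in header_lines:
        if line.startswith("##source=") and trusted_name in line:
            return True
    return False


def check_contig(contig, records):
    """Hypothetical stand-in for the per-contig "simple" checks.

    records is a list of (pos, ref, alts) tuples for one contig.
    Returns a list of error messages (empty if the contig looks fine).
    """
    errors = []
    seen_positions = set()
    for pos, ref, alts in records:
        if pos in seen_positions:
            errors.append(f"{contig}:{pos}: duplicate position")
        seen_positions.add(pos)
        if ref not in NUCLEOTIDES:
            errors.append(f"{contig}:{pos}: REF is not a single nucleotide")
        if len(alts) != 1 or alts[0] not in NUCLEOTIDES:
            errors.append(f"{contig}:{pos}: ALT is not a single nucleotide")
    return errors


def check_all_contigs(records, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Group records by contig, then check each contig in its own task.

    Results come back in sorted-contig order, so the error list is
    deterministic regardless of which task finishes first.
    """
    by_contig = defaultdict(list)
    for contig, pos, ref, alts in records:
        by_contig[contig].append((pos, ref, alts))
    contigs = sorted(by_contig)
    errors = []
    with executor_cls(max_workers=max_workers) as pool:
        per_contig = pool.map(check_contig, contigs, (by_contig[c] for c in contigs))
        for contig_errors in per_contig:
            errors.extend(contig_errors)
    return errors
```

The caller would skip `check_all_contigs` when `is_trusted_source` returns True and run it otherwise. Processes rather than threads are the default because the checks are CPU-bound pure Python, so threads would mostly wait on the GIL. With only a handful of contigs, the cost of starting workers and pickling records could easily outweigh the gain, so this would need timing on Bloom before committing to it.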