Open mcovarr opened 3 days ago
The correctness comparisons I mentioned are between your subcohort BGE extracts from the WGS 3k callset pulling from ah_var_store and the same extracts run from this branch. In theory you can also compare memory usage between the two and document it, to see whether the code affects sub-region extracts as well as subcohort extracts (although those results will not gate merging this PR).
I ran the BGE correctness comparisons mentioned above and everything tied out perfectly with respect to ah_var_store, while dropping > 99% of filter set info and > 98% of filter set sites. The runtimes of these extracts are even shorter than they were for the WGS dataset, so the graphs are not going to be terribly informative. I'm thinking of reaching out to see if we can run this code against a larger AoU dataset after the break.
Integration test run here.
Follows the breadcrumbs in VS-493 to drop filter set info and filter set sites that fall outside the variant locations of the samples being extracted.
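As a rough illustration of the idea only (not the code in this branch), the pruning can be thought of as: collect the locations where the extracted samples actually have variants, then keep only the filter rows at those locations. Everything named here (`FilterRow`, `variant_locations`, `prune_filter_rows`, the `(chrom, pos)` keys, and the sample data) is hypothetical and chosen for the sketch.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple

Location = Tuple[str, int]  # hypothetical key: (chromosome, position)


class FilterRow(NamedTuple):
    """Hypothetical stand-in for one row of filter set info or filter set sites."""
    chrom: str
    pos: int
    payload: str


def variant_locations(sample_variants: Dict[str, Iterable[Location]]) -> Set[Location]:
    """Union of variant locations across the samples being extracted."""
    return {loc for sites in sample_variants.values() for loc in sites}


def prune_filter_rows(rows: Iterable[FilterRow], keep: Set[Location]) -> List[FilterRow]:
    """Keep only the filter rows that sit on a location in `keep`."""
    return [row for row in rows if (row.chrom, row.pos) in keep]


# Made-up data: two extracted samples and four filter rows.
sample_variants = {
    "sample_a": [("chr1", 100), ("chr1", 250)],
    "sample_b": [("chr1", 250), ("chr2", 40)],
}
filter_rows = [
    FilterRow("chr1", 100, "info-1"),
    FilterRow("chr1", 175, "info-2"),  # no extracted sample has a variant here
    FilterRow("chr2", 40, "info-3"),
    FilterRow("chr3", 9, "info-4"),    # no extracted sample has a variant here
]

keep = variant_locations(sample_variants)
pruned = prune_filter_rows(filter_rows, keep)
dropped_fraction = 1 - len(pruned) / len(filter_rows)
```

For a small subcohort drawn from a large callset, most filter rows fall at locations none of the extracted samples touch, which is consistent with a large share of rows being dropped.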
Test runs here:
Findings: