clingen-data-model / clinvar-streams


Extract variation descendant ids into separate table #47

Closed theferrit32 closed 2 years ago

theferrit32 commented 2 years ago

On the full clinvar dataset, with on the order of 500K-1M variations, the query to obtain the list of root variations that need to be published is very slow. I believe this is because of the way it checks for descendant variations, which at the moment is done with a substring LIKE operator. https://github.com/clingen-data-model/clinvar-streams/blob/5a8385db1d5845391d2d1449f95f6c48ceec73ea/src/clinvar_combiner/combiners/core.clj#L87

This ticket is to add a table, linked to variation and clinical_assertion_variation, that stores the descendant ids.
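
To illustrate the idea, here is a minimal sketch in Python with the standard-library `sqlite3` module. It is not the project's actual schema or query: the table `variation_descendant_ids`, the column names, the index name, and the assumption that descendant ids are currently stored as a JSON-style text list are all hypothetical. It contrasts a root-variation lookup that matches descendants by substring with one that uses an indexed linked table.

```python
import sqlite3


def build_demo_db():
    """Build a tiny in-memory database with both representations (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE variation (
            id TEXT PRIMARY KEY,
            descendant_ids TEXT  -- assumption: a JSON-style list stored as text
        );
        -- Proposed linked table: one row per (variation, descendant) pair.
        CREATE TABLE variation_descendant_ids (
            variation_id TEXT NOT NULL REFERENCES variation(id),
            descendant_id TEXT NOT NULL,
            PRIMARY KEY (variation_id, descendant_id)
        );
        -- Index so "is this id a descendant of anything?" is a lookup, not a scan.
        CREATE INDEX variation_descendant_ids_descendant_idx
            ON variation_descendant_ids (descendant_id);
        """
    )
    conn.executemany(
        "INSERT INTO variation (id, descendant_ids) VALUES (?, ?)",
        [("100", '["101","102"]'), ("101", "[]"), ("102", "[]"), ("200", "[]")],
    )
    conn.executemany(
        "INSERT INTO variation_descendant_ids (variation_id, descendant_id) VALUES (?, ?)",
        [("100", "101"), ("100", "102")],
    )
    return conn


def roots_via_like(conn):
    """Root variations found by substring matching; every check scans all rows."""
    sql = """
        SELECT v.id FROM variation v
        WHERE NOT EXISTS (
            SELECT 1 FROM variation p
            WHERE p.descendant_ids LIKE '%"' || v.id || '"%'
        )
        ORDER BY v.id
    """
    return [row[0] for row in conn.execute(sql)]


def roots_via_linked_table(conn):
    """Root variations found through the linked table and its index."""
    sql = """
        SELECT v.id FROM variation v
        WHERE NOT EXISTS (
            SELECT 1 FROM variation_descendant_ids d
            WHERE d.descendant_id = v.id
        )
        ORDER BY v.id
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_demo_db()
    print(roots_via_like(conn))
    print(roots_via_linked_table(conn))
```

Both functions return the same root ids on this toy data. The difference is that a `LIKE` pattern with a leading wildcard cannot use an ordinary index, so the substring version compares every variation against every other one, while the linked-table version can answer each check with an index lookup on `descendant_id`.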