SACGF / variantgrid

VariantGrid public repo
Other
23 stars 2 forks source link

Gene filter is really slow when there are massive numbers of variants in DB #589

Open davmlaw opened 2 years ago

davmlaw commented 2 years ago

This works by returning a list of variants that overlap with the genes - perhaps we're calling list() on a queryset? if the DB is huge, this is really slow.

Was running:

SELECT "annotation_variantgeneoverlap"."variant_id" FROM "annotation_variantgeneoverlap" WHERE "annotation_variantgeneoverlap"."gene_id" IN (SELECT U5."gene_id" FROM "genes_releasegenesymbol" U0 INNER JOIN "genes_genesymbol" U1 ON (U0."gene_symbol_id" = U1."symbol") INNER JOIN "genes_genelistgenesymbol" U2 ON (U1."symbol" = U2."gene_symbol_id") LEFT OUTER JOIN "genes_releasegenesymbolgene" U5 ON (U0."id" = U5."release_gene_symbol_id") WHERE (U2."gene_list_id" IN (5715) AND U0."release_id" = 2))

Mendeliome took 147s to filter.

davmlaw commented 2 years ago

Possibly we could do a filtered relation join to annotation_variantgeneoverlap, benchmark and see if that's faster