clemente-lab / mmeds-meta

A database for storing and analyzing omics data
https://mmeds.org

Improve time complexity of meta analysis queries #434

Open adamcantor22 opened 1 year ago


Is your feature request related to a problem? Please describe.
The MetaAnalysisView used to create meta-studies has to run a large number of JOINs across essentially the entire SQL schema. This approach is clunky and slow: with ~18,000 samples currently in MMEDS, the query takes about 5 minutes. At that scale, we should be able to find a solution that takes next to no time.

Describe the solution you'd like
Queries should take on the order of a few seconds. One option I've considered is dynamic view generation, where we generate a view based only on the columns present in the query. Streamlining the JOINs themselves could also help; I'd appreciate Matt's help reviewing those at some point.
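A minimal sketch of the dynamic view idea, assuming a simplified, hypothetical three-table schema (Sample, Subject, Study) with hand-written join paths; the real MMEDS schema and the MetaAnalysisView definition are not reproduced here. The names COLUMN_TABLE, JOIN_PATHS, build_query and create_meta_view are illustrative only. It uses Python's built-in sqlite3 module purely so the example runs on its own.

```python
import sqlite3

# Hypothetical mapping from each selectable column to the table that owns it.
COLUMN_TABLE = {
    "SampleID": "Sample",
    "SampleType": "Sample",
    "SubjectID": "Subject",
    "Sex": "Subject",
    "StudyName": "Study",
}

# Hypothetical JOIN clauses needed to reach each table from the base Sample table.
JOIN_PATHS = {
    "Sample": [],
    "Subject": ["JOIN Subject ON Sample.subject_id = Subject.id"],
    "Study": [
        "JOIN Subject ON Sample.subject_id = Subject.id",
        "JOIN Study ON Subject.study_id = Study.id",
    ],
}


def build_query(columns):
    """Return a SELECT that joins only the tables the requested columns need."""
    joins = []
    for col in columns:
        table = COLUMN_TABLE[col]  # unknown columns raise KeyError (acts as a whitelist)
        for clause in JOIN_PATHS[table]:
            if clause not in joins:  # keep first-seen order, skip duplicate joins
                joins.append(clause)
    select = ", ".join(f"{COLUMN_TABLE[c]}.{c}" for c in columns)
    return " ".join([f"SELECT {select} FROM Sample"] + joins)


def create_meta_view(conn, columns, name="meta_view"):
    """(Re)create a temporary view containing only the requested columns."""
    conn.execute(f"DROP VIEW IF EXISTS {name}")
    conn.execute(f"CREATE TEMP VIEW {name} AS {build_query(columns)}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Study (id INTEGER PRIMARY KEY, StudyName TEXT);
        CREATE TABLE Subject (id INTEGER PRIMARY KEY, SubjectID TEXT,
                              Sex TEXT, study_id INTEGER);
        CREATE TABLE Sample (id INTEGER PRIMARY KEY, SampleID TEXT,
                             SampleType TEXT, subject_id INTEGER);
        INSERT INTO Study VALUES (1, 'StudyA');
        INSERT INTO Subject VALUES (1, 'S1', 'F', 1);
        INSERT INTO Sample VALUES (1, 'SMP1', 'stool', 1);
        """
    )
    create_meta_view(conn, ["SampleID", "StudyName"])
    print(conn.execute("SELECT * FROM meta_view").fetchall())  # prints [('SMP1', 'StudyA')]
```

Requesting only SampleID and SampleType yields a query with no JOINs at all, while requesting StudyName pulls in just the Subject and Study joins. How much this saves in practice would depend on how many columns a typical meta-study actually requests.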

Additional context
We should also check whether query time grows linearly or exponentially with the number of samples. A linear relationship would be far less urgent, since a few minutes for a query that only we have access to is not that bad. An exponential relationship would be more worrying.
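One way to run this check, sketched under the assumption of a hypothetical run_query(n) callable that executes the meta-analysis query restricted to the first n samples: time the query at several sample counts, then compute the slope of log(time) against log(n) between consecutive points. If doubling the samples doubles the time, the exponent is 1 (linear); with exponential growth the exponent keeps rising as n increases instead of settling at a constant.

```python
import math
import time


def measure(run_query, sample_counts, repeats=3):
    """Best-of-`repeats` wall-clock seconds for run_query(n) at each sample count n."""
    timings = []
    for n in sample_counts:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(n)
            best = min(best, time.perf_counter() - start)
        timings.append(best)
    return timings


def local_exponents(sample_counts, timings):
    """Slope of log(time) vs log(n) between consecutive measurements.

    Values staying near 1 suggest linear growth, values staying near 2 suggest
    quadratic growth, and values that keep climbing as n grows suggest
    faster-than-polynomial (e.g. exponential) growth.
    """
    pairs = list(zip(sample_counts, timings))
    return [
        math.log(t2 / t1) / math.log(n2 / n1)
        for (n1, t1), (n2, t2) in zip(pairs, pairs[1:])
    ]


if __name__ == "__main__":
    counts = [1000, 2000, 4000, 8000]
    # Stand-in workload; replace with the hypothetical run_query(n) described above.
    times = measure(lambda n: sorted(range(n, 0, -1)), counts)
    print([round(e, 2) for e in local_exponents(counts, times)])
```

Taking the best of several runs reduces timing noise, but with a real database it may understate cold-cache cost, so it could be worth recording the first run separately.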