JeffersonLab / rcdb

Run Condition DataBase
3 stars 6 forks source link

database search with multiple criteria is very slow #111

Open nsjarvis opened 1 week ago

nsjarvis commented 1 week ago

I'm using the python interface. I find that when I add a 4th condition to the selection string, the query becomes very much slower, so much so that if I run it over the whole run range, it looks dead.

Would it be possible to streamline the queries under the hood, or have it run each part of the query over the results of the previous part, or is there a way for me to do that easily myself in python?

Please try my example, it is really surprising how much slower it becomes when I use the full query string, and here I am only searching over a small number of runs.

import rcdb db = rcdb.RCDBProvider("mysql://rcdb@hallddb/rcdb2") selection = "@is_production and not @is_empty_target and cdc_gas_pressure >=99.8" runs = db.select_runs(selection,30274, 30300) selection = "@is_production and not @is_empty_target and cdc_gas_pressure >=99.8 and cdc_gas_pressure<=100.2" runs = db.select_runs(selection,30274, 30300)

nsjarvis commented 1 week ago

I tried the same thing in the gui and it gives a proxy error.

https://halldweb.jlab.org/rcdb/runs/search?runFrom=30274&runTo=31057&q=%40is_production+and+%40status_approved+and+cdc_gas_pressure+%3E+99.9+and+cdc_gas_pressure+%3C+100.1+and+not+%40is_empty_target