Querying timely beliefs data from the database takes way longer than it should.
Digging in, we found that a subquery for selecting only the latest belief was missing some event_start filtering. This meant that the subquery was applied to all data, probably multiple times over, when it suffices to only check the same data subset that the main query is interested in.
We also found a problem Postgres had with data types. We gave it a datetime column plus a Python timedelta on one side of an inequality. It seems this led to problems in executing the query efficiently. Postgres or SQLAlchemy was basically struggling to execute it at all (not sure which of the two). When we re-ordered the inequalities so that the Python datetime/timedelta calculations are resolved before SQLAlchemy interprets the query, we saw speed improve again.
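To illustrate the re-ordering, here is a minimal sketch and not the actual timely-beliefs code. It assumes SQLite as a stand-in for Postgres, integer epoch seconds as a stand-in for timestamps, and made-up variable names (`offset`, `window_start`, `window_end`) for what the query below calls `:event_start_1`, `:param_1` and `:event_start_2`. It only shows that both forms of the inequality select the same rows; it does not reproduce the Postgres slowdown.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def to_epoch(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer epoch seconds."""
    return int(dt.timestamp())


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timed_belief (sensor_id INTEGER, event_start INTEGER, event_value REAL)"
)
t0 = datetime(2021, 1, 1, tzinfo=timezone.utc)
# 48 hourly events for one sensor
rows = [(1, to_epoch(t0 + timedelta(hours=h)), float(h)) for h in range(48)]
conn.executemany("INSERT INTO timed_belief VALUES (?, ?, ?)", rows)

offset = timedelta(hours=1)              # hypothetical value for :event_start_1
window_start = t0 + timedelta(hours=10)  # hypothetical value for :param_1
window_end = t0 + timedelta(hours=20)    # hypothetical value for :event_start_2

# Before: the offset is added to the column inside the query.
before = conn.execute(
    "SELECT event_start FROM timed_belief "
    "WHERE sensor_id = ? AND event_start + ? > ? AND event_start < ? "
    "ORDER BY event_start",
    (1, int(offset.total_seconds()), to_epoch(window_start), to_epoch(window_end)),
).fetchall()

# After: the datetime/timedelta arithmetic is resolved in Python first,
# so the query compares the bare column against plain values.
lower_bound = window_start - offset
after = conn.execute(
    "SELECT event_start FROM timed_belief "
    "WHERE sensor_id = ? AND event_start > ? AND event_start < ? "
    "ORDER BY event_start",
    (1, to_epoch(lower_bound), to_epoch(window_end)),
).fetchall()

# Both forms select the same rows.
assert before == after
```

The rewritten form leaves the database with a plain comparison on `event_start`, which is the shape of the condition we ended up with after re-ordering.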
Besides improving the where criteria for the subquery, we also looked into adding another index, see #167.
The event start filtering can improve the query time by a factor of around 4 (if the sensor has sufficient data, that's the time-scaling factor). Solving the data type issue led to another huge improvement.
This is the query as it was executed before this PR:
SELECT timed_belief.event_start, timed_belief.belief_horizon, timed_belief.source_id, timed_belief.cumulative_probability, timed_belief.event_value
FROM timed_belief
JOIN data_source ON data_source.id = timed_belief.source_id
JOIN (SELECT timed_belief.event_start AS event_start, timed_belief.source_id AS source_id,
min(timed_belief.belief_horizon) AS most_recent_belief_horizon
FROM timed_belief
JOIN data_source ON data_source.id = timed_belief.source_id
WHERE timed_belief.sensor_id = :sensor_id_1 GROUP BY timed_belief.event_start, timed_belief.source_id
) AS anon_1
ON timed_belief.event_start = anon_1.event_start AND timed_belief.source_id = anon_1.source_id
AND timed_belief.belief_horizon = anon_1.most_recent_belief_horizon
WHERE timed_belief.sensor_id = :sensor_id_2 AND timed_belief.event_start + :event_start_1 > :param_1
AND timed_belief.event_start < :event_start_2
(the naming of the parameters is confusing: event_start_1 is a horizon, I believe, and param_1 a datetime)
The subquery anon_1 does not apply the event_start time window, so we'll add timed_belief.event_start + :event_start_1 > :param_1 AND timed_belief.event_start < :event_start_2 to its where clause as well.
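As a sanity check on that change, the following sketch compares the query with and without the time window in the subquery. It is an illustration under assumptions, not the code from this PR: it uses SQLite instead of Postgres, integers instead of timestamps and horizons, leaves out the data_source join and the cumulative_probability column, and writes the window as already re-ordered bounds with hypothetical parameter names `:lower` and `:upper`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE timed_belief (
        sensor_id INTEGER, source_id INTEGER, event_start INTEGER,
        belief_horizon INTEGER, event_value REAL)"""
)
# 100 events for one sensor and one source, each with beliefs at 3 horizons
rows = [
    (1, 1, start, horizon, start + horizon / 10)
    for start in range(100)
    for horizon in (0, 6, 24)
]
conn.executemany("INSERT INTO timed_belief VALUES (?, ?, ?, ?, ?)", rows)

QUERY = """
SELECT tb.event_start, tb.belief_horizon, tb.source_id, tb.event_value
FROM timed_belief AS tb
JOIN (
    SELECT event_start, source_id,
           min(belief_horizon) AS most_recent_belief_horizon
    FROM timed_belief
    WHERE sensor_id = :sensor_id {subquery_window}
    GROUP BY event_start, source_id
) AS anon_1
  ON tb.event_start = anon_1.event_start
 AND tb.source_id = anon_1.source_id
 AND tb.belief_horizon = anon_1.most_recent_belief_horizon
WHERE tb.sensor_id = :sensor_id
  AND tb.event_start > :lower AND tb.event_start < :upper
ORDER BY tb.event_start
"""
# The extra criteria that restrict the subquery to the same time window
WINDOW = "AND event_start > :lower AND event_start < :upper"
params = {"sensor_id": 1, "lower": 10, "upper": 20}

unfiltered = conn.execute(QUERY.format(subquery_window=""), params).fetchall()
filtered = conn.execute(QUERY.format(subquery_window=WINDOW), params).fetchall()

# Restricting the subquery does not change which beliefs are returned;
# it only shrinks the set of rows the subquery has to group.
assert unfiltered == filtered
```

The results match because the join on event_start already discards every subquery group outside the main query's window, so grouping those rows in the first place is wasted work.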
Here is the code I used for timing (could be rewritten to only work in timely-beliefs) within flexmeasures shell: