SeitaBV / timely-beliefs

Model data as beliefs (at a certain time) about events (at a certain time).
MIT License
34 stars 6 forks source link

Fast track strategy for searching single recent beliefs #179

Closed nhoening closed 3 months ago

nhoening commented 5 months ago

This PR adds a "ORDER_BY and LIMIT 1" strategy to search_session(), which drastically shortens the execution time for queries which really only want one (most recent) belief, e.g. to check connectivity status or show the latest measured value.

In this PR; we add a search filter most_recent_only (notice the missing "s"). This addition retains backwards compatibility, so packages using timely beliefs should have an easy time starting to use them.

Also, this PR adds an index to make sure the search time is constant, no matter how far in the past the most recent event might be. The index only adds fields used in the ORDER BY which is the recommended practice.

The psql session below demonstrates its effect in such a case (most recent event quite a while in the past - for sensors where the most recent even is really recent, execution time is fast anyway). I repeated each search twice to make sure we see a consistent behavior.

flexmeasures_prod=> \timing on
Timing is on.
flexmeasures_prod=> SELECT
timed_belief.event_start,
timed_belief.belief_horizon,
timed_belief.source_id,
timed_belief.cumulative_probability,
timed_belief.event_value
FROM
timed_belief
JOIN data_source ON data_source.id = timed_belief.source_id
WHERE
timed_belief.sensor_id = 4
AND timed_belief.event_start <= '2024-05-03 16:14:33.262614+09:00'
ORDER BY
timed_belief.event_start DESC
LIMIT
1;
      event_start       | belief_horizon | source_id | cumulative_probability |      event_value      
------------------------+----------------+-----------+------------------------+-----------------------
 2023-07-02 14:50:00+00 | 00:00:00       |         5 |                    0.5 | 0.0011505681818181816
(1 row)

Time: 550.283 ms
flexmeasures_prod=> SELECT
timed_belief.event_start,
timed_belief.belief_horizon,
timed_belief.source_id,
timed_belief.cumulative_probability,
timed_belief.event_value
FROM
timed_belief
JOIN data_source ON data_source.id = timed_belief.source_id
WHERE
timed_belief.sensor_id = 4
AND timed_belief.event_start <= '2024-05-03 16:14:33.262614+09:00'
ORDER BY
timed_belief.event_start DESC
LIMIT
1;
      event_start       | belief_horizon | source_id | cumulative_probability |      event_value      
------------------------+----------------+-----------+------------------------+-----------------------
 2023-07-02 14:50:00+00 | 00:00:00       |         5 |                    0.5 | 0.0011505681818181816
(1 row)

Time: 580.948 ms
flexmeasures_prod=> CREATE INDEX timed_belief_simplewhere_idx ON "timed_belief" (sensor_id, event_start);
CREATE INDEX
Time: 6020.611 ms (00:06.021)
flexmeasures_prod=> SELECT                                                                               
timed_belief.event_start,
timed_belief.belief_horizon,
timed_belief.source_id,
timed_belief.cumulative_probability,
timed_belief.event_value
FROM
timed_belief
JOIN data_source ON data_source.id = timed_belief.source_id
WHERE
timed_belief.sensor_id = 4
AND timed_belief.event_start <= '2024-05-03 16:14:33.262614+09:00'
ORDER BY
timed_belief.event_start DESC
LIMIT
1;
      event_start       | belief_horizon | source_id | cumulative_probability |      event_value      
------------------------+----------------+-----------+------------------------+-----------------------
 2023-07-02 14:50:00+00 | 00:00:00       |         5 |                    0.5 | 0.0011505681818181816
(1 row)

Time: 1.480 ms
flexmeasures_prod=> SELECT
timed_belief.event_start,
timed_belief.belief_horizon,
timed_belief.source_id,
timed_belief.cumulative_probability,
timed_belief.event_value
FROM
timed_belief
JOIN data_source ON data_source.id = timed_belief.source_id
WHERE
timed_belief.sensor_id = 4
AND timed_belief.event_start <= '2024-05-03 16:14:33.262614+09:00'
ORDER BY
timed_belief.event_start DESC
LIMIT
1;
      event_start       | belief_horizon | source_id | cumulative_probability |      event_value      
------------------------+----------------+-----------+------------------------+-----------------------
 2023-07-02 14:50:00+00 | 00:00:00       |         5 |                    0.5 | 0.0011505681818181816
(1 row)

Time: 1.047 ms

We also need to look into the space we are using with indices.

nhoening commented 4 months ago

@Flix6x I just saw that most_recent_only has been in use a while ago and is deprecated. Are you okay with resurrecting it? Or we use a slightly different name, like most_recent_single.

Flix6x commented 4 months ago

@Flix6x I just saw that most_recent_only has been in use a while ago and is deprecated. Are you okay with resurrecting it? Or we use a slightly different name, like most_recent_single.

Is it still mentioned in the code? If so, can you provide a link. I don't see it.

nhoening commented 4 months ago

I actually came across this place in FlexMeasures which still mentions it.

nhoening commented 4 months ago

(I'm working on support of this new feature in FlexMeasures in parallel)

Flix6x commented 4 months ago

In timely-beliefs, it could be that it wasn't even ever used in the first place, having preferred more explicit names.

I'm fine with repurposing the argument in FlexMeasures, and introducing it in timely-beliefs.

nhoening commented 3 months ago

Maybe testing this is best done with the supporting FlexMeasures PR? I can also imagine @victorgarcia98 taking this review up in your absence.