FlexMeasures / flexmeasures

The intelligent & developer-friendly EMS to support real-time energy flexibility apps, rapidly and scalable.
https://flexmeasures.io
Apache License 2.0

Support new single-belief fast track in timely-beliefs #1107

Open nhoening opened 2 weeks ago

nhoening commented 2 weeks ago

Description

Support most_recent_only, including a new index for it. This is most helpful for the status page, where we only want to look up the most recent event. This PR also ends a previous, very old deprecation of the same name (most_recent_only).

See https://github.com/SeitaBV/timely-beliefs/pull/179 for more info.

Look & Feel

If the status page lists a couple of sensors, the speed difference should be noticeable.

In my tests, search time dropped from about 300 milliseconds to below 30 milliseconds (but only after adding the index).

How to test

Here is my test code, run in the FlexMeasures shell. Pick any sensor that has a significant amount of data.

First, I tested with sensor 4, which has 250K rows (not too many) and only one source, so it is convenient to test without other filters (such as source).

$ flexmeasures shell
>>> from flexmeasures.data.models.time_series import Sensor
>>> from datetime import datetime
>>> s4 = Sensor.query.get(4)
>>> # Before this PR: querying the single most recent row took ca. 300 ms
>>> time1 = datetime.now(); s4.search_beliefs(most_recent_events_only=True, most_recent_beliefs_only=True); print(f"Time: {datetime.now()-time1}")
>>> # This PR's new fast track: 20-30 ms with the index (up to 400 ms without it)
>>> time1 = datetime.now(); s4.search_beliefs(most_recent_only=True, most_recent_beliefs_only=False); print(f"Time: {datetime.now()-time1}")
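
The timing one-liners in this description are single runs, so they are sensitive to noise and to database caching. A small helper could make the comparison more repeatable. This is only a sketch: `timed` is a hypothetical name, not part of FlexMeasures or timely-beliefs, and it uses only the standard library.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, repeats: int = 5, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) `repeats` times.

    Returns the result of the last call and the fastest wall-clock
    duration in seconds. `timed` is a hypothetical helper for illustration.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic clock, suited to short intervals
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


# Stand-in call so the sketch runs without a database:
result, seconds = timed(sorted, [3, 1, 2], repeats=3)
print(f"Time: {seconds * 1000:.3f} ms")
```

In the FlexMeasures shell, this could be used as `bdf, seconds = timed(s4.search_beliefs, most_recent_only=True, most_recent_beliefs_only=False)`. Note that taking the fastest of several runs reports warm-cache timings; use `repeats=1` to measure a single run like the one-liners do.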

Next, I tested with a sensor that has millions of records (sensor 5). To begin with, we select the data source which is responsible for most of them (44, Seita scheduler):

>>> from flexmeasures.data.models.time_series import Sensor
>>> from flexmeasures.data.models.data_sources import DataSource
>>> from datetime import datetime
>>> s5 = Sensor.query.get(5)
>>> ds12 = DataSource.query.get(12)
>>> # Here, the two approaches perform the same, as we tell the query which data source to use. Both need ca. 250 ms
>>> time1 = datetime.now(); s5.search_beliefs(most_recent_events_only=True, most_recent_beliefs_only=True, source=ds12); print(f"Time: {datetime.now()-time1}")
>>> time1 = datetime.now(); s5.search_beliefs(most_recent_only=True, most_recent_beliefs_only=False, source=ds12); print(f"Time: {datetime.now()-time1}")
>>> # Now we leave out the data source. The old-style query takes 1.5 seconds and returns 5 rows, one per data source
>>> time1 = datetime.now(); s5.search_beliefs(most_recent_events_only=True, most_recent_beliefs_only=True); print(f"Time: {datetime.now()-time1}")
>>> # The new-style query with the data source takes ca. 265 ms
>>> time1 = datetime.now(); s5.search_beliefs(most_recent_only=True, most_recent_beliefs_only=False, source=ds12); print(f"Time: {datetime.now()-time1}")
>>> # Interesting: when we leave the data source out of the new-style query, we go down to ca. 15 ms(!!), as the index fits exactly. In this example, it also happens to return the same row
>>> time1 = datetime.now(); s5.search_beliefs(most_recent_only=True, most_recent_beliefs_only=False); print(f"Time: {datetime.now()-time1}")

I cannot explain all the differences at the moment, but specifying a source changes the setup. When I add the source to the index, the index no longer applies to the first case (sensor 4). I can't take this any further right now.

nhoening commented 2 weeks ago

We have to wait for the new timely-beliefs release before GitHub Actions will work correctly on this.