SeitaBV / timely-beliefs

Model data as beliefs (at a certain time) about events (at a certain time).
MIT License

add index on timed_beliefs for faster search #167

Closed: nhoening closed this 6 months ago

nhoening commented 6 months ago

This index will speed up search_session(), because that query and its subquery both filter on the same fields.
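As a rough illustration of why such an index helps, here is a minimal sketch using Python's built-in sqlite3 module instead of SQLAlchemy and PostgreSQL, so it is not the code from this PR. The simplified column types, the index name timed_beliefs_search_idx, the indexed columns and the example query are all assumptions made up for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-in for the timed_beliefs table (types are assumptions).
conn.execute(
    """
    CREATE TABLE timed_beliefs (
        event_start TEXT NOT NULL,
        belief_horizon INTEGER NOT NULL,      -- seconds, for simplicity
        cumulative_probability REAL NOT NULL,
        event_value REAL,
        sensor_id INTEGER NOT NULL,
        source_id INTEGER NOT NULL
    )
    """
)

# Hypothetical lookup on the fields a belief search typically filters on.
QUERY = (
    "SELECT event_start, belief_horizon FROM timed_beliefs "
    "WHERE event_start = ? AND sensor_id = ? AND source_id = ?"
)
PARAMS = ("2025-01-02 22:45:00+00", 1, 1)


def query_plan(connection):
    """Return SQLite's query plan for QUERY as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)

# Hypothetical index name; a plain (non-unique) index on the searched fields.
conn.execute(
    "CREATE INDEX timed_beliefs_search_idx "
    "ON timed_beliefs (event_start, belief_horizon, sensor_id, source_id)"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Without the index the planner has to scan the whole table; with it, the plan names the index. PostgreSQL's planner works differently in detail, so treat this only as a picture of the general idea.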

Note that I removed the usage of has_inherited_table(), which blocked the existing UniqueConstraint from being applied. Let me know if there was a strong argument for using it that I did not find. This function tests whether one of the classes the model inherits from has a table associated with it. Was it perhaps intended as a safeguard of some sort? In FlexMeasures, we inherit from db.Model and from tb.TimedBeliefDBMixin, neither of which has a table specified.

As for the unique constraint: it was never applied, because has_inherited_table() returned False. If we apply it, we no longer allow beliefs about the same event (and from the same source, ...) with different probabilities; one failing test confirms this. So I decided we don't need this constraint. The composite primary key covers the same columns plus the probability, so it seems to me we are fine.
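To make the difference concrete, here is a small sketch, again with Python's built-in sqlite3 module rather than the project's SQLAlchemy models. It tries to store two probabilistic beliefs about the same event that differ only in cumulative_probability, once under a four-column unique constraint and once under a composite primary key that includes the probability. The column types, the example values and the helper name can_store_probabilistic_beliefs are assumptions for illustration.

```python
import sqlite3

# Simplified column definitions (types are assumptions).
COLUMNS = """
    event_start TEXT NOT NULL,
    belief_horizon INTEGER NOT NULL,
    cumulative_probability REAL NOT NULL,
    event_value REAL,
    sensor_id INTEGER NOT NULL,
    source_id INTEGER NOT NULL
"""

# Two beliefs about the same event, horizon, sensor and source,
# differing only in cumulative probability (made-up values).
BELIEFS = [
    ("2025-01-02 22:45:00+00", 7200, 0.1587, 90.0, 1, 1),
    ("2025-01-02 22:45:00+00", 7200, 0.5, 100.0, 1, 1),
]


def can_store_probabilistic_beliefs(constraint_sql):
    """Return True if both beliefs can be inserted under the given constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE timed_beliefs ({COLUMNS}, {constraint_sql})")
    try:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO timed_beliefs VALUES (?, ?, ?, ?, ?, ?)", BELIEFS
            )
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


# Unique constraint without the probability: the second belief is rejected.
quad_unique_ok = can_store_probabilistic_beliefs(
    "UNIQUE (event_start, belief_horizon, sensor_id, source_id)"
)

# Composite primary key including the probability: both beliefs fit.
pk_with_probability_ok = can_store_probabilistic_beliefs(
    "PRIMARY KEY (event_start, belief_horizon, cumulative_probability, "
    "sensor_id, source_id)"
)

print(quad_unique_ok, pk_with_probability_ok)
```

Under these assumptions, the four-column constraint is stricter than the data model wants, while the primary key with the probability still prevents exact duplicates.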

nhoening commented 6 months ago

One test is now failing (output below), seemingly because it inserts a belief that violates the unique constraint. I'll take a look later.

FAILED timely_beliefs/tests/test_belief_query.py::test_select_most_recent_probabilistic_beliefs - sqlalchemy.exc.IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely)
(psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "timed_beliefs_quad_unique_and_search_idx"
DETAIL:  Key (event_start, belief_horizon, sensor_id, source_id)=(2025-01-02 22:45:00+00, 02:00:00, 1, 1) already exists.

[SQL: INSERT INTO timed_beliefs (event_start, belief_horizon, cumulative_probability, event_value, sensor_id, source_id) VALUES (%(event_start__0)s, %(belief_horizon__0)s, %(cumulative_probability__0)s, %(event_value__0)s, %(sensor_id__0)s, %(source_id__0) ... 4954 characters truncated ... on__37)s, %(cumulative_probability__37)s, %(event_value__37)s, %(sensor_id__37)s, %(source_id__37)s)]

Also, a note from looking at the test results: we have 9867 warnings. Some are DeprecationWarnings or FutureWarnings from pandas, while others come from our own code, e.g. UserWarning: <BeliefSource Source A> created from 'Source A'.

This one seems useful: /home/runner/work/timely-beliefs/timely-beliefs/timely_beliefs/beliefs/classes.py:1086: PerformanceWarning: Adding/subtracting object-dtype array to DatetimeArray not vectorized.