The timestamps are set in `events.py` when the events are created. There seem to be two sources of precision loss at play here:

- Precision loss from `time.time()`: the documentation of `time.time()` notes that "not all systems provide time with a better precision than 1 second". This is a little wild to me!
- Precision loss due to the timestamp being stored as a float.

Given that the timestamp is assigned when the event object is created, the case `(ev1.id > ev2.id) && (ev1.timestamp < ev2.timestamp)` doesn't seem to be possible here. I also can't think of a reason why sorting by timestamp would be a good idea at all, given that collisions are possible even on systems with high-precision clocks.
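To make both effects concrete, here is a minimal Python sketch. It assumes a 2020-era Unix timestamp stored as a Python `float` (an IEEE 754 double) and uses `math.ulp` (Python 3.9+) to show the gap between adjacent representable values at that magnitude. The `coarse_clock` helper and the `Event` dataclass are hypothetical stand-ins for a 1-second-resolution system clock and a stored event; they are not Rasa code.

```python
import math
from dataclasses import dataclass
from itertools import count

# A 2020-era Unix timestamp stored as a Python float (an IEEE 754 double).
now = 1_600_000_000.0

# Gap between adjacent doubles at this magnitude: about 0.24 microseconds.
# Any finer difference between two timestamps cannot be represented.
resolution = math.ulp(now)
print(resolution)  # 2.384185791015625e-07
print(now + 1e-7 == now)  # True: a 0.1 microsecond step rounds away entirely


def coarse_clock(true_time: float) -> float:
    """Hypothetical clock with 1-second resolution, as time.time() may be on some systems."""
    return float(math.floor(true_time))


@dataclass
class Event:
    """Hypothetical stand-in for a stored event, not Rasa's Event class."""

    id: int
    timestamp: float


ids = count(1)  # stands in for an auto-incremented primary key
# Three events created 0.1 s, 0.4 s and 0.9 s after `now`.
events = [Event(next(ids), coarse_clock(now + dt)) for dt in (0.1, 0.4, 0.9)]

# All three timestamps collide, so ordering by timestamp alone cannot
# tell them apart; a database is free to return them in any order.
print({e.timestamp for e in events})  # {1600000000.0}
# The id column still gives a strict total order.
print([e.id for e in sorted(events, key=lambda e: e.id)])  # [1, 2, 3]
```

The id comes from a monotonically increasing counter, so it never ties, whereas the timestamp inherits both the clock's resolution and the float's.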
The original query is quite complicated:
```sql
SELECT *
FROM events
WHERE sender_id = :sender_id
  AND (
    timestamp >= (
      SELECT max(events.timestamp) AS session_start
      FROM events
      WHERE events.sender_id = :sender_id
        AND events.type_name = :type_name
    )
    OR (
      (
        SELECT max(events.timestamp) AS session_start
        FROM events
        WHERE events.sender_id = :sender_id
          AND events.type_name = :type_name
      ) IS NULL
    )
  )
ORDER BY timestamp, id;
```
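A runnable sketch of the same session-start filter, ordered by `id` alone, against an in-memory SQLite database. This assumes SQLite as a stand-in for the production database and a hypothetical minimal `events` table holding only the columns the query touches; the sample rows are made up. Three of alice's events share the session-start timestamp, which is exactly the tie that `ORDER BY timestamp` cannot break on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema; the real tracker-store table has more columns.
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"
    " sender_id TEXT,"
    " type_name TEXT,"
    " timestamp REAL)"
)
rows = [
    ("alice", "user", 100.0),
    ("alice", "session_started", 200.0),
    ("alice", "user", 200.0),  # same timestamp as the session start
    ("alice", "bot", 200.0),  # same timestamp again
    ("bob", "user", 300.0),  # bob has no session_started event
]
conn.executemany(
    "INSERT INTO events (sender_id, type_name, timestamp) VALUES (?, ?, ?)", rows
)

# Same filter as above: events at or after the latest session start,
# or every event if the sender has never started a session.
QUERY = """
SELECT id, type_name, timestamp
FROM events
WHERE sender_id = :sender_id
  AND (
    timestamp >= (
      SELECT max(timestamp) FROM events
      WHERE sender_id = :sender_id AND type_name = :type_name
    )
    OR (
      SELECT max(timestamp) FROM events
      WHERE sender_id = :sender_id AND type_name = :type_name
    ) IS NULL
  )
ORDER BY id
"""

params = {"sender_id": "alice", "type_name": "session_started"}
result = conn.execute(QUERY, params).fetchall()
for row in result:
    print(row)
```

For alice this returns ids 2, 3 and 4 in insertion order even though all three share one timestamp. For a sender with no `session_started` event (bob here), the `IS NULL` branch returns all of their events.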
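Here are some quick results on the performance of the different orderings, measured with `EXPLAIN` on a plain `select * from events` rather than on the full query. Ordering by `id` is, understandably, the best option for performance, since it can use the `events_pkey` index instead of an explicit sort.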
Order by `timestamp`:
```
zi=# explain select * from events order by timestamp;
                            QUERY PLAN
-------------------------------------------------------------------
 Sort (cost=544.08..553.94 rows=3945 width=490)
   Sort Key: "timestamp"
   -> Seq Scan on events (cost=0.00..308.45 rows=3945 width=490)
(3 rows)
```
Order by `id`:
```
----------------------------------------------------------------------------------
 Index Scan using events_pkey on events (cost=0.29..543.61 rows=10088 width=177)
(1 row)
```
Order by `timestamp` and `id`:
```
zi=# explain select * from events order by timestamp, id;
                            QUERY PLAN
--------------------------------------------------------------------
 Sort (cost=1040.75..1065.97 rows=10088 width=177)
   Sort Key: "timestamp", id
   -> Seq Scan on events (cost=0.00..369.88 rows=10088 width=177)
(3 rows)
```
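The same sort-versus-no-sort difference can be reproduced locally with SQLite's `EXPLAIN QUERY PLAN`. This is a sketch that assumes SQLite as a stand-in for PostgreSQL: plan wording and costs differ, but an `INTEGER PRIMARY KEY` column is the table's rowid, so ordering by it needs no separate sort step, while ordering by an unindexed `timestamp` column does. The schema and the `plan` and `needs_sort` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: `id INTEGER PRIMARY KEY` aliases SQLite's rowid,
# so a plain table scan already yields rows in id order; `timestamp` has no index.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, sender_id TEXT, timestamp REAL)"
)


def plan(order_by: str) -> list[str]:
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN.

    `order_by` is interpolated directly, so only pass trusted literals.
    """
    rows = conn.execute(
        f"EXPLAIN QUERY PLAN SELECT * FROM events ORDER BY {order_by}"
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return [row[-1] for row in rows]


def needs_sort(order_by: str) -> bool:
    # SQLite reports an explicit sort step as "USE TEMP B-TREE FOR ORDER BY";
    # the scan line itself is worded differently across SQLite versions.
    return any("TEMP B-TREE" in detail for detail in plan(order_by))


for order_by in ("timestamp", "id", "timestamp, id"):
    print(f"ORDER BY {order_by}: sort step needed = {needs_sort(order_by)}")
```

On a typical SQLite build this reports a sort step for `timestamp` and for `timestamp, id` but not for `id`, which mirrors the shape of the PostgreSQL plans above.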
Status (please check what you already did):
[x] added some tests for the functionality
[ ] updated the documentation
[x] updated the changelog (please check changelog for instructions)
[x] reformat files using black (please check Readme for instructions)