fanout / django-eventstream

Server-Sent Events for Django
MIT License

Faster event log trimming #94

Open Schlepptop opened 2 years ago

Schlepptop commented 2 years ago

Fixes #93

jkarneges commented 2 years ago

Thanks! I think we still want the loop, since the query is limited to the batch size. And probably the .delete() goes after the slice?

Schlepptop commented 2 years ago

Good catch on the loop, I can add that back. But do we even want to limit it? Deletion really isn't that expensive an operation, so why restrict the batch size? Since querysets are lazy, the query is only evaluated when sliced, so slicing is how we'd apply the limit if we do want one.

jkarneges commented 2 years ago

The batch limit is there so that all the results don't get buffered in memory. I seem to recall that even though querysets are evaluated lazily, iterating a queryset may still cause all the items in the entire set to end up in memory.

If calling delete() on a queryset doesn't first read all the items in the set into memory, and the deletion isn't atomic (i.e. doesn't block access to the table between deletions of individual items), then the batching may be unnecessary.

Schlepptop commented 2 years ago

Ok, then I'll just remove it altogether. .delete() won't load any of the deleted objects into memory. Also, all items are deleted at once (it's a single SQL DELETE ... WHERE statement, after all), so the database is only blocked during that one query.
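To illustrate the point about a single statement, here is a minimal sketch using the standard-library sqlite3 module rather than the Django ORM, so it is self-contained; the table and column names are hypothetical, and the commented ORM call is only an approximation of what Django would emit:

```python
import sqlite3

# A filtered delete compiles to one DELETE ... WHERE statement, so every
# matching row is removed by a single query (during which the table is busy).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, created REAL)")
conn.executemany("INSERT INTO event (created) VALUES (?)",
                 [(t,) for t in range(10)])

cutoff = 6
# Roughly what Event.objects.filter(created__lt=cutoff).delete() would emit:
cur = conn.execute("DELETE FROM event WHERE created < ?", (cutoff,))
deleted = cur.rowcount  # rows removed by the one statement
remaining = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
print(deleted, remaining)  # 6 rows deleted, 4 remain
```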

jkarneges commented 2 years ago

We don't want to block the database/table though. Some searching around suggests that bulk deletes may perform a table lock, for example: https://stackoverflow.com/questions/11230225/how-to-efficiently-delete-rows-while-not-using-truncate-table-in-a-500-000-rows

Sorry to go back and forth on this but probably this should stay batched. That way a large trim only blocks the thread doing the trimming, while the table remains otherwise accessible and events can be added.
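The batched approach described above could be sketched as follows; again this uses plain sqlite3 rather than the Django ORM so the example is runnable on its own, and the table, column, and batch size are hypothetical. The idea is that each DELETE touches at most one batch of rows, so the table is freed up between batches instead of being held for one large bulk delete:

```python
import sqlite3

BATCH = 2  # hypothetical batch size; a real trim would use something larger

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, created REAL)")
conn.executemany("INSERT INTO event (created) VALUES (?)",
                 [(t,) for t in range(7)])

cutoff = 5  # trim events older than this
while True:
    # Select one batch of ids, then delete by primary key. This keeps each
    # DELETE statement small and releases the table between batches, so
    # other writers can insert new events while a large trim is in progress.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM event WHERE created < ? LIMIT ?", (cutoff, BATCH))]
    if not ids:
        break
    conn.execute(
        "DELETE FROM event WHERE id IN (%s)" % ",".join("?" * len(ids)), ids)
    conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
print(remaining)  # the 2 events at or past the cutoff survive
```

Selecting primary keys first and deleting by id is also how this pattern tends to look in Django, since the ORM does not allow calling .delete() directly on a sliced queryset.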