dburriss / event-feed

An HTTP-based event feed for ASP.NET
MIT License

Middleware page caching #2

Open dburriss opened 2 years ago

dburriss commented 2 years ago

Summary

In order for the paging to be as performant as possible, we want to cache page results.

Initial design notes here.

Related concepts:

Motivation and goals

Since the return of events is based on paging, and the list of events is immutable, we can cache any completed page indefinitely since the contents of the page will never change. The messaging mechanism relies on polling for changes, so it is important that requests for pages, especially the latest page (tail), put as little strain on the server and database as possible.
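As a rough illustration of the idea above (not the project's actual implementation, which targets ASP.NET), here is a minimal Python sketch. It assumes zero-based page numbers and a fixed page size; `PageCache`, `load_page`, and `PAGE_SIZE` are hypothetical names introduced only for this example. Full pages are cached indefinitely, while the incomplete tail page is always reloaded.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 3  # hypothetical page size, chosen only for this example


class PageCache:
    """Caches completed pages forever; the incomplete tail page is never stored."""

    def __init__(self, load_page: Callable[[int], List[str]], page_size: int = PAGE_SIZE):
        # load_page is a hypothetical stand-in for the database query.
        self._load_page = load_page
        self._page_size = page_size
        self._complete: Dict[int, List[str]] = {}

    def get(self, page_number: int) -> List[str]:
        cached = self._complete.get(page_number)
        if cached is not None:
            return cached
        events = self._load_page(page_number)
        if len(events) == self._page_size:
            # A full page can never change: events are immutable and append-only.
            self._complete[page_number] = events
        return events
```

In this sketch every request for the tail page still reaches the loader, which is exactly the load the rest of this issue tries to reduce.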

Because caching is so important here, we cannot rely on response caching alone. Not only does response caching allow clients to bypass it by sending a no-cache header, it is also inappropriate for the tail page, since only the server knows when that page will change. Even a few clients polling the last, incomplete page every second could put a significant load on the server and database.

So a solution needs to consider the following goals:

In scope

Out of scope

Unknowns

dburriss commented 2 years ago

@Grepsy you expressed interest in picking up this issue? Take a read and let me know if you are still interested.

Grepsy commented 2 years ago

Yes, I'm still interested ;-)

Grepsy commented 2 years ago

When considering

The messaging mechanism relies on polling for changes, so it is important that requests for pages, especially the latest page (tail), put as little strain on the server and database as possible.

and

Worker for updating cache of tail page

It seems going for a background worker that frequently updates the tail page might not be beneficial for reducing database strain. The worker updating the cache every x seconds will put a constant load on the database. An on-demand cache (populated only when a request arrives) with a very short expiration puts no load on the database in the absence of requests, yet under high request load it tops out at the same maximum load as a background worker.
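To make the on-demand option concrete, here is a minimal Python sketch, offered only as an illustration. `TailCache`, `load_tail`, and the one-second default expiration are assumptions for this example, not part of the project. The loader runs only when a request arrives and the cached entry has expired, so an idle feed causes no database queries.

```python
import time
from typing import Callable, List, Optional, Tuple


class TailCache:
    """On-demand cache for the tail page with a short expiration."""

    def __init__(
        self,
        load_tail: Callable[[], List[str]],  # hypothetical stand-in for the database query
        ttl_seconds: float = 1.0,            # assumed expiration, tune as needed
        clock: Callable[[], float] = time.monotonic,
    ):
        self._load_tail = load_tail
        self._ttl = ttl_seconds
        self._clock = clock
        self._entry: Optional[Tuple[float, List[str]]] = None  # (expires_at, events)

    def get(self) -> List[str]:
        now = self._clock()
        if self._entry is not None and now < self._entry[0]:
            return self._entry[1]  # still fresh: no database access
        events = self._load_tail()
        self._entry = (now + self._ttl, events)
        return events
```

With this shape the database sees roughly one query per expiration window in a single-threaded process, however many clients poll. The sketch is not thread-safe; a real server would likely need a lock or similar so that concurrent requests do not all reload at once.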

If we have a requirement for high-performance, near real-time event publishing, I think it makes more sense to expose a websocket endpoint where events are streamed instantly instead of polling. This is the same mechanism I have often seen used on exchanges publishing real-time price and trade updates. Historical data would of course still be served through the paged API.

dburriss commented 2 years ago

It seems going for a background worker that frequently updates the tail page might not be beneficial for reducing database strain.

You raise a good point. The more I think about it and consider your points, the more I think that fleshing out some pub/sub mechanism for when an event is added will give us some options here.
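One way such a mechanism could look, sketched in Python purely for illustration: the cached tail page is dropped whenever an "event added" notification fires, so no timer or background worker is needed. `EventAddedPublisher` and `InvalidatingTailCache` are hypothetical names, and the sketch assumes a single process; multiple server instances would need a shared notification channel.

```python
from typing import Callable, List, Optional


class EventAddedPublisher:
    """Minimal in-process pub/sub: notifies subscribers when an event is added."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[], None]] = []

    def subscribe(self, handler: Callable[[], None]) -> None:
        self._subscribers.append(handler)

    def publish(self) -> None:
        # Iterate over a copy so handlers may subscribe during notification.
        for handler in list(self._subscribers):
            handler()


class InvalidatingTailCache:
    """Keeps the tail page cached until the publisher signals a new event."""

    def __init__(self, load_tail: Callable[[], List[str]], publisher: EventAddedPublisher):
        self._load_tail = load_tail  # hypothetical stand-in for the database query
        self._cached: Optional[List[str]] = None
        publisher.subscribe(self.invalidate)

    def invalidate(self) -> None:
        self._cached = None

    def get(self) -> List[str]:
        if self._cached is None:
            self._cached = self._load_tail()
        return self._cached
```

Here the code that appends an event would call `publish()` after a successful write; the next poll then reloads the tail page once, and later polls are served from memory until another event arrives.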

Some quick ideas:

So the above makes me think we can make some assumptions:

If we have a requirement for high-performance near real-time event publishing I think it makes more sense to expose a websocket endpoint where events are streamed instantly instead of polling.

High performance is relative, and it isn't the explicit goal. There are plenty of tools that provide it, but they are complex to set up and run effectively. Adding a websocket endpoint may be a good idea, but I would like to layer complexity on as needed. A REST-based feed that provides a pull-based mechanism needs to be at the core.