Improve storage to support exclusive locks

Scukerman commented 6 years ago

As I can see, the redis storage implementation doesn't support any locks. There's no documentation on mechanism to how to do this right. The current Store interface doesn't provide any locks interface even for custom implementations.

So, I'd really like to see this change so one could implement their own locks (until someone makes some library).

For example, there's only one implementation of locks I've found (a simple one): https://github.com/bsm/redis-lock

rskumar commented 5 years ago

did you mean lock before reading/storing data to redis, so session update concurrrently do not corrupt?

Scukerman commented 5 years ago

yeah

alexedwards commented 5 years ago

I haven't been able to find much information on it, but according to this stack overflow answer Redis executes individual commands like SET and DEL atomically, which means that the session data stored in Redis shouldn't be corrupted even if clients are trying to make updates concurrently.

The only place this could be a problem is in the Commit() method, which executes both SET and PEXPIREAT commands. But they're already wrapped in a MULTI transaction which means that the two commands will be executed as a single atomic action without any other connections being able to execute anything between them (see here).

So, at the database level, I think everything is OK as-is and no additional locking is required to prevent data corruption.

Is that what you mean, or am I misunderstanding?

Scukerman commented 5 years ago

Imagine, you have two (replicas) microservices that use sessions to store data of user. Two concurrent requests may read the data from store at the same time and write it back as soon as change has done. Before you start reading the data, you have to lock it to prevent the reading and\or modification by other microservices. Once you read it, you'll be able to change the data and put it back. And only then you unlock the session.

So, every microservice should wait until the data is correct (unlocked) at the time of reading.

It's just like mutex implementation for redis (like maps in golang, remember?).

Workflow: Lock -> Read -> Change -> Write -> Unlock

rskumar commented 5 years ago

Workflow: Lock -> Read -> Change -> Write -> Unlock

This would make it slow. On every request you read session from store like Redis when you access its data, then wait for handler's work (1ms to 1 min or even more :) ), and finally write back any changes to store (Redis). If you read before write, but some other operation changed it meanwhile, what would you do? Do the apps developer has to write code for taking action in conflict case? It will become quite complex and unintuitive for the problem that doesn't happen ideally or we can avoid/ignore it.

That's why it's considered best practice to keep minimal data in session. You authenticate a user, keep its few info - email, shard, timestamp etc and for other data, consider using some DB.

What I follow

Minimize your session updating code, I limit it to login and logout.
Keep frequent updating data in other DB, with transaction and atomicity and make PKey as session ID or maybe use other indexed column to refer to session id.
And don't worry about update conflict, chances are very low a use with same session ID is hitting login twice. Even if he does, he will have two active session but cookie will point to latest one.
And, All microservices should not act as User facing API backend. Every microservice should not handle session, rather a gateway should delegate auth/authz to a specific microservice to load user's identifiers from session and propagate to next in context.

alexedwards commented 5 years ago

Workflow: Lock -> Read -> Change -> Write -> Unlock

The part of this that I think will be hard is guaranteeing the unlock. Off the top of my head:

The default LoadAndSave middleware only writes back to the database if the session data is modified, not with every HTTP request. So for some requests the database record will be locked, read, but there will never be the corresponding write and unlock.
If one of the microservices is sent a KILL signal while holding the database record lock. The database might do a cleanup and release the lock, or it might not, or it might only do so after a period of time. My guess is it will be different for different databases/stores.

An alternative approach to locking might be to implement some form of optimistic concurrency, a bit like an Etag in HTTP, and abort the 2nd update in the event of a conflict. But I think that would probably require breaking changes to scs, or at least a 2nd, alternative Store interface... and both of those things I would prefer to avoid!

Scukerman commented 5 years ago

@rskumar

Nevermind. It was a year ago and the bug (or feature) was present. Now I see that the codebase has changed. So I can't insist on it. Even though, you say that using one service for auth to communicate and store the data is truly the best practice (I believe you meant it) -- it's not. To distribute the load we have to use locks to prevent data corruption and data splitting. That's my opinion. I don't mind closing this issue since I wrote my own library to handle the issue.

But I would warn users that the library doesn't use distributed locks and therefore doesn't support safe concurrent access to data.

alexedwards commented 5 years ago

I'm not sure why I didn't think of this before, but if a distributed lock is desired it should be possible to write some custom middleware which creates a (time limited) lock using the session token from the incoming request, and defer's the unlock to run when the middleware returns.

The custom middleware can either wrap the regular scs.LoadAndSave middleware, or someone could even write their own version of the LoadAndSave middleware which includes the locking logic. See here for an example of a custom LoadAndSave implementation.

I think that that would probably be the best way to solve this, as the pattern is agnostic about the underlying session datastore (MySQL, Redis etc) and there doesn't need to be a write of the session data to the database to trigger the unlock.

alexedwards commented 5 years ago

I've updated the README to mention that SCS allows you to write custom middleware which controls a distributed lock. I think this is the right solution to the problem, and there aren't any changes I can see that are needed to the codebase to support it.

alexedwards / scs

Improve storage to support exclusive locks #45