Closed yuvipanda closed 6 years ago
Brainstorming this a bit more, it looks like the data structure we need to maintain looks like a min heap of sets. Each set represents a shard, and we can always add new users to the top set (since it has the smallest number of users currently). To find where a user is, we would have to do a O(n) operation, checking for set containment across all the objects. However, n is usually small (maxing out at a few hundreds?) and I think the simplicity of this design makes it more worthwhile.
I strongly suspect that I can turn this into a simple RDBMS design that I can then implement on postgresql.
A single table design might be enough, with the following columns: id, name, shard.
To add a new item, we simply do something along the lines of:
INSERT INTO items(name, shard) VALUES(:name, SELECT shard FROM items GROUP BY shard ORDER BY count(shard) LIMIT 1)
We can then read the value of the shard. This also lets us use the transactional capabilities of postgres (or sqlite or anything) to prevent races. I believe we can index this properly so this is quite fast.
Finding out which shard an item belongs to is also a trivial SQL query now.
This is much simpler than my previous comment about datastructures :D
Also note that with this we no longer set 'capacity' for a shard. All shards are counted to be unlimited capacity, but we will always only assign new users to the shard with the smallest number of items - thus equalizing the number of users per shard over time. This also means that if we run out of capacity, users might get assigned to shards that are already 'full' - which might be quite bad if not planned for properly.
In our config, we'll hook into this from the spawner to figure out where to place the user.
Also this can totally be a python library rather than a service! Since all the locking is done at the db level, that'd be totally fine...
wheee, I've a sharder object now!
We will have N NFS servers, each of which can serve a capacity Cn users. We need a high performance, non-racy and simple way of doing the following:
It also needs to perform at least some sort of authentication, split for reads and writes.
All this also needs to be fairly race free, and we should have a plan for backing up the data + reconstructing it from scratch if we lose it.