Closed: aryaniyaps closed this issue 3 years ago.
I have no hard numbers, but I boldly promise: "better than channels_redis".
At scale, you want to maximize:
A. Single-node message throughput -- the number of messages routed through a single Channels client
B. Cluster message throughput -- the total number of messages sent through the entire system
When it comes to single-node throughput, there should be no contest: channels_redis blocks every WebSocket connection for every .receive(); channels_rabbitmq does not. So if a message gets sent to 100 channels on a single server, channels_redis will request those 100 messages one after the other, while channels_rabbitmq will receive all 100 messages near-simultaneously.
I haven't measured how big a difference that makes. I'd expect channels_rabbitmq to be maybe 10x faster, if network requests cost ~10x as much wall time as Channels' innards and your consumer.
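The sequential-vs-concurrent difference can be simulated with plain asyncio. This is a toy model, not either library's actual code; the 10 ms latency and the message count are made-up numbers:

```python
import asyncio
import time

LATENCY = 0.01  # assumption: each broker round-trip costs ~10 ms
N = 20          # messages fanned out to channels on one node

async def fetch_one():
    # stand-in for one network round-trip to the broker
    await asyncio.sleep(LATENCY)

async def sequential():
    # channels_redis-style: request each message one after the other
    for _ in range(N):
        await fetch_one()

async def concurrent():
    # channels_rabbitmq-style: all messages arrive near-simultaneously
    await asyncio.gather(*(fetch_one() for _ in range(N)))

def timed(factory):
    start = time.perf_counter()
    asyncio.run(factory())
    return time.perf_counter() - start

seq = timed(sequential)
con = timed(concurrent)
print(f"sequential: {seq:.3f}s  concurrent: {con:.3f}s")
```

With these numbers, the sequential loop pays the full latency N times while the concurrent version pays it roughly once, which is the shape of the speedup claimed above.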
But that 10x speedup is just single-node performance.
The real scaling problem in channels_redis is actually B -- the whole cluster. group_send() is ... well ... absurd. To group_send() on Redis, the layer will:
By my count, channels_redis group_send() grows O(n^2) with the number of connections if all connections join the same group. This is my theorizing, anyway. I haven't tested.
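For a back-of-envelope illustration of that O(n^2) claim (my own toy model, not a measurement): if each of n connections joins the same group and each triggers one group_send(), the layer must perform one delivery per group member per send, so n sends cost n * n deliveries in total.

```python
def total_deliveries(n_connections: int) -> int:
    """Toy model: every connection joins one group, each fires one group_send()."""
    sends = n_connections                 # one group_send() per connection
    deliveries_per_send = n_connections   # one delivery per group member
    return sends * deliveries_per_send

for n in (10, 100, 1_000):
    print(f"{n:>5} connections -> {total_deliveries(n):>9} deliveries")
```

At 100k connections in one group, that model predicts 10 billion deliveries -- which is why the quadratic growth, if real, matters long before you reach "web scale".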
I haven't tested, because why would anybody need to test? Redis is not a message broker. It's incorrect on a single node; why would anybody want to scale something faulty?
We at Workbench left Redis when we discovered that bug in step 1: if you don't group_expire healthy WebSocket connections, the whole cluster stalls without notification; and if you do group_expire healthy WebSocket connections, then you aren't doing your job.
I get riled up about this. Projects like channels_redis are fundamentally flawed. Yet users and developers all double down and double down and double down, trying to accomplish the impossible. There's a perfectly free, handy, sound system out there in RabbitMQ, and nobody loves me because I berate people for wasting years of their own time and other people's time instead of spending the minutes it would take to test out docker run rabbitmq:3.
Thanks a lot for your information! @adamhooper, I have a question regarding my business model, though.
According to my business model, there are objects called boxes. A lot of people can join a box (100k or more; there's no hardcoded limit). Whenever a user puts a file in a box, I need to send a FILE_CREATE event to everyone else present in the box -- and many more events besides, like when users leave a box or join it.
This is why I worry. Will channels_rabbitmq be able to handle this? If it cannot, can you please suggest some other solutions?
To scale enormously, this layer only creates one RabbitMQ queue per instance. That means one web server gets one RabbitMQ queue, no matter how many websocket connections are open. For each message being sent, the client-side layer determines the RabbitMQ queue name and uses it as the routing key.
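A sketch of what that one-queue-per-instance scheme implies (my own illustration, not the library's code; the "&lt;queue&gt;!&lt;suffix&gt;" convention follows Channels' process-specific channel-name format, and the queue names here are hypothetical):

```python
def routing_key_for(channel_name: str) -> str:
    # Process-specific channel names look like "<queue>!<suffix>";
    # everything before "!" identifies the one queue of the web-server
    # instance that owns the connection.
    return channel_name.split("!", 1)[0]

# Two connections on the same web server share a single RabbitMQ queue...
assert routing_key_for("web-1!a3f9") == routing_key_for("web-1!77bc") == "web-1"
# ...while a connection on another server routes to that server's queue.
assert routing_key_for("web-2!d041") == "web-2"
```

The upshot: queue count scales with server instances, not with WebSocket connections, which is exactly the question raised below.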
Have you tested against RabbitMQ to see how many concurrent connections one queue can handle? As per the quote above, in order to have more queues we would need more web server instances, am I right?
RabbitMQ is designed to handle huge loads -- tens of thousands of messages per second per CPU, easy. channels_rabbitmq is designed to stay out of the way.
I have not benchmarked. In Workbench, with hundreds of concurrent connections, channels_rabbitmq costs essentially zero overhead, so I haven't needed a round of optimizations. I look forward to someone benchmarking and suggesting optimizations based on that evidence.
If you're serious about handling 100k+ concurrent connections, abandon Python now. I cannot imagine a more expensive language for handling 100k concurrent connections.
@adamhooper I've settled on Elixir for handling concurrent connections after learning it. Thanks for your reply!
Probably a good idea.
For posterity: I suggest -- based on intuition, not evidence! -- staying away from Django if you want to serve 500-1,000 active connections per web server. Node, Go and Elixir should make it easier to code efficient software and harder to introduce monstrous bottlenecks.
How much can this project scale? I am curious because the Redis channel layer provided for Django is really slow when it comes to group sending, and I am looking for alternatives.
https://github.com/django/channels_redis/issues/83
Thanks a lot!