Closed wb14123 closed 8 years ago
What happens is:

new commands -> buffered in bufferWrite (state is waiting)
connected (state is writing)
send data, 10s of commands (state is waiting)
buffer more commands here
buffer more commands here
buffer more commands here
...
ack for those 10s of commands arrives (state is writing)
send data, now 100s of commands (state is waiting)
buffer even more commands here
... the buffer is huge now
ack for those 100s of commands arrives (state is writing)
send data, now 1000s of commands
That's a problem when you have lots of commands to send (you need backpressure, ...), but it's very efficient for small bursts.
Rediscala is currently 2 actors (the writer + the decoder); the writer is the main actor, passing the read TCP packets to the decoder actor. I think it could be interesting to split the writer in 2:
one actor buffering the commands (encoding)
one actor managing the akka io actor (tcp write / read)
I hope we'd get more performance by keeping messages from the akka io actor from being mixed in the mailbox with all the Redis commands.
Sorry, but I don't get why it takes so long (10s or 100s) to receive the ack. I thought it would be fast.
It's the akka io ACK (not the TCP ACK): http://doc.akka.io/docs/akka/current/scala/io-tcp.html#Throttling_Reads_and_Writes (it's the time the OS takes to copy the data into the kernel network buffers).
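The ack-driven batching described above can be sketched in plain Scala (no Akka dependency; `WriteBuffer`, `write`, and `ack` are hypothetical names for illustration, not rediscala's actual API): while one write is in flight, everything that arrives meanwhile accumulates, and the ack flushes the whole accumulated batch as the next write.

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch of ack-driven write batching: one write in flight at a time;
// commands arriving meanwhile are flushed as a single batch on ack.
class WriteBuffer(send: Seq[String] => Unit) {
  private val buffer = ArrayBuffer.empty[String]
  private var writing = false // true while waiting for the I/O ack

  def write(cmd: String): Unit =
    if (writing) buffer += cmd              // a write is in flight: buffer
    else { writing = true; send(Seq(cmd)) } // idle: send immediately

  def ack(): Unit =
    if (buffer.nonEmpty) { send(buffer.toList); buffer.clear() } // next batch
    else writing = false                    // nothing pending: back to idle
}
```

The slower the acks, the bigger each batch gets relative to the previous one, which is exactly the 10s / 100s / 1000s growth in the trace above.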
I tried to add some debug messages and found that's not the case: the actor receives the ack so fast that the buffer is barely used.
You can see the added debug messages in this commit. Then I wrote a simple program to send commands in a loop: 10k commands per iteration, wait for it to complete, then start the next iteration. While running the program, almost all I got was "sent without buffer". And the perf I got is like what I get with redis-benchmark with -P 1.
Try with 100k and 250k; you will see the buffering.
If you try to send 1000k, it should explode because the buffering will eat all the memory / garbage collection, as your JVM can create messages faster than you can send them through the localhost network.
Tried 1000k; still most of them are sent without buffer.
You can see the test program here: https://github.com/wb14123/redis-benchmark
Ah, but you use a redispool (of 100 clients); can you try with just 1 redisclient?
1 redisclient, 1,000,000 commands (in 1 batch): still without buffering, and it is much slower since processing the futures takes more time.
Hum, weird. Are you sure the if in the buffer-debug doesn't hide your debug message?
Yes, with 100k I've seen the buffered debug messages, though very few.
println the size of the buffer; you will see it's growing.
Oh, I've written the print message in write in the wrong place. I'll fix it.
I've fixed it, and it shows that most of the writes are buffered. I also printed the avg buffer length: with 200k it's around 1200, and with 1000k it's around 3800. Do you think this buffer size is enough?
Since each command is very small in my case, I think it's about 1000 commands per batch, which is kind of enough. I can close this issue. But I still don't know why I cannot get the perf of redis-benchmark. Maybe because of the time spent waiting for and processing futures?
redis-benchmark is a bit different (look at the options to make it more similar), and there is no future management (Future.sequence with 1 million futures is a killer).
I've changed my test code a little so that it doesn't need to manage so many futures while still generating enough load (you can see the updated code in the repo I just mentioned). The code is like this:
```scala
def get(): Unit = {
  val key = "some_key"
  val result = redisClient.get(key)
  result onSuccess { case _ => get() }
}

def benchmark() = {
  (0 to 20000) foreach { _ => get() }
}
```
With this code I can get about 400k qps on my rMBP (4 cores). But it doesn't scale well: on a c4.8xlarge AWS instance, which has 36 cores, it still gets about 400k qps. And on c4.8xlarge, the buffer size is always 243. I may modify the code to set a minimal buffer size and test again.
So what is the difference between redis-benchmark and this code? I know -t, -P and -c. I set -t to get and -c to 1, which is the same as my test code; however, I cannot control -P with my test code.
It's very strange that it just cannot scale. I can set parallelism-factor = 0.3 and it gets the same performance as parallelism-factor = 1. If I set parallelism-factor = 0.3 and start running two instances of the program, I can get about 1100k qps, almost the full capacity of Redis.
Well, the scale-up is limited by the CPU cores. But you can scale horizontally over multiple cores with the redispool.
A redisclient uses 2 actors, so in a benchmark it's equal to 2 threads (+1 thread for the akka io actor).
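For intuition, the pool's effect can be pictured as round-robin dispatch over several independent clients, each with its own actors and threads (a hypothetical sketch in plain Scala, not rediscala's actual RedisClientPool implementation):

```scala
// Hypothetical round-robin dispatcher: each client has its own actor pair,
// so spreading commands over N clients lets the work use more cores.
class RoundRobin[A](clients: IndexedSeq[A]) {
  private var i = -1
  def next(): A = { i = (i + 1) % clients.size; clients(i) }
}
```

Dispatching each command via `next()` cycles through the clients, so N clients give roughly N independent write/decode pipelines.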
I've used a redispool to test, but it doesn't help much across multiple cores. I've tried setting parallelism-max = 4 on a 36-core machine, and it performs a lot better than without the limit, even though it uses just 4 threads (without the limit I get 400k qps; with it, 700k qps). If I start a second instance of the program with this config on the same machine, I can get more than 1300k qps, which is already the full capacity of Redis.
What I said above is with a minimal buffer size of 10000. If I don't modify it, one instance of the program gets better performance with parallelism-max = 4 (about 800k), but two instances only get 1200k qps. And the CPU stats show more system CPU usage on the Redis core compared to when the buffer size is limited.
Hi,
I've read the code and found that bufferWrite in RedisWorkerIO is only used to buffer writes between sending a write and receiving its ack. Since the ack should be very quick (compared to receiving the data), the buffer should be very small. So I'm thinking: what about adding a config option to set a minimal buffer size? When there are many writes, it should reduce the number of network packets and thus improve performance. If you think this is a good idea, I can implement it and test whether it improves performance.
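A minimal sketch of the proposed option, assuming a byte-size threshold (`MinSizeBuffer` and `minBytes` are made-up names for illustration, not an existing rediscala config): small writes are held until the threshold is reached and then go out as one batch.

```scala
// Sketch: hold small writes until minBytes accumulate, then send one batch.
// In practice flush() would also run on the I/O ack, a timer, or shutdown,
// so commands are never held indefinitely.
class MinSizeBuffer(minBytes: Int, send: String => Unit) {
  private val pending = new StringBuilder
  def write(cmd: String): Unit = {
    pending ++= cmd
    if (pending.length >= minBytes) flush()
  }
  def flush(): Unit =
    if (pending.nonEmpty) { send(pending.toString); pending.clear() }
}
```

Batching like this trades a little latency on the first command for fewer, larger network packets when the write rate is high.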