Lachim / redis

Automatically exported from code.google.com/p/redis

single threaded server #643


GoogleCodeExporter commented 8 years ago
Hi, I am using Redis 2.2 on a Linux server that runs MapReduce jobs. I am trying
to run a program that issues 9 billion requests to Redis. At any given time
there are fewer than 500 Jedis clients connecting to Redis simultaneously.
However, I keep getting "connection timeout" errors in the Jedis client.
Although I can work around it by reconnecting my clients until they succeed,
the program takes a long time to run.

I notice that Redis is a single-threaded server. According to the online
documentation, Redis can serve multiple clients simultaneously. Does the event
loop in Redis process requests from all clients in parallel, or does Redis in
fact have to wait for a client to disconnect before it can accept and process
other clients?

Thanks!

Original issue reported on code.google.com by yen...@gmail.com on 24 Aug 2011 at 1:58

GoogleCodeExporter commented 8 years ago
Redis is using an efficient event loop based on
select/epoll/kevent/... to deal with non blocking
network I/Os.

It can support many concurrent client connections.
However, it does not execute queries in parallel. So at
a given point in time, it only executes at most one query
(except for some specific commands).

In practice, this is not really an issue because most queries
are very fast, with a complexity of O(1) or O(log n).
A single Redis instance on a recent Intel CPU is able to
process about 150000 q/s.

Redis never has to wait for the disconnection of a client
before it can accept and process connections for other
clients.
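The model described above can be sketched with Python's standard `selectors` module. This is a toy analogue of Redis's epoll/kqueue-based loop, not Redis code: one thread multiplexes many client sockets, handling at most one request at any instant, and accepting a new client never requires an existing one to disconnect.

```python
import selectors
import socket

# One thread, one selector: the loop below serves every connected
# client, but executes only one "query" (here, an echo) at a time.
sel = selectors.DefaultSelector()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

# Two concurrent clients; neither has to disconnect for the other
# to be served.
c1 = socket.create_connection(server.getsockname())
c2 = socket.create_connection(server.getsockname())
c1.sendall(b"ping-1")
c2.sendall(b"ping-2")

served = 0
while served < 2:
    for key, _ in sel.select(timeout=5):
        sock = key.fileobj
        if sock is server:
            conn, _addr = sock.accept()      # accepting a new client
            conn.setblocking(False)          # never blocks on others
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(1024)           # requests are executed
            sock.sendall(b"echo:" + data)    # serially, one at a time
            sel.unregister(sock)
            sock.close()
            served += 1

c1_reply = c1.recv(1024)
c2_reply = c2.recv(1024)
```

Both clients get their replies even though a single thread did all the work, which is why Redis can hold hundreds of open connections while still executing commands one after another.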

Are your 500 connection attempts simultaneous?

Regards,
Didier.

Original comment by didier...@gmail.com on 24 Aug 2011 at 3:20

GoogleCodeExporter commented 8 years ago
Didier,

Thank you for your answer! It's clear and simple to understand how Redis runs,
and it matches my understanding. So now I don't understand what caused the
error. I don't think the network environment is an issue for me, since my
machines are in the same data center and connected to each other by a fast
network.

Yep, I have several hundred clients connecting to Redis simultaneously; the
maximum number cannot exceed 500, though it's not a number I can control. I am
using Jedis as my client. Is the timeout error a Jedis issue?

Regards,
Yin

Original comment by yen...@gmail.com on 24 Aug 2011 at 3:46

GoogleCodeExporter commented 8 years ago
Also, for new connections, operating systems typically allow the listener only
a relatively small fixed backlog of pending incoming connections. On Linux, you
can find that number with "cat /proc/sys/net/core/somaxconn" on the command
line; it is 128 on my Linux machine. Given that you have 500 clients opening
and closing connections, likely at a very high rate, I could see one of a few
different scenarios leading to your issue:

1. you fill up the 128 entry queue because Redis is busy performing an 
operation (socket related, data related, etc.)
2. the system has difficulty handling the volume of incoming TCP requests and 
is overloaded
3. the system believes it is under attack from a SYN flood and purposefully 
delays packets from your requesting machine
4. you are moving enough data through Redis that your network is saturated, 
making it impossible to access
5. if you are using a VPS and it doesn't have enough processing power, I/O,
etc. allocated to it, the host system could be slowing it down enough to cause
stutters severe enough to trigger any or all of the four conditions above
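Scenario 1 can be illustrated with a few lines of Python. This is a hypothetical sketch, not a diagnosis of your setup: `listen()` asks the kernel for a queue of completed connections, capped system-wide on Linux by `net.core.somaxconn`; if the server is busy and doesn't call `accept()` fast enough, later connectors sit in that queue or time out.

```python
import socket

# A listening socket whose owner is busy: completed connections pile
# up in the kernel backlog queue until accept() drains them.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(5)   # request a queue of ~5 pending connections

# On Linux the system-wide backlog ceiling is exposed via procfs:
somaxconn = None
try:
    with open("/proc/sys/net/core/somaxconn") as f:
        somaxconn = int(f.read())
except OSError:
    pass   # non-Linux systems don't expose this file
print("somaxconn:", somaxconn)
srv.close()
```

With 500 clients reconnecting rapidly, a backlog of 128 can fill whenever the server stalls for even a moment.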

I would recommend trying to use connection pooling in your mapreduce, and 
trying to run with fewer clients on a smaller dataset to try to find where your 
bottleneck is. Also, if you are just reading in Redis, and your data is small 
enough, you could run some identical copies of Redis on a few different 
machines, and have your clients randomly connect to one once you've determined 
how many one Redis can reasonably handle.

Original comment by josiah.c...@gmail.com on 24 Aug 2011 at 3:56

GoogleCodeExporter commented 8 years ago
Oh, on a related note: if you are making all of your connections to a single
Redis instance, then even assuming a fairly reasonable, moderate rate of 100k
requests/second, you're still looking at a full day to run this computation
(probably 2-4 times that, really, because of the connection churn).

Original comment by josiah.c...@gmail.com on 24 Aug 2011 at 3:59

GoogleCodeExporter commented 8 years ago
Hi,

Thank you for your suggestions. I will try to use a connection pool. Actually,
this error does not occur when I run the program on a small dataset. In my
MapReduce job there is one Jedis instance per map task. When I try to run on
the whole large dataset, the error usually appears after about 1000 map tasks
have finished (with many more map tasks still to run), and it then repeats very
often. It doesn't seem like an issue related to the backlog queue. What do you
think?

Thank you!

Yin

Original comment by yen...@gmail.com on 24 Aug 2011 at 5:12

GoogleCodeExporter commented 8 years ago
Hi Josiah, can you tell me where is the related note?

Thanks!

Original comment by yen...@gmail.com on 24 Aug 2011 at 5:14

GoogleCodeExporter commented 8 years ago
Yin: Sorry, "on a related note" is an idiom for "a topic related to what is
being discussed". So the "related note" is actually what I wrote in my comment
#4.

In situations where a high number of connections is being created and
destroyed, it could also be that Jedis isn't disconnecting, isn't disconnecting
fast enough, or Redis has hit its own connection limit (what is your
configuration set to?), etc.

While the large map operation is happening, how much and what kind of processor 
is being used on the machine hosting Redis?

Original comment by josiah.c...@gmail.com on 24 Aug 2011 at 6:52

GoogleCodeExporter commented 8 years ago
Hi Josiah,

Got it!

I explicitly set the connection limit for Redis to unlimited, and
redis-benchmark shows that Redis can handle more than 600 clients connecting at
the same time under this setting.

The machine hosting the Redis instance has 8 CPUs at 2992 MHz each. Hopefully
this is helpful.

Thank you very much!

Regards,
Yin

Original comment by yen...@gmail.com on 25 Aug 2011 at 6:53

GoogleCodeExporter commented 8 years ago
What does top report for 'us', 'sy', 'id', 'wa', 'hi', and 'si' while the 
timeouts are happening?

I may be mistaken, but I believe that redis-benchmark keeps connections open 
after they are created and reuses them, and connects to the 'localhost', which 
removes some of the network overhead. Your map operations are coming from 
remote machines, correct? How fast is your network? Can you reduce your number 
of concurrent mappers?

Original comment by josiah.c...@gmail.com on 26 Aug 2011 at 6:27

GoogleCodeExporter commented 8 years ago
That the error starts occurring after about 1000 tasks is an indicator to me 
that you are hitting the file descriptor limit.

Please correct me if I'm wrong in the following assumptions:
- You use one connection per map task.
- You don't close the connection when you are done with the map task.

Most Linux distributions have a default per-process file descriptor limit of
1024. The timeout errors you see can be caused by Jedis not being able to open
a new socket, and reporting this as a timeout. You can execute "ulimit -n" to
find out your fd limit, or add a numeric argument to raise it (e.g. "ulimit -n
4096"). Can you check whether the error starts happening only after ~4000 map
tasks once you've raised the limit? If so, we have found the cause of this
problem.
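For reference, the same limits can be inspected and raised from inside a process. A minimal Python sketch using the stdlib `resource` module (the in-process analogue of `ulimit -n`):

```python
import resource

# Each open socket costs one file descriptor; the *soft* limit is the
# one the process actually hits ("ulimit -n" reports the same number).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft fd limit:", soft)

# The soft limit can be raised up to the hard limit without any
# special privileges, equivalent to running `ulimit -n 4096`:
target = 4096
if hard == resource.RLIM_INFINITY or hard >= target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (max(soft, target), hard))
```

If a client library swallows the underlying EMFILE ("too many open files") error, the symptom can surface as a generic connection timeout, which matches the behavior described here.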

Instead of opening a connection for every map task, you will be better off
using a connection pool from which the map tasks borrow and return connection
objects. Besides not being subject to file descriptor limits, it will also be
faster because you don't pay connection setup/teardown cost for every map task.
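The pooling pattern described above can be sketched in a few lines. In Jedis the real thing is JedisPool; the stand-in "connection" below is just an integer so the example runs without a Redis server:

```python
import queue

# Minimal connection-pool sketch: open `size` connections up front,
# then let every task borrow and return one.
class ConnectionPool:
    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):            # pay connection setup once,
            self._pool.put(factory())    # not once per map task

    def acquire(self, timeout=5):
        return self._pool.get(timeout=timeout)   # blocks if exhausted

    def release(self, conn):
        self._pool.put(conn)

# Stand-in factory; a real one would open a client connection.
ids = iter(range(1000))
pool = ConnectionPool(factory=lambda: next(ids), size=3)

# 100 "map tasks" reuse the same 3 connections instead of opening
# (and leaking) 100 sockets:
used = set()
for _ in range(100):
    conn = pool.acquire()
    used.add(conn)        # record which connection served this task
    pool.release(conn)    # always return it, e.g. in a finally block
```

With Jedis specifically, each map task would borrow a connection from the pool and return it in a finally block, so the number of open file descriptors stays bounded by the pool size no matter how many tasks run.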

Cheers,
Pieter

Original comment by pcnoordh...@gmail.com on 30 Aug 2011 at 3:24

GoogleCodeExporter commented 8 years ago

Original comment by anti...@gmail.com on 14 Sep 2011 at 3:36