armon / bloomd

C network daemon for bloom filters
http://armon.github.io/bloomd

Bloomd - Internal Error (too many open files) #15

Closed FGRibreau closed 11 years ago

FGRibreau commented 11 years ago

I just got a pybloomd.BloomdError: Got response: Internal Error (from filters[fname] = client.create_filter(fname); see the code below).

I have 6 python bloomd clients that send a lot of requests to it. The data dir of bloomd currently contains 1023 bloom filters and counting.

My bloomd configuration is:

[bloomd]
tcp_port = 8673
udp_port = 8674
data_dir = /data/bloomd
log_level = INFO
cold_interval = 3600
flush_interval = 300
initial_capacity = 10001
default_probability = 0.0001
workers = 4
use_mmap = 1

Each client does the following:

for each incoming message:
 - filter = getBloomFilter(message.created_at)
 - if message.id not in filter:
 -   filter.add(message.id)
 -   ... do other things ...

with the following python code:

from datetime import datetime

from pybloomd import BloomdClient

client = BloomdClient(['localhost'])
filters = {}
serviceName = "hey"

def getBloomFilterName(created_at):
    now = datetime.utcfromtimestamp(created_at/1000.0)
    now = now.replace(microsecond=0, second=0, minute=0, hour=0)
    return "%s.%s-%s-%s" % (serviceName, now.year, now.month, now.day)

def getBloomFilter(created_at):
    fname = getBloomFilterName(created_at)

    if fname not in filters:
        filters[fname] = client.create_filter(fname)

    return filters[fname]
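For reference, the naming scheme above buckets filters by UTC day, with no zero-padding on month and day (matching names like f1.2013-1-26 in the logs). A standalone sketch of just the naming function, using a made-up millisecond timestamp:

```python
from datetime import datetime

service_name = "hey"  # mirrors serviceName in the snippet above


def get_filter_name(created_at_ms):
    # created_at is a Unix timestamp in milliseconds, as in the client code.
    day = datetime.utcfromtimestamp(created_at_ms / 1000.0)
    return "%s.%s-%s-%s" % (service_name, day.year, day.month, day.day)


# 1365379200000 ms is 2013-04-08 00:00:00 UTC
print(get_filter_name(1365379200000))  # hey.2013-4-8
```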

Multiple clients got the pybloomd.BloomdError: Got response: Internal Error error within a ~50-second interval while calling client.create_filter(fname). Some of them were under heavy load, so I'm sure they were creating a lot of bloom filters at the same time.

It's too bad that this error does not say what the cause was, so I have several questions and some feedback:

[Update] In fact, I've seen these errors in the log:

Apr  8 09:24:33 serv1 bloomd[11870]: Failed to fault in the filter 'f1.2012-8-10'. Err: -1
Apr  8 09:24:45 serv1 bloomd[11870]: Failed to scan files for filter 'f2.2012-11-15'. Too many open files
Apr  8 09:24:59 serv1 bloomd[11870]: Failed to scan files for filter 'f1.2013-1-26'. Too many open files
Apr  8 09:24:59 serv1 bloomd[11870]: Failed to fault in the filter 'f1.2013-1-26'. Err: -1

After a restart:

cat /proc/sys/fs/file-nr
1728    0   6606290
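For reference, the three columns of /proc/sys/fs/file-nr are the number of allocated file handles, the number of allocated-but-unused handles, and the system-wide maximum. A small Linux-only sketch to read them by name:

```shell
#!/bin/sh
# /proc/sys/fs/file-nr: allocated handles, allocated-but-free handles, max.
read allocated free max < /proc/sys/fs/file-nr
echo "allocated=$allocated free=$free max=$max"
```

So the output above means 1728 handles were allocated system-wide against a maximum of 6606290, i.e. the system-wide limit was not the problem.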

Currently the only way to fix this is to restart bloomd, so I think there may be an issue (or room for improvement) in how bloomd handles open files.

Again, thanks for your hard work!

FGRibreau commented 11 years ago

I was able to reproduce the error on my laptop with a lot of messages that create a lot of new filters, so of course it hit the "too many open files" limit almost instantly.

[bloomd]
tcp_port = 8673
udp_port = 8674
data_dir = /data/bloomd
log_level = INFO
cold_interval = 1800
flush_interval = 300
initial_capacity = 10001
default_probability = 0.0001
workers = 4
use_mmap = 1

$ sudo lsof -p `pidof bloomd` | grep "/data/" | wc -l
472
$ ulimit -Hn
unlimited
$ ulimit -Sn
2560
Apr  8 15:57:07 MBP.local bloomd[77074] <Critical>: Failed to create new file: /data/bloomd/bloomd.f1.2012-10-2/data.000.mmap for filter f1.2012-10-2. Err: Too many open files
Apr  8 15:57:07 MBP.local bloomd[77074] <Error>: Failed to create SBF: f1.2012-10-2. Err: -24
Apr  8 15:57:07 MBP.local bloomd[77074] <Error>: Failed to fault in the filter 'f1.2012-10-2'. Err: -24

Do I have any other choice than increasing the ulimit?

armon commented 11 years ago

The underlying issue here is that bloomd maintains open file handles for all the filters that are currently in memory. What is happening is that you are eventually just hitting the limit of file handles, so when you try to create a new filter, bloomd returns an internal error.

When bloomd is restarted, it starts without opening the on-disk files and only loads them when they are read, so it allows more filters to be created for a while.

The simplest solution is to raise the ulimit values to something huge that you are unlikely to ever hit. Otherwise, it would require a very large change to bloomd to actively manage the open file handles and work around the limits.
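Beyond editing limits.conf or calling ulimit in the daemon's start script, a process can also raise its own soft limit up to the hard limit at runtime. A sketch of that check-and-raise step in Python (useful in a wrapper that then execs bloomd, since the limit is inherited; it does not affect an already-running daemon):

```python
import resource

# Current (soft, hard) limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# An unprivileged process may raise its own soft limit up to the hard limit;
# going past the hard limit requires privileges (or a limits.conf change).
if hard != resource.RLIM_INFINITY:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit: %s -> %s" % (soft, new_soft))
```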

I hope that helps!

FGRibreau commented 11 years ago

OK! I came to the same conclusion as you did, thank you for your time!