armon / bloomd

C network daemon for bloom filters
http://armon.github.io/bloomd
Other
1.24k stars 112 forks source link

Connection drops when flush thread starts. #34

Closed sezginriggs closed 8 years ago

sezginriggs commented 8 years ago

Hello,

I'm trying to use Bloomd and bloom-python-driver with lots of sequential checks and also I'm adding keys when they are not existing. This process takes more than flush interval and when flush thread starts, it drops all connections to the Bloomd.

I guess this is a bug, or is it made like this on purpose? Do you have any recommendation?

Thank you very much for your help in advance!

armon commented 8 years ago

Do you have log output from the server or something? The flush thread is isolated from the connection handlers and shouldn't cause any connections to get dropped. I've never seen this happen before, definitely odd.

sezginriggs commented 8 years ago

Hello Armon,

I don't have any specific log output for it, both telnet connection and bloom-python-driver just disconnects when it flushes (and it just writes about flush interval to log). If you have any recommendation to diagnose problem, I can apply them and let you know about results...

armon commented 8 years ago

That is super odd... Can you collect the log output from the bloomd server when this is happening? Do you see it logging that its closing a bunch of connections. I'm honestly not sure how its even possible.

armon commented 8 years ago

@sezginriggs The only thing I can think of is that bloomd is actually crashing. That is the only way I can think that this would happen.

sezginriggs commented 8 years ago

@armon I checked again and you are probably right, it looks like crashing and starting again (as service). Do you have any idea about what can lead it to crash?

here is the log ===== BOF ======== Dec 31 03:29:37 rome systemd[1]: Failed to start Scalable Bloom Filter Daemon. Dec 31 03:29:37 rome systemd[1]: Unit bloomd.service entered failed state. Dec 31 03:29:37 rome systemd[1]: bloomd.service holdoff time over, scheduling restart. Dec 31 03:29:37 rome systemd[1]: Stopping Scalable Bloom Filter Daemon... Dec 31 03:29:37 rome systemd[1]: Starting Scalable Bloom Filter Daemon... Dec 31 03:29:37 rome bloomd[1248]: Without use_mmap, a crash of bloomd can result in data loss. Dec 31 03:29:37 rome bloomd[1248]: Starting bloomd. Dec 31 03:29:37 rome bloomd[1248]: Found 0 existing filters Dec 31 03:29:37 rome bloomd[1248]: Flush thread started. Interval: 300 seconds. Dec 31 03:29:37 rome bloomd[1248]: Cold unmap thread started. Interval: 3600 seconds. Dec 31 03:30:09 rome bloomd[1248]: Accepted client connection: 127.0.0.1 48689 [10] Dec 31 03:30:22 rome bloomd[1248]: Found 0 files for filter testfilter. Dec 31 03:30:22 rome bloomd[1248]: Found 0 files for filter testfilter. Dec 31 03:30:22 rome bloomd[1248]: Creating new file: /yedekdisk/bloomd/bloomd.testfilter/data.000.mmap for filter testfilter. Size: 300046 Dec 31 03:30:22 rome bloomd[1248]: Loaded SBF: testfilter. Num filters: 0. Dec 31 03:30:22 rome bloomd[1248]: Flushed filter 'testfilter'. Total time: 0 msec. Dec 31 03:30:22 rome bloomd[1248]: Applying delta update up to: 0 (vsn: 1) Dec 31 03:30:22 rome bloomd[1248]: Finished delta updates up to: 0 (vsn: 2) Dec 31 03:30:22 rome bloomd[1248]: Applying delta update up to: 2 (vsn: 2) Dec 31 03:30:23 rome bloomd[1248]: Finished delta updates up to: 2 (vsn: 3) Dec 31 03:30:23 rome bloomd[1248]: All updates applied. (vsn: 3) Dec 31 03:31:07 rome systemd[1]: bloomd.service start operation timed out. Terminating. Dec 31 03:31:07 rome bloomd[1248]: Received signal [Terminated]! Exiting... Dec 31 03:31:07 rome bloomd[1248]: Flushed filter 'testfilter'. Total time: 55 msec. Dec 31 03:31:08 rome systemd[1]: Failed to start Scalable Bloom Filter Daemon. Dec 31 03:31:08 rome systemd[1]: Unit bloomd.service entered failed state. Dec 31 03:31:08 rome systemd[1]: bloomd.service holdoff time over, scheduling restart. Dec 31 03:31:08 rome systemd[1]: Stopping Scalable Bloom Filter Daemon... Dec 31 03:31:08 rome systemd[1]: Starting Scalable Bloom Filter Daemon... Dec 31 03:31:08 rome bloomd[1260]: Without use_mmap, a crash of bloomd can result in data loss. Dec 31 03:31:08 rome bloomd[1260]: Starting bloomd. Dec 31 03:31:08 rome bloomd[1260]: Found 1 existing filters Dec 31 03:31:08 rome bloomd[1260]: Flush thread started. Interval: 300 seconds. Dec 31 03:31:08 rome bloomd[1260]: Cold unmap thread started. Interval: 3600 seconds. Dec 31 03:32:38 rome systemd[1]: bloomd.service start operation timed out. Terminating. Dec 31 03:32:38 rome bloomd[1260]: Received signal [Terminated]! Exiting... Dec 31 03:32:38 rome systemd[1]: Failed to start Scalable Bloom Filter Daemon. Dec 31 03:32:38 rome systemd[1]: Unit bloomd.service entered failed state. Dec 31 03:32:39 rome systemd[1]: bloomd.service holdoff time over, scheduling restart. Dec 31 03:32:39 rome systemd[1]: Stopping Scalable Bloom Filter Daemon... Dec 31 03:32:39 rome systemd[1]: Starting Scalable Bloom Filter Daemon... Dec 31 03:32:39 rome bloomd[1270]: Without use_mmap, a crash of bloomd can result in data loss. Dec 31 03:32:39 rome bloomd[1270]: Starting bloomd. Dec 31 03:32:39 rome bloomd[1270]: Found 1 existing filters Dec 31 03:32:39 rome bloomd[1270]: Flush thread started. Interval: 300 seconds. Dec 31 03:32:39 rome bloomd[1270]: Cold unmap thread started. Interval: 3600 seconds.

===== EOF ========

sezginriggs commented 8 years ago

I found that problem was occuring because of wrong configuration of systemd unit file. Corrected it and now everything seems working fine!

Thank you very much again!