csirtgadgets / massive-octo-spice

DEPRECATED - USE v3 (bearded-avenger)
https://github.com/csirtgadgets/bearded-avenger-deploymentkit/wiki
GNU Lesser General Public License v3.0
228 stars 62 forks source link

Problem with cif-worker #322

Closed perez1987 closed 8 years ago

perez1987 commented 8 years ago

I'm check the log CIF-worker and i see that continually get the following error:

[2015-08-12T13:02:32,172Z][ERROR]: zmq_send: Operation cannot be accomplished in current state at /usr/local/share/perl/5.18.2/ZMQ/FFI/ErrorHelper.pm line 49. ZMQ::FFI::ErrorHelper::fatal('ZMQ::FFI::ErrorHelper=HASH(0x346e380)', 'zmq_send') called at /usr/local/share/perl/5.18.2/ZMQ/FFI/ErrorHelper.pm line 28 ZMQ::FFI::ErrorHelper::check_error('ZMQ::FFI::ErrorHelper=HASH(0x346e380)', 'zmq_send', -1) called at (eval 1184) line 17 ZMQ::FFI::ErrorHandler::check_error('ZMQ::FFI::ZMQ3::Socket=HASH(0x6052b28)', 'zmq_send', -1) called at /usr/local/share/perl/5.18.2/ZMQ/FFI/ZMQ3/Socket.pm line 30 ZMQ::FFI::ZMQ3::Socket::send('ZMQ::FFI::ZMQ3::Socket=HASH(0x6052b28)', '{"Data":{"Observables":[{"protocol":-1,"confidence":"39.530",...') called at /opt/cif/bin/../lib/perl5/CIF/Worker.pm line 213 CIF::Worker::send('CIF::Worker=HASH(0x346e2d8)', 'ARRAY(0x6274450)') called at /opt/cif/bin/../lib/perl5/CIF/Worker.pm line 173 CIF::Worker::process('CIF::Worker=HASH(0x346e2d8)', '{\x{a} "id" : "3d65e5e1c4b3933db67efe49ca7a1862be734851ceb6de14...') called at /opt/cif/bin/cif-worker line 269 main::try {...} () called at /usr/share/perl5/Try/Tiny.pm line 81 eval {...} called at /usr/share/perl5/Try/Tiny.pm line 72 Try::Tiny::try('CODE(0x61c2240)', 'Try::Tiny::Catch=REF(0x62ee9b8)') called at /opt/cif/bin/cif-worker line 272 main::ANON('EV::IO=SCALAR(0x3450950)', 1) called at /usr/lib/perl5/AnyEvent/Impl/EV.pm line 88 eval {...} called at /usr/lib/perl5/AnyEvent/Impl/EV.pm line 88 AnyEvent::CondVar::Base::_wait('AnyEvent::CondVar=HASH(0x34504a0)') called at /usr/lib/perl5/AnyEvent.pm line 1995 AnyEvent::CondVar::Base::recv('AnyEvent::CondVar=HASH(0x34504a0)') called at /opt/cif/bin/cif-worker line 282 main::workers(5) called at /opt/cif/bin/cif-worker line 214 main::main() called at /opt/cif/bin/cif-worker line 183 main::ANON('Daemon::Control=HASH(0x5e439f8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 269 Daemon::Control::_launch_program('Daemon::Control=HASH(0x5e439f8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 223 Daemon::Control::_double_fork('Daemon::Control=HASH(0x5e439f8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 422 Daemon::Control::do_start('Daemon::Control=HASH(0x5e439f8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 616 Daemon::Control::run_command('Daemon::Control=HASH(0x5e439f8)', 'start') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 628 Daemon::Control::run('Daemon::Control=HASH(0x5e439f8)') called at /opt/cif/bin/cif-worker line 185 [2015-08-12T13:02:32,172Z][ERROR]: $VAR1 = '{ "id" : "3d65e5e1c4b3933db67efe49ca7a1862be734851ceb6de1405310d7d2dd393c3", "otype" : "fqdn", "tlp" : "green", ...............

Any idea of the source of the problem?

giovino commented 8 years ago

Q1: How Ram, CPU, and disk does your CIF box have?

Q2: How often do you see this error?

Q3: What's the average memory usage? free -m

Q4: Did you add a lot of feeds/data outside of what ships with CIF?

We are somewhat aware of this problem but it rarely surfaces itself on a machine with sufficient ram.

perez1987 commented 8 years ago

@giovino

Q1: 32Gb RAM 8 cores

Q2: About 1000 times a day

Q3: total used free shared buffers cached Mem: 32171 17789 14381 1 129 1540 -/+ buffers/cache: 16119 16051 Swap: 16376 0 16376

Q4: Yes, but the error comes mostly from these provider: hosts-file.net, threatexchange.fb.com, dshield.org, osint.bambenekconsulting.com and alexa.com

When fails the event is not inserted into database?

giovino commented 8 years ago

Thanks for the feedback, we'll queue this up to be thought about for RC2.

wesyoung commented 8 years ago

hrm, i wonder if you have almost too many cif-worker threads stomping on each-other...

wesyoung commented 8 years ago

do me a fav:

https://github.com/csirtgadgets/massive-octo-spice/blob/develop/src/bin/cif-worker#L32

which should be under /opt/cif/bin/cif-worker

and change that to 4, see if the error goes away (make sure cif-worker gets completely killed and restarted).

giovino commented 8 years ago

To test this, I think you have to update the cif-worker init.d script

  1. sudo vim /etc/init.d/cif-worker
    • change threads to 4
  2. sudo service cif-worker restart
giovino commented 8 years ago

I spent a fair amount of time trying to replicate this on a VM with 4 cores and 12GB of ram with no luck. 24 threads->nope, 1 thread->nope.

perez1987 commented 8 years ago

I will change the value of threat to 4 CIF-worker and I write again. Thank you @giovino @wesyoung

wesyoung commented 8 years ago

think we need to test this on something with 8 cores.

giovino commented 8 years ago

Issue https://github.com/csirtgadgets/massive-octo-spice/issues/320 has some error log data too.

wesyoung commented 8 years ago

this should be fixed in the develop branch, gonna test next 24-48 hrs somewhere to be sure...

perez1987 commented 8 years ago

@giovino @wesyoung

So ... what is the reason for the error?

wesyoung commented 8 years ago

there were a few things getting in the way of (what i think was) mis-proper signal handling. the gap here is, if you SIGINT the threads, they were supposed to finish what they were doing, and then shut down. but if they had a large [never ending] stream of things to do, they'd go on forever (and/or partially fail since some would catch the SIGINT but not all).

i changed it to "all respect the SIGINT no matter what", which now means they'll stop appropriately, but if you have a stream of data going through them, you might lose some. i think this is OK since cif-worker should rarely be restarted, but we'll have to see how it goes moving forward.

some of this also has to do with the way perl / zmq handles threading, and hopefully will get more refined as we move forward.

perez1987 commented 8 years ago

@wesyoung @giovino

After cif-worker change in the value of Threats to 4 I see the error zmq_send persists.

Additionally I also note that the memory have 88% of the 32 GB and 8 cores that have.

[2015-08-26T13:57:29,198Z][ERROR]: zmq_send: Operation cannot be accomplished in current state at /usr/local/share/perl/5.18.2/ZMQ/FFI/ErrorHelper.pm line 49. ZMQ::FFI::ErrorHelper::fatal('ZMQ::FFI::ErrorHelper=HASH(0x266c478)', 'zmq_send') called at /usr/local/share/perl/5.18.2/ZMQ/FFI/ErrorHelper.pm line 28 ZMQ::FFI::ErrorHelper::check_error('ZMQ::FFI::ErrorHelper=HASH(0x266c478)', 'zmq_send', -1) called at (eval 1184) line 17 ZMQ::FFI::ErrorHandler::check_error('ZMQ::FFI::ZMQ3::Socket=HASH(0x51f93f8)', 'zmq_send', -1) called at /usr/local/share/perl/5.18.2/ZMQ/FFI/ZMQ3/Socket.pm line 30 ZMQ::FFI::ZMQ3::Socket::send('ZMQ::FFI::ZMQ3::Socket=HASH(0x51f93f8)', '{"Token":"a0bf9964a8ed292e9f79c0ea150a78e9e3cb9854e6b97bc3e42...') called at /opt/cif/bin/../lib/perl5/CIF/Worker.pm line 213 CIF::Worker::send('CIF::Worker=HASH(0x266c3d0)', 'ARRAY(0x547d8a0)') called at /opt/cif/bin/../lib/perl5/CIF/Worker.pm line 173 CIF::Worker::process('CIF::Worker=HASH(0x266c3d0)', '{\x{a} "observable" : "475y02w5y8.racheimals.com",\x{a} "lasttime...') called at /opt/cif/bin/cif-worker line 269 main::try {...} () called at /usr/share/perl5/Try/Tiny.pm line 81 eval {...} called at /usr/share/perl5/Try/Tiny.pm line 72 Try::Tiny::try('CODE(0x54d4e88)', 'Try::Tiny::Catch=REF(0x55834f8)') called at /opt/cif/bin/cif-worker line 272 main::ANON('EV::IO=SCALAR(0x264e840)', 1) called at /usr/lib/perl5/AnyEvent/Impl/EV.pm line 88 eval {...} called at /usr/lib/perl5/AnyEvent/Impl/EV.pm line 88 AnyEvent::CondVar::Base::_wait('AnyEvent::CondVar=HASH(0x264e288)') called at /usr/lib/perl5/AnyEvent.pm line 1995 AnyEvent::CondVar::Base::recv('AnyEvent::CondVar=HASH(0x264e288)') called at /opt/cif/bin/cif-worker line 282 main::workers(3) called at /opt/cif/bin/cif-worker line 214 main::main() called at /opt/cif/bin/cif-worker line 183 main::ANON('Daemon::Control=HASH(0x50425c8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 269 Daemon::Control::_launch_program('Daemon::Control=HASH(0x50425c8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 223 Daemon::Control::_double_fork('Daemon::Control=HASH(0x50425c8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 422 Daemon::Control::do_start('Daemon::Control=HASH(0x50425c8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 616 Daemon::Control::run_command('Daemon::Control=HASH(0x50425c8)', 'start') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 628 Daemon::Control::run('Daemon::Control=HASH(0x50425c8)') called at /opt/cif/bin/cif-worker line 185

wesyoung commented 8 years ago

is this RC2?

https://groups.google.com/forum/#!topic/ci-framework/c9SaNiBnBQI

perez1987 commented 8 years ago

No, I modified only parameter Threats. I will schedule a deployment of RC2 and I'm communicating errors to go seeing .. Right @wesyoung ?

perez1987 commented 8 years ago

@giovino This test has not been included in the RC2, no?

https://github.com/csirtgadgets/massive-octo-spice/commit/67c248b080fa7e380c09557cb45b9e2b24a85553

I installed this version, and those changes are not applied

giovino commented 8 years ago

@perez1987

I believe this has been applied:

https://github.com/csirtgadgets/massive-octo-spice/commit/67c248b080fa7e380c09557cb45b9e2b24a85553

but if you are looking for the changes in init.d/cif-worker

it was pulled in this commit:

https://github.com/csirtgadgets/massive-octo-spice/commit/7ed108cc552e650ce59c7ace1e458fcf71bda8cf

-g