Closed perez1987 closed 8 years ago
Q1: How Ram, CPU, and disk does your CIF box have?
Q2: How often do you see this error?
Q3: What's the average memory usage? free -m
Q4: Did you add a lot of feeds/data outside of what ships with CIF?
We are somewhat aware of this problem but it rarely surfaces itself on a machine with sufficient ram.
@giovino
Q1: 32Gb RAM 8 cores
Q2: About 1000 times a day
Q3: total used free shared buffers cached Mem: 32171 17789 14381 1 129 1540 -/+ buffers/cache: 16119 16051 Swap: 16376 0 16376
Q4: Yes, but the error comes mostly from these provider: hosts-file.net, threatexchange.fb.com, dshield.org, osint.bambenekconsulting.com and alexa.com
When fails the event is not inserted into database?
Thanks for the feedback, we'll queue this up to be thought about for RC2.
hrm, i wonder if you have almost too many cif-worker threads stomping on each-other...
do me a fav:
https://github.com/csirtgadgets/massive-octo-spice/blob/develop/src/bin/cif-worker#L32
which should be under /opt/cif/bin/cif-worker
and change that to 4, see if the error goes away (make sure cif-worker gets completely killed and restarted).
To test this, I think you have to update the cif-worker init.d script
I spent a fair amount of time trying to replicate this on a VM with 4 cores and 12GB of ram with no luck. 24 threads->nope, 1 thread->nope.
I will change the value of threat to 4 CIF-worker and I write again. Thank you @giovino @wesyoung
think we need to test this on something with 8 cores.
Issue https://github.com/csirtgadgets/massive-octo-spice/issues/320 has some error log data too.
this should be fixed in the develop branch, gonna test next 24-48 hrs somewhere to be sure...
@giovino @wesyoung
So ... what is the reason for the error?
there were a few things getting in the way of (what i think was) mis-proper signal handling. the gap here is, if you SIGINT the threads, they were supposed to finish what they were doing, and then shut down. but if they had a large [never ending] stream of things to do, they'd go on forever (and/or partially fail since some would catch the SIGINT but not all).
i changed it to "all respect the SIGINT no matter what", which now means they'll stop appropriately, but if you have a stream of data going through them, you might lose some. i think this is OK since cif-worker should rarely be restarted, but we'll have to see how it goes moving forward.
some of this also has to do with the way perl / zmq handles threading, and hopefully will get more refined as we move forward.
@wesyoung @giovino
After cif-worker change in the value of Threats to 4 I see the error zmq_send persists.
Additionally I also note that the memory have 88% of the 32 GB and 8 cores that have.
[2015-08-26T13:57:29,198Z][ERROR]: zmq_send: Operation cannot be accomplished in current state at /usr/local/share/perl/5.18.2/ZMQ/FFI/ErrorHelper.pm line 49. ZMQ::FFI::ErrorHelper::fatal('ZMQ::FFI::ErrorHelper=HASH(0x266c478)', 'zmq_send') called at /usr/local/share/perl/5.18.2/ZMQ/FFI/ErrorHelper.pm line 28 ZMQ::FFI::ErrorHelper::check_error('ZMQ::FFI::ErrorHelper=HASH(0x266c478)', 'zmq_send', -1) called at (eval 1184) line 17 ZMQ::FFI::ErrorHandler::check_error('ZMQ::FFI::ZMQ3::Socket=HASH(0x51f93f8)', 'zmq_send', -1) called at /usr/local/share/perl/5.18.2/ZMQ/FFI/ZMQ3/Socket.pm line 30 ZMQ::FFI::ZMQ3::Socket::send('ZMQ::FFI::ZMQ3::Socket=HASH(0x51f93f8)', '{"Token":"a0bf9964a8ed292e9f79c0ea150a78e9e3cb9854e6b97bc3e42...') called at /opt/cif/bin/../lib/perl5/CIF/Worker.pm line 213 CIF::Worker::send('CIF::Worker=HASH(0x266c3d0)', 'ARRAY(0x547d8a0)') called at /opt/cif/bin/../lib/perl5/CIF/Worker.pm line 173 CIF::Worker::process('CIF::Worker=HASH(0x266c3d0)', '{\x{a} "observable" : "475y02w5y8.racheimals.com",\x{a} "lasttime...') called at /opt/cif/bin/cif-worker line 269 main::try {...} () called at /usr/share/perl5/Try/Tiny.pm line 81 eval {...} called at /usr/share/perl5/Try/Tiny.pm line 72 Try::Tiny::try('CODE(0x54d4e88)', 'Try::Tiny::Catch=REF(0x55834f8)') called at /opt/cif/bin/cif-worker line 272 main::ANON('EV::IO=SCALAR(0x264e840)', 1) called at /usr/lib/perl5/AnyEvent/Impl/EV.pm line 88 eval {...} called at /usr/lib/perl5/AnyEvent/Impl/EV.pm line 88 AnyEvent::CondVar::Base::_wait('AnyEvent::CondVar=HASH(0x264e288)') called at /usr/lib/perl5/AnyEvent.pm line 1995 AnyEvent::CondVar::Base::recv('AnyEvent::CondVar=HASH(0x264e288)') called at /opt/cif/bin/cif-worker line 282 main::workers(3) called at /opt/cif/bin/cif-worker line 214 main::main() called at /opt/cif/bin/cif-worker line 183 main::ANON('Daemon::Control=HASH(0x50425c8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 269 Daemon::Control::_launch_program('Daemon::Control=HASH(0x50425c8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 223 Daemon::Control::_double_fork('Daemon::Control=HASH(0x50425c8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 422 Daemon::Control::do_start('Daemon::Control=HASH(0x50425c8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 616 Daemon::Control::run_command('Daemon::Control=HASH(0x50425c8)', 'start') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 628 Daemon::Control::run('Daemon::Control=HASH(0x50425c8)') called at /opt/cif/bin/cif-worker line 185
No, I modified only parameter Threats. I will schedule a deployment of RC2 and I'm communicating errors to go seeing .. Right @wesyoung ?
@giovino This test has not been included in the RC2, no?
https://github.com/csirtgadgets/massive-octo-spice/commit/67c248b080fa7e380c09557cb45b9e2b24a85553
I installed this version, and those changes are not applied
@perez1987
I believe this has been applied:
https://github.com/csirtgadgets/massive-octo-spice/commit/67c248b080fa7e380c09557cb45b9e2b24a85553
but if you are looking for the changes in init.d/cif-worker
it was pulled in this commit:
https://github.com/csirtgadgets/massive-octo-spice/commit/7ed108cc552e650ce59c7ace1e458fcf71bda8cf
-g
I'm check the log CIF-worker and i see that continually get the following error:
[2015-08-12T13:02:32,172Z][ERROR]: zmq_send: Operation cannot be accomplished in current state at /usr/local/share/perl/5.18.2/ZMQ/FFI/ErrorHelper.pm line 49. ZMQ::FFI::ErrorHelper::fatal('ZMQ::FFI::ErrorHelper=HASH(0x346e380)', 'zmq_send') called at /usr/local/share/perl/5.18.2/ZMQ/FFI/ErrorHelper.pm line 28 ZMQ::FFI::ErrorHelper::check_error('ZMQ::FFI::ErrorHelper=HASH(0x346e380)', 'zmq_send', -1) called at (eval 1184) line 17 ZMQ::FFI::ErrorHandler::check_error('ZMQ::FFI::ZMQ3::Socket=HASH(0x6052b28)', 'zmq_send', -1) called at /usr/local/share/perl/5.18.2/ZMQ/FFI/ZMQ3/Socket.pm line 30 ZMQ::FFI::ZMQ3::Socket::send('ZMQ::FFI::ZMQ3::Socket=HASH(0x6052b28)', '{"Data":{"Observables":[{"protocol":-1,"confidence":"39.530",...') called at /opt/cif/bin/../lib/perl5/CIF/Worker.pm line 213 CIF::Worker::send('CIF::Worker=HASH(0x346e2d8)', 'ARRAY(0x6274450)') called at /opt/cif/bin/../lib/perl5/CIF/Worker.pm line 173 CIF::Worker::process('CIF::Worker=HASH(0x346e2d8)', '{\x{a} "id" : "3d65e5e1c4b3933db67efe49ca7a1862be734851ceb6de14...') called at /opt/cif/bin/cif-worker line 269 main::try {...} () called at /usr/share/perl5/Try/Tiny.pm line 81 eval {...} called at /usr/share/perl5/Try/Tiny.pm line 72 Try::Tiny::try('CODE(0x61c2240)', 'Try::Tiny::Catch=REF(0x62ee9b8)') called at /opt/cif/bin/cif-worker line 272 main::ANON('EV::IO=SCALAR(0x3450950)', 1) called at /usr/lib/perl5/AnyEvent/Impl/EV.pm line 88 eval {...} called at /usr/lib/perl5/AnyEvent/Impl/EV.pm line 88 AnyEvent::CondVar::Base::_wait('AnyEvent::CondVar=HASH(0x34504a0)') called at /usr/lib/perl5/AnyEvent.pm line 1995 AnyEvent::CondVar::Base::recv('AnyEvent::CondVar=HASH(0x34504a0)') called at /opt/cif/bin/cif-worker line 282 main::workers(5) called at /opt/cif/bin/cif-worker line 214 main::main() called at /opt/cif/bin/cif-worker line 183 main::ANON('Daemon::Control=HASH(0x5e439f8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 269 Daemon::Control::_launch_program('Daemon::Control=HASH(0x5e439f8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 223 Daemon::Control::_double_fork('Daemon::Control=HASH(0x5e439f8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 422 Daemon::Control::do_start('Daemon::Control=HASH(0x5e439f8)') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 616 Daemon::Control::run_command('Daemon::Control=HASH(0x5e439f8)', 'start') called at /usr/local/share/perl/5.18.2/Daemon/Control.pm line 628 Daemon::Control::run('Daemon::Control=HASH(0x5e439f8)') called at /opt/cif/bin/cif-worker line 185 [2015-08-12T13:02:32,172Z][ERROR]: $VAR1 = '{ "id" : "3d65e5e1c4b3933db67efe49ca7a1862be734851ceb6de1405310d7d2dd393c3", "otype" : "fqdn", "tlp" : "green", ...............
Any idea of the source of the problem?