SuperCowPowers / workbench

Workbench: A scalable python framework for security research and development teams.
http://workbench.rtfd.org
MIT License

LostRemote: Lost remote after 10s heartbeat and various other challenges! #21

Closed Analect closed 10 years ago

Analect commented 10 years ago

@brifordwylie: I stumbled on your project while looking for implementations of zerorpc and am very impressed with what you have assembled, even though I'm still scratching the surface in terms of understanding the machinery. I'm hoping you might be able to point me in the right direction with a few stumbling blocks I'm having in getting workbench fully functioning. I initially tried to set all this up in a Docker container, but Ubuntu wasn't cooperating with the install of bro-IDS (I was following the instructions at http://hackertarget.com/bro-ids-ubuntu/): running './configure --prefix=/opt/bro2' failed because it couldn't find the right libmagic. It seems 'brew install libmagic' on a Mac handles all this much more gracefully. My installation on a Mac (10.9.2) appears to be fine; I shifted back to libmagic 5.16 as per your readme. Here is the output on starting the server:

(env)Colums-MacBook-Pro:server me$ python -O workbench.py
ZeroRPC tcp://0.0.0.0:4242
WorkBench DataStore connected: mongodb://localhost/workbench
ELS Indexer connected: [{'host': 'localhost', 'port': 9200}]
Neo4j GraphDB connected: http://localhost:7474/db/data
< json_meta: loaded >
< log_meta: loaded >
< meta: loaded >
< meta_deep: loaded >
< pcap_bro: loaded >
< pcap_meta: loaded >
< pe_classifier: loaded >
< pe_deep_sim: loaded >
< pe_features: loaded >
< pe_indicators: loaded >
< pe_peid: loaded >
< strings: loaded >
< unzip: loaded >
< urls: loaded >
< view: loaded >
< view_customer: loaded >
< view_log_meta: loaded >
< view_meta: loaded >
< view_pcap_bro: loaded >
< view_pcap_meta: loaded >
< view_pdf: loaded >
< view_pefile: loaded >
< view_zip: loaded >
ERROR:zerorpc.channel:zerorpc.ChannelMultiplexer, ignoring error on recv: invalid msg format "101": 'int' object is not iterable
ERROR:zerorpc.channel:zerorpc.ChannelMultiplexer, ignoring error on recv: invalid msg format "77": 'int' object is not iterable

I'm not sure what those last two errors are ... but it might have been related to me trying to see if there was anything visible at localhost:4242 from a browser!

I was able to open the IPython notebooks and work through the demo one most of the way, with cells returning responses, until I got to: "# We're going to load in all the files which include PE files, PCAPS, PDFs, and ZIPs and run 'view' on them. Note: This takes a while :)", which is when the various 'LostRemote' errors started happening. Back in the terminal (where the server was started), I see:

Returning cached work results for plugin: meta
New work for plugin: view
Too many open files (bundled/zeromq/src/signaler.cpp:388)
Abort trap: 6

These seem to be related to a problem with zerorpc and/or zeromq. I'm using zerorpc 0.4.4. I'm not sure how to check which version of zeromq I have installed!
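For reference, one way to check the installed zeromq version is through pyzmq (which zerorpc depends on). This is a minimal sketch; the helper name is our own, not part of workbench:

```python
# Report the libzmq and pyzmq versions zerorpc would be using.
# Assumes pyzmq is importable; it is installed as a zerorpc dependency.
import importlib


def zmq_versions():
    """Return (libzmq_version, pyzmq_version), or None if pyzmq is missing."""
    try:
        zmq = importlib.import_module("zmq")
    except ImportError:
        return None
    return zmq.zmq_version(), zmq.pyzmq_version()


print(zmq_versions())
```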

LostRemote Traceback (most recent call last)

<ipython-input> in <module>()
      6 with open(filename,'rb') as f:
      7     md5 = c.store_sample(os.path.basename(filename), f.read(), tag_type(filename))
----> 8 results.append(c.work_request('view', md5))
      9 pprint.pprint(results[:5])

/Users/mccoole/Development/workbench/env/lib/python2.7/site-packages/zerorpc/core.pyc in <lambda>(*args, **kargs)
    258
    259     def __getattr__(self, method):
--> 260         return lambda *args, **kargs: self(method, *args, **kargs)
    261
    262

/Users/mccoole/Development/workbench/env/lib/python2.7/site-packages/zerorpc/core.pyc in __call__(self, method, *args, **kargs)
    243         try:
    244             if kargs.get('async', False) is False:
--> 245                 return self._process_response(request_event, bufchan, timeout)
    246
    247             async_result = gevent.event.AsyncResult()

/Users/mccoole/Development/workbench/env/lib/python2.7/site-packages/zerorpc/core.pyc in _process_response(self, request_event, bufchan, timeout)
    215     def _process_response(self, request_event, bufchan, timeout):
    216         try:
--> 217             reply_event = bufchan.recv(timeout)
    218             pattern = self._select_pattern(reply_event)
    219             return pattern.process_answer(self._context, bufchan, request_event,

/Users/mccoole/Development/workbench/env/lib/python2.7/site-packages/zerorpc/channel.pyc in recv(self, timeout)
    265
    266         try:
--> 267             event = self._input_queue.get(timeout=timeout)
    268         except gevent.queue.Empty:
    269             raise TimeoutExpired(timeout)

/Users/mccoole/Development/workbench/env/lib/python2.7/site-packages/gevent/queue.pyc in get(self, block, timeout)
    198         if self.putters:
    199             self._schedule_unlock()
--> 200         result = waiter.get()
    201         assert result is waiter, 'Invalid switch into Queue.get: %r' % (result, )
    202         return self._get()

/Users/mccoole/Development/workbench/env/lib/python2.7/site-packages/gevent/hub.pyc in get(self)
    566         self.greenlet = getcurrent()
    567         try:
--> 568             return self.hub.switch()
    569         finally:
    570             self.greenlet = None

/Users/mccoole/Development/workbench/env/lib/python2.7/site-packages/gevent/hub.pyc in switch(self)
    329         if switch_out is not None:
    330             switch_out()
--> 331         return greenlet.switch(self)
    332
    333     def switch_out(self):

LostRemote: Lost remote after 10s heartbeat

I ran the tests at workbench/server/workers as per the readme, which appeared to pass:

<<< Note: Most of these tests require a local server running >>>
.......................
Ran 23 tests in 1.529s
OK

However, for the workbench/client tests, the same heartbeat problem also seems to be the root cause of the failing tests:

LostRemote: Lost remote after 10s heartbeat
Ran 15 tests in 109.982s
FAILED (errors=4)

Sorry for rambling on, but I would love to get this running properly. Where do I find the Neo4j indexing client, to test the interface to Neo4j? I had a look through the client folder, but it wasn't obvious to me.

Thanks, Colum
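As an aside on the "Lost remote after 10s heartbeat" message: zerorpc's client defaults to a 5-second heartbeat, and the remote is declared lost after two missed beats. For long-running work_request calls, constructing the client with a longer heartbeat and timeout can help. This is a sketch under the assumption that zerorpc's Client accepts these keyword arguments (the helper name and values are illustrative, not tuned):

```python
def make_patient_client(endpoint, timeout=300, heartbeat=30):
    """Return a zerorpc client with relaxed liveness settings,
    or None if zerorpc is not installed in this environment.
    The timeout/heartbeat values here are illustrative."""
    try:
        import zerorpc
    except ImportError:
        return None
    client = zerorpc.Client(timeout=timeout, heartbeat=heartbeat)
    client.connect(endpoint)  # zmq connect is async; no server needed yet
    return client


c = make_patient_client("tcp://127.0.0.1:4242")
```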
brifordwylie commented 10 years ago

Hi Colum,

I'm not sure, but I think the root of the issue may be a maxfile limit: ... Too many open files (bundled/zeromq/src/signaler.cpp:388) ...

By default many machines have a very conservative max_file setting, and when you fling a lot of files at workbench it opens a bunch of zerorpc connections (the workers spin up connections as well).

In workbench/clients, if you run $ ./runtests, it barfs out something like this...

<<< Note: These tests may help (yes help) you hit the maxfile limit >>>

<<< Finding out now that you have a maxfile issue is good :) >>>

<<< We recommend setting softlimit on maxfiles to like 100k. >>>

<<< See this URL for information on how to increase maxfiles. >>>

<<< http://docs.basho.com/riak/latest/ops/tuning/open-files-limit >>>

Try following the instructions at http://docs.basho.com/riak/latest/ops/tuning/open-files-limit and see if that clears things up.
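To sanity-check the limit from Python itself, the standard library's resource module can read, and within the hard limit raise, the soft cap on open file descriptors. A minimal sketch (the function name is ours; 100000 echoes the recommendation above):

```python
# Inspect, and optionally raise, this process's soft limit on open
# file descriptors. setrlimit can only raise the soft limit up to the
# hard limit without extra privileges.
import resource


def raise_open_file_limit(target=100000):
    """Return (old_soft, new_soft) after trying to raise the soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # never request more than the hard limit
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        return soft, target
    return soft, soft


print(raise_open_file_limit())
```

Note this only changes the current process; the OS-level instructions at the Riak link are still needed to fix the limit system-wide.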

Also, the code is super rough right now, so I'm going to do some cleanup today. I also have a new notebook that uses the Neo4j GraphDB, so I'll send you that as well.

Best regards,

-bri


Analect commented 10 years ago

Thanks bri. I'll read up, tweak these settings, and let you know if they help. Yes please, I'd love to see the notebook that interfaces back to Neo4j. In your work in this space, have you seen anything where a message comes in from a Kafka bus, gets picked up by zeromq, is distributed to a job, and has its execution status passed back to the Kafka bus?

brifordwylie commented 10 years ago

Sorry, I haven't used Kafka, so I'm not sure about any integration with zeromq. I've put up a new notebook that shows how to use workbench with Neo4j: http://nbviewer.ipython.org/github/SuperCowPowers/workbench/blob/master/notebooks/PE_SimGraph.ipynb

brifordwylie commented 10 years ago

Hey Analect, I'm going to close this issue for now, as my assumption is that it's a maxfile issue. Happy to reopen it if you have any additional problems; just let me know.