adhearsion / adhearsion

A Ruby framework for building telephony applications
http://adhearsion.com
MIT License

Load based issue between Adhearsion 2.4 and 2.5? #526

Open runningferret opened 9 years ago

runningferret commented 9 years ago

We're experimenting with upgrading to the latest Adhearsion. Unfortunately, when running a few load tests we get errors that smack of jruby/jruby#999 (we're still on JRuby 1.7.4, though we've seen similar errors on the latest JRuby). This error crops up under moderate to high call volume:

[2014-12-02 15:44:21.538] ERROR Adhearsion::Initializer: <NoMethodError> undefined method `find' for nil:NilClass
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/has-guarded-handlers-1.6.0/lib/has_guarded_handlers.rb:149:in `guarded?'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/has-guarded-handlers-1.6.0/lib/has_guarded_handlers.rb:92:in `trigger_handler'
    org/jruby/RubyKernel.java:1254:in `catch'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/has-guarded-handlers-1.6.0/lib/has_guarded_handlers.rb:91:in `trigger_handler'
    org/jruby/RubyEnumerable.java:556:in `find'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/has-guarded-handlers-1.6.0/lib/has_guarded_handlers.rb:89:in `trigger_handler'
    org/jruby/RubyKernel.java:1254:in `catch'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/has-guarded-handlers-1.6.0/lib/has_guarded_handlers.rb:88:in `trigger_handler'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/punchblock-2.5.3/lib/punchblock/translator/asterisk/call.rb:159:in `process_ami_event'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/punchblock-2.5.3/lib/punchblock/translator/asterisk.rb:209:in `ami_dispatch_to_or_create_call'
    org/jruby/RubyHash.java:1357:in `each_pair'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/punchblock-2.5.3/lib/punchblock/translator/asterisk.rb:207:in `ami_dispatch_to_or_create_call'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/punchblock-2.5.3/lib/punchblock/translator/asterisk.rb:90:in `handle_ami_event'
    org/jruby/RubyKernel.java:1932:in `public_send'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/celluloid-0.15.2/lib/celluloid/calls.rb:25:in `dispatch'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/celluloid-0.15.2/lib/celluloid/calls.rb:122:in `dispatch'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/celluloid-0.15.2/lib/celluloid/actor.rb:322:in `handle_message'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/celluloid-0.15.2/lib/celluloid/actor.rb:416:in `task'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/celluloid-0.15.2/lib/celluloid/tasks.rb:55:in `initialize'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/celluloid-0.15.2/lib/celluloid/tasks.rb:47:in `initialize'
    /srv/phone/apptastic/shared/bundle/jruby/1.9/gems/celluloid-0.15.2/lib/celluloid/tasks/task_fiber.rb:13:in `create'

I am working on the assumption that nobody is in fact passing in nil as an array of guards :) It's possible that the problem lies entirely in has-guarded-handlers, since that gem bumped from 1.5 to 1.6 when we did the upgrade, but we suspect it is the interaction between the handlers and Adhearsion/Punchblock.

We're able to somewhat reliably reproduce this just by running a SIPp test under somewhat heavy load, so I'm confident if we're able to ferret out a cause and fix I can confirm it is cleaned up.

Have you guys seen anything like this before, or have any ideas on where to dig in next? We've combed around in has-guarded-handlers but haven't seen anything that jumps out at us.

runningferret commented 9 years ago

cc @jared-prime @sfgeorge

bklang commented 9 years ago

I've not personally seen this.

If you can reliably reproduce this, it might be worth giving it a shot on CRuby. It won't prove anything conclusively, but it'll help make the case more convincing that the bug is in JRuby and not in has-guarded-handlers, or even Adhearsion.

ggayan commented 9 years ago

We have also seen this issue in our environment under some load, but it is really uncommon. We are using JRuby 1.7.16.

We couldn't find anything either from our research. I would be glad to provide some more info if it is useful.

Jared-Prime commented 9 years ago

[EDIT]

Sorry, that was a link to an issue in a private repository. The issue, logged by @kares, which I think can be shared publicly, is:

Possibly due to a lack of (GIL-free) performance testing ...there are parts in the AHN/PB actor stack that end up being a bottleneck. One such bottleneck is a singleton (there's only one in the system) actor: the instance of Punchblock::Asterisk::Translator (it actually wraps the low-level IO based actor RubyAMIClient).

NOTE: it seems not possible to simply refactor the translator to a (thread-safe) non-actor component. For whatever reason, on the updated as well as the current AHN/PB stack the system simply does not work, without printing any errors. It may be due to the actor IO it talks to; I have not investigated.

More tries to improve the translator (out of the non-invasive options):

- Increase the actor's thread priority, although we'll need to increase the Fiber's thread priority as well?! This might turn out to be a hassle as fiber threads are re-used?!
- Change some calls (most are in Asterisk::Call) to not happen in an actor way but be direct calls (e.g. register_call can be made thread-safe etc.)
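
To make the last suggestion concrete, here is a minimal sketch of what a thread-safe, non-actor `register_call` could look like. It assumes the call bookkeeping is a plain hash guarded by a `Mutex`. The `CallRegistry` class and its methods are hypothetical stand-ins for illustration, not Punchblock's actual implementation.

```ruby
# Hypothetical stand-in for the translator's call bookkeeping.
# This is NOT Punchblock's code; names and structure are assumptions.
class CallRegistry
  def initialize
    @lock  = Mutex.new
    @calls = {}
  end

  # Direct (non-actor) call: the mutex serializes access to the hash,
  # so callers do not have to queue behind the translator's mailbox.
  def register_call(id, call)
    @lock.synchronize { @calls[id] = call }
  end

  def deregister_call(id)
    @lock.synchronize { @calls.delete(id) }
  end

  def call_with_id(id)
    @lock.synchronize { @calls[id] }
  end

  def size
    @lock.synchronize { @calls.size }
  end
end

# Register calls from several threads at once, as a load test might.
registry = CallRegistry.new
threads = 8.times.map do |t|
  Thread.new do
    100.times { |i| registry.register_call("call-#{t}-#{i}", Object.new) }
  end
end
threads.each(&:join)
```

The trade-off is that the lock only protects the registry itself. Any ordering guarantees that the actor's mailbox previously gave to AMI event handling would have to be preserved some other way.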