Open runningferret opened 9 years ago
cc @jared-prime @sfgeorge
I've not personally seen this.
If you can reliably reproduce this, it might be worth giving it a shot on CRuby. It won't prove anything conclusively, but it'll help make the case more convincing that the bug is in JRuby and not in has-guarded-handlers, or even Adhearsion.
We have also seen this issue in our environment under some load, but it is really uncommon. We are using JRuby 1.7.16.
We couldn't find anything either from our research. I would be glad to provide some more info if it is useful.
[EDIT]
Sorry, that was a link to an issue in a private repository. The issue, logged by @kares, which I think can be shared publicly, is:
possibly due to lacking (GIL-free) performance testing ... there are parts in the AHN/PB actor stack that end up being a bottleneck. One such bottleneck is a singleton (there's only one in the system) actor: an instance of Punchblock::Asterisk::Translator (it actually wraps the low-level IO-based actor RubyAMIClient).

NOTE: it seems not possible to simply refactor the translator to a (thread-safe) non-actor component; for whatever reason, on the updated as well as the current AHN/PB stack, the system simply does not work, without printing any errors. It's maybe due to the actor IO it talks to; have not investigated.

More things to try to improve the translator (out of the non-invasive options):
- increase the actor's thread priority, although we'll need to increase the Fiber's thread priority as well ?! Might turn out to be a hassle, as fiber threads are re-used ?!
- change some calls (most are in Asterisk::Call) to not happen in an actor way but be direct calls (e.g. register_call can be made thread-safe etc.)
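For illustration only, here is a rough sketch of what "register_call can be made thread-safe" could look like as a plain (non-actor) object guarded by a Mutex. The `CallRegistry` class and its method names are hypothetical stand-ins, not Punchblock's actual code; the real translator keeps more state and may need a different locking strategy.

```ruby
# Hypothetical stand-in for the translator's call bookkeeping.
# This is NOT Punchblock's implementation, just a sketch of the idea.
class CallRegistry
  def initialize
    @lock  = Mutex.new
    @calls = {}
  end

  # Direct (non-actor) call: safe to invoke from any thread.
  def register_call(id, call)
    @lock.synchronize { @calls[id] = call }
  end

  def deregister_call(id)
    @lock.synchronize { @calls.delete(id) }
  end

  def call_with_id(id)
    @lock.synchronize { @calls[id] }
  end

  def size
    @lock.synchronize { @calls.size }
  end
end

# Hammer the registry from several threads, roughly like a load test would.
registry = CallRegistry.new
threads = 8.times.map do |t|
  Thread.new do
    1000.times { |i| registry.register_call("#{t}-#{i}", Object.new) }
  end
end
threads.each(&:join)
puts registry.size
```

Because every read and write goes through the same lock, callers no longer have to queue behind the single actor mailbox for simple lookups, which is presumably the point of the suggestion above.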
We're experimenting with upgrading to the latest and greatest Adhearsion. Unfortunately, upon running a few load tests we get a few errors that smack of jruby/jruby#999 (we're on JRuby 1.7.4 still, though we've seen similar errors on the latest and greatest JRuby). We're seeing this error crop up under moderate to high call volume:
I am working on the assumption that nobody is in fact passing in nil as an array of guards :) While it's entirely possible that this is entirely in has-guarded-handlers, as that bumped from 1.5 to 1.6 when we did the upgrade, we suspect it is the interaction between the handlers and Adhearsion/Punchblock. We're able to somewhat reliably reproduce this just by running a SIPp test under somewhat heavy load, so I'm confident that if we're able to ferret out a cause and a fix, I can confirm it is cleaned up.
Have you guys seen anything like this before, or have any ideas on where to dig in next? We've combed around in has-guarded-handlers but haven't seen anything that jumps out at us.