We are using Opentsdb 2.4.1 and are hitting CallbackOverflowErrors and StackOverflowErrors in RegionClient.exceptionCaught. We noticed there recent patches available in master branch and are planning to cherry pick.
https://github.com/OpenTSDB/opentsdb/issues/839https://github.com/OpenTSDB/opentsdb/issues/334
But right now we are ignoring these errors in code as follows, to avoid cleaning up of inflight RPCs to the same region server handled by the RegionClient. Could you help commenting if this is a good stop gap solution? Do you see any side effects?
public void exceptionCaught(final ChannelHandlerContext ctx,
final ExceptionEvent event) {
final Throwable e = event.getCause();
final Channel c = event.getChannel();
if (e instanceof RejectedExecutionException) {
LOG.warn("RPC rejected by the executor,"
+ " ignore this if we're shutting down", e);
} else {
LOG.error("Unexpected exception from downstream on " + c, e);
}
updateExceptionCounter(e.getClass());
if(e.toString().contains("StackOverflowError") || e.toString().contains("CallbackOverflowError")){
LOG.error("StackOverflowError, CallbackOverflowError thus returning error without closing channel" + e);
return;
}
if (c.isOpen()) {
Channels.close(c); // Will trigger channelClosed(), which will cleanup()
} else { // else: presumably a connection timeout.
cleanup(c); // => need to cleanup() from here directly.
}
}
We are also planning to ignore ClosedChannelExceptions that we get in this method. From https://github.com/netty/netty/issues/724, learnt that Netty 4.0.0 no longer throws these exceptions in exceptionCaught, so we believe ignoring them would be equivalent, instead of an upgrade to Netty 4.0.0.
Any thoughts on this?
Thanks.
Hello,
We are using Opentsdb 2.4.1 and are hitting CallbackOverflowErrors and StackOverflowErrors in RegionClient.exceptionCaught. We noticed there recent patches available in master branch and are planning to cherry pick. https://github.com/OpenTSDB/opentsdb/issues/839 https://github.com/OpenTSDB/opentsdb/issues/334 But right now we are ignoring these errors in code as follows, to avoid cleaning up of inflight RPCs to the same region server handled by the RegionClient. Could you help commenting if this is a good stop gap solution? Do you see any side effects?
We are also planning to ignore ClosedChannelExceptions that we get in this method. From https://github.com/netty/netty/issues/724, learnt that Netty 4.0.0 no longer throws these exceptions in exceptionCaught, so we believe ignoring them would be equivalent, instead of an upgrade to Netty 4.0.0. Any thoughts on this? Thanks.