It's important for us to be able to detect situations where a network thread spends too long doing non-network things. Today we log some warnings in this area but they're not always useful (e.g. the `OutboundHandler` warnings include the time spent doing other things while the outbound channel is unwritable). Making these warnings more granular is hard, especially if we don't want to hurt the performance of these performance-critical threads.
Rather than pushing more timing and logging work onto these threads, it seems like a better approach would be to build a separate watchdog mechanism that runs occasionally (say, every 15s) and checks that every network thread is either idle or has completed at least one task since the last time the watchdog ran. Built right, I reckon we could make each thread report its progress by simply adjusting a `volatile long` field (maybe reserving one bit as an idle flag), which seems like it should be adequately performant.
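To make the idea concrete, here is a rough sketch of how the progress field could work. All names here (`ThreadWatchdogSketch`, `ActivityTracker`, `startActivity`, `stopActivity`, `check`) are hypothetical and not existing classes; the sketch also assumes each tracker is only ever written by the one network thread that owns it, and that a single watchdog thread calls `check()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch only: none of these names exist in the codebase.
final class ThreadWatchdogSketch {

    /** Progress tracker for one network thread; written only by that thread. */
    static final class ActivityTracker {
        private final String threadName;

        // Lowest bit is the idle flag: 0 = idle, 1 = busy.
        // The value changes on every start/stop, so it also acts as a progress counter.
        private volatile long value;

        // Read and written only by the watchdog thread.
        private long lastObserved;

        ActivityTracker(String threadName) {
            this.threadName = threadName;
        }

        /** Called by the owning thread just before it starts a task (even -> odd). */
        void startActivity() {
            // Not an atomic increment, but safe because there is a single writer.
            value = value + 1;
        }

        /** Called by the owning thread when the task completes (odd -> even). */
        void stopActivity() {
            value = value + 1;
        }

        /** Called by the watchdog: busy now, and no change since the previous check? */
        boolean isStuck() {
            long current = value;
            boolean busy = (current & 1L) == 1L;
            boolean stuck = busy && current == lastObserved;
            lastObserved = current;
            return stuck;
        }
    }

    private final List<ActivityTracker> trackers = new CopyOnWriteArrayList<>();

    /** Registers a tracker for a network thread; the thread keeps the returned reference. */
    ActivityTracker register(String threadName) {
        ActivityTracker tracker = new ActivityTracker(threadName);
        trackers.add(tracker);
        return tracker;
    }

    /** One watchdog pass: returns the names of threads that made no progress. */
    List<String> check() {
        List<String> stuck = new ArrayList<>();
        for (ActivityTracker tracker : trackers) {
            if (tracker.isStuck()) {
                stuck.add(tracker.threadName);
            }
        }
        return stuck;
    }
}
```

The per-task cost on the network thread is two volatile writes and no timestamps. The watchdog pass itself could be driven by something like `ScheduledExecutorService.scheduleWithFixedDelay` at the 15s interval, logging a warning (and perhaps a stack trace of the offending thread) for each name returned by `check()`. Note that a thread is only reported on the second consecutive check in which it is busy with an unchanged value, so a stall is detected somewhere between one and two intervals after it begins.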