Open hx opened 2 years ago
I feel like an exception though is correct, you asked the method to do something and it was impossible for it to register the metric.
I also agree the exception you get back is not helpful at all, perhaps the first change here is to re-wrap the exception and raise something more helpful?
I also agree the exception you get back is not helpful at all, perhaps the first change here is to re-wrap the exception and raise something more helpful?
The real question here is context. The exception occurs in a server handler thread, in a call stack that doesn't allow gem consumers to implement an exception handler. The gem owns the server thread, so the gem should be handling the exception. Which means the error handling mechanism is a function of the desired behaviour.
So what's the desired behaviour? I don't think crashing the server thread would be helpful. We've found these errors only occur in production, so it doesn't fit the typical pattern of letting the unhandled exceptions fail a test suite. It's also a breaking change that I imagine few would appreciate.
I suggest delegating error handling to consumers. It's a little more flexible than letting them set a logger, because the default could maintain the existing behaviour. In the abstract:
class Server
def initialize
@on_error = -> e { STDERR.puts e.message }
end
def on_error(&block)
@on_error = block
end
end
…
server = Server.new(…)
server.on_error { |e| Rails.logger.warn "Prometheus exporter error: #{e.message}" }
Internally, this could be implemented with a simple generic exception class (e.g. PrometheusExporter::Server::Error
) that's rescued in the mount_proc
block and calls @on_error.call(e)
.
Let me know if you'd like to explore this approach, and I'll rewrite this PR. When we're happy with the design, I'll sort out whatever's going on with Ruby 3.
Thanks @SamSaffron !
I see, yes totally, happy with the change as long as you add some interface for the consumers to get at the exceptions. Please go ahead and explore the approach, also with the PR I would love it if you can update the readme to reflect the change!
We've found a lot of log noise comes from
PrometheusExporter::Server::Collector#process_hash
whenever#register_metric_unsafe
returnsnil
. We recognise there are two problems there;This PR suggests a remedy for the latter.
It also raises the broader question of how this scenario could be better handled, since we've somewhat clumsily managed to demonstrate that it can occur in production code. A quick search shows 5 usages of
STDERR.puts
, which would ideally be handleable somehow. Some ideas would be;mount_proc
block and send 500s or similar back. I'm not sure what the client repercussions would be there.I'm happy to discuss options and draft alternate PRs if this one isn't the way we want to go. Someone else in my organisation did our Prometheus implementations, so I'm pretty unfamiliar with its ecosystem and with this gem, and I'm happy to be corrected if I've missed anything.
Thanks everyone!