GoogleCloudPlatform / spring-cloud-gcp

New home for Spring Cloud GCP development starting with version 2.0.
Apache License 2.0

Expose Micrometer metrics for the Logback LoggingAppender #2700

Open mzeijen opened 6 months ago

mzeijen commented 6 months ago

At my company we would like to monitor the successes and failures of writing log entries to GCP Cloud Logging via the Logback LoggingAppender provided by Spring Cloud GCP. As far as I can see, there is currently no easy way to do this. This means that we won't be able to create alerts that notify us when logs are not being delivered to the Cloud Logging service for some reason. Errors will probably be written to the console, of course, but those are hard to monitor.

To make this monitoring possible, I would like a set of Micrometer counters that count the number of successful and failed write attempts to the GCP logging service. That way we can create alerts that fire when the number of failed write attempts rises above some threshold.

We are currently building our own solution to provide these metrics, but I believe they can be useful for others as well. They can be used not only to monitor whether writing log entries succeeds or fails, but also to aid in troubleshooting. That is why I believe this would be a good feature to add to Spring Cloud GCP itself.

Our custom solution uses a LoggingRpc wrapper that overrides the write(WriteLogEntriesRequest request) method and adds a listener to the ApiFuture returned by the delegate's write method. The listener is called when the ApiFuture succeeds, fails, or is canceled, and the result is then counted in the appropriate counter.

We count not only the number of entry batches that are written, but also the number of entries in those batches. The latter gives a more accurate indication of how many actual entries are sent to the logging service and how many of those succeed and fail.

We only count the number of entries in the WriteLogEntriesRequest requests, which in most cases is one entry per request. We also wanted to count the number of batches, but that is not possible because the requests are batched deeper in the gRPC stack, where we can't get to it. However, this is not a big deal: counting the entries in each write request, and whether they were successfully written to the backend, is enough to indicate whether writing log entries is working.
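
For illustration, the wrapper approach described above could be sketched roughly as follows. This is not the actual implementation from this issue, and it deliberately uses only JDK types so that it compiles on its own: `LogWriter` is a hypothetical stand-in for the `write` method of LoggingRpc, `CompletableFuture` stands in for ApiFuture, a plain list of strings stands in for the entries of a WriteLogEntriesRequest, and the `LongAdder` fields in the hypothetical `WriteMetrics` class stand in for Micrometer counters.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the write method of LoggingRpc. The real method
// takes a WriteLogEntriesRequest and returns an ApiFuture; JDK types are
// used here so the sketch has no external dependencies.
interface LogWriter {
    CompletableFuture<Void> write(List<String> entries);
}

// Hypothetical stand-in for three Micrometer counters, one per outcome.
final class WriteMetrics {
    final LongAdder succeeded = new LongAdder();
    final LongAdder failed = new LongAdder();
    final LongAdder cancelled = new LongAdder();
}

// Wrapper that delegates every write and counts the entries by outcome.
final class CountingLogWriter implements LogWriter {
    private final LogWriter delegate;
    private final WriteMetrics metrics;

    CountingLogWriter(LogWriter delegate, WriteMetrics metrics) {
        this.delegate = delegate;
        this.metrics = metrics;
    }

    @Override
    public CompletableFuture<Void> write(List<String> entries) {
        // Count entries rather than requests, as described above.
        int entryCount = entries.size();
        CompletableFuture<Void> future = delegate.write(entries);

        // Listener: runs once the future completes, whatever the outcome.
        future.whenComplete((result, error) -> {
            if (future.isCancelled()) {
                metrics.cancelled.add(entryCount);
            } else if (error != null) {
                metrics.failed.add(entryCount);
            } else {
                metrics.succeeded.add(entryCount);
            }
        });

        // Return the delegate's future unchanged so callers are unaffected.
        return future;
    }
}
```

The wrapper returns the delegate's own future rather than the one produced by `whenComplete`, so the counting stays a side effect and the caller sees exactly the result it would have seen without the wrapper. An alert could then be based on the failed count rising above a threshold.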

meltsufin commented 6 months ago

@blakeli0 FYI