gillg opened this issue 4 years ago
I'm not very familiar with Go, but I know Prometheus. So here is a simple and clear example of a metrics implementation: https://github.com/vmware-tanzu/velero/blob/main/pkg/metrics/metrics.go
Thanks for your proposal @gillg.
I'm reluctant to add another direct dependency, especially since Prometheus might not be everyone's choice.
You could potentially monitor the logs and trigger alerts based on log entries with a non-null error property. All the metrics you listed can be discovered via the logs, although we could think about changing how errors are logged to make the different errors easier to identify.
If you're collecting the logs via CloudWatch (recommended), you can use the CloudWatch filter and pattern syntax to create alerts.
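For example, assuming the relay's logs land in CloudWatch as JSON with an error property (the field name here is just an assumption), a metric filter pattern along these lines would match only the failed sends:

```
{ $.error = * }
```

In the CloudWatch JSON filter syntax, `= *` matches entries where the attribute exists with a non-null value, so such a filter can back both a metric and an alarm.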
Hi!
I agree with you about dependencies and that Prometheus is not everyone's choice. But the Prometheus metrics format is simple and complete, so today most monitoring systems work with it. Moreover, this approach doesn't require a third-party tool to ship logs to CloudWatch or Logstash and analyze them to produce metrics. Parsing logs can be good for triggering events, but events and metrics don't serve the same goal.
In any case, as you said, more identifiable log entries would be useful as a workaround for real metrics.
Last question: is it possible today to send logs to a specific file?
You don't need a third-party tool to ship logs to CloudWatch. If you run this service as a Docker container on Amazon's Elastic Container Service, you can configure the awslogs driver and get all stdout/stderr output into CloudWatch automatically.
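As a sketch (region, log group, and image name are placeholders, not the project's published values), the awslogs driver can be enabled directly on `docker run`:

```
docker run -d \
  --log-driver=awslogs \
  --log-opt awslogs-region=us-east-1 \
  --log-opt awslogs-group=aws-smtp-relay \
  <aws-smtp-relay-image>
```

With that in place, everything the container writes to stdout/stderr shows up as a CloudWatch log stream without any extra shipping agent.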
From CloudWatch, you could trigger an AWS Lambda function to parse and push these logs into other systems, e.g. Elastic Cloud or an Amazon-hosted Elasticsearch instance - see e.g. this sample cloudwatch-logs-to-elastic-cloud function.
As you said for Prometheus, not everybody uses ECS and CloudWatch to store their logs. Moreover, a Lambda really is a third-party tool (even if it's managed code) to ship logs from CloudWatch to Logstash or directly to Elasticsearch (depending on the use case). You also need a log analysis solution, and Elasticsearch can be a very expensive option if you need high availability and some retention. Not to mention durability...
As I said, the Prometheus metrics format is not a standard, but it's not far from one... Datadog, Zabbix, Telegraf, Netdata, Prometheus itself, and many other open-source or vendor solutions officially support it as an exposition format, as an input format, or as long-term backend storage for Prometheus.
So, I understand if you want to avoid non-standard dependencies and keep your tool very light and simple (I love it for this reason), but an optional metrics endpoint and structured error logs with well-known codes would be great! You have the last word :grinning:
The Elastic Cloud solution was merely another example and it's also not really your use case, given that it's for log analysis and not metrics.
That's why I mentioned the CloudWatch filter and pattern syntax first, since it allows you to create alerts and metrics based on CloudWatch logs. They also have a list of examples for Creating Metric Filters for CloudWatch logs.
But yeah, I understand that there are valid use cases where you might not want to use CloudWatch logs, e.g. when running aws-smtp-relay outside of AWS.
I'm totally open to provide some way of plugging in an optional metrics collector, but ideally this can be done without having to compile in another dependency by default.
I know that Go has a plugin system, but it's only available on Linux and also creates shared library files, which would require deploying more than the main binary file.
Maybe the best way would be to provide a build-time option to compile with or without Prometheus support.
What I will gladly do is help define the interface for a metrics collector, but the actual prometheus implementation probably has to live in a fork, as I don't think I'll be able to find the time to test and maintain it myself.
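A minimal sketch of what such an interface could look like (all names here are hypothetical, not the project's actual API): a no-op default keeps the main binary dependency-free, while a Prometheus-backed implementation could be compiled in behind a build tag.

```go
// Hypothetical sketch of a pluggable metrics collector for the relay.
// The no-op default means the core binary needs no extra dependencies.
package main

import "fmt"

// Collector is the hypothetical metrics interface; names are illustrative.
type Collector interface {
	IncSendSuccess(service string)
	IncSendError(service string)
}

// NopCollector is the default when no metrics backend is compiled in.
type NopCollector struct{}

func (NopCollector) IncSendSuccess(string) {}
func (NopCollector) IncSendError(string)   {}

// A Prometheus-backed implementation would live in a separate file guarded
// by a build constraint, e.g.:
//
//	//go:build prometheus
//
// and register prometheus.CounterVec instances satisfying Collector.

func main() {
	var c Collector = NopCollector{}
	c.IncSendSuccess("ses") // compiles and does nothing by default
	fmt.Println("ok")
}
```

The send path would only ever talk to the interface, so swapping the backend is purely a build-time decision.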
Instead of a specific vendor format/dependency, why not output the metrics in the OpenTelemetry format? Most major players in the observability domain already support it.
It could be useful to expose some Prometheus metrics on a specific HTTP(S) port - at least the number of failed sends, so alerts can be created for them.
Metrics proposal:
- aws_smtp_relay_send_success_total{service="ses"} (service could be ses or pinpoint)
- aws_smtp_relay_send_error_total{service="ses"}
- aws_smtp_relay_client_auth_failed_total
- aws_smtp_relay_client_denied_ip_total
- aws_smtp_relay_client_denied_sender_total