FusionAuth / fusionauth-issues

FusionAuth issue submission project
https://fusionauth.io

Add tracing telemetry to FusionAuth #1665

Open mooreds opened 2 years ago

mooreds commented 2 years ago

Problem

I want to use tools like Datadog and Honeycomb to monitor my system, of which FusionAuth is a part.

Solution

Add OpenTelemetry tracing for FusionAuth operations. How to integrate and capture default metrics is already documented (https://github.com/FusionAuth/fusionauth-site/pull/1293), but we should also capture some spans around user logins.

Additional context

If you have specific additional spans you'd like tracked, please comment.

https://github.com/open-telemetry/opentelemetry-java-instrumentation

https://opentelemetry.lightstep.com/spans/

https://geekflare.com/opentelemetry-introduction/

https://jeremymorrell.dev/blog/minimal-js-tracing/

Related

Community guidelines

All issues filed in this repository must abide by the FusionAuth community guidelines.

How to vote

Please give us a thumbs up or thumbs down as a reaction to help us prioritize this feature. Feel free to comment if you have a particular need or comment on how this feature should work.

theogravity commented 1 year ago

@mooreds The documentation doesn't mention how we'd do it for hosted instances. Can you give any pointers on how to do it for DataDog for a hosted instance?

mooreds commented 1 year ago

The solution that has worked for others is to use the Prometheus endpoint:

https://fusionauth.io/docs/v1/tech/tutorials/prometheus

And ingest that into Datadog.

Haven't done it myself, but this looks helpful: https://www.datadoghq.com/blog/monitor-prometheus-metrics/
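If you end up reading the endpoint by hand rather than through the Datadog agent, here is a minimal sketch of parsing Prometheus text-format output. The sample payload and metric names are made up for illustration, not actual FusionAuth output, and the parser only handles simple `name{labels} value` lines.

```python
import re

# Hypothetical sample of the Prometheus text exposition format; these metric
# names are illustrative and not copied from a real FusionAuth instance.
SAMPLE = """\
# HELP jvm_memory_used_bytes Used JVM memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap"} 1.5e+08
jvm_memory_used_bytes{area="nonheap"} 9.0e+07
http_requests_total 42
"""

# metric name, optional {label="value",...} block, then the sample value
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

In practice the Datadog agent's own Prometheus/OpenMetrics integration would do this scraping for you; the sketch is only meant to show what the data looks like.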

That will give you low-level metrics like JVM memory usage. If you want business-level metrics (such as the number of failed logins), you'll want to use webhooks and ingest those into Datadog. I don't have any examples of that to share.
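A rough, untested-in-production sketch of the webhook route: convert each webhook body into a DogStatsD counter and send it to the local Datadog agent. Assumptions here: the payload carries `event.type` (for example `user.login.failed`) and `event.applicationId`, the metric name `fusionauth.webhook.events` is invented for this example, and the agent accepts DogStatsD on UDP `127.0.0.1:8125`. Check the webhook event documentation for the real payload shape. The HTTP receiver itself is left out.

```python
import json
import socket


def to_dogstatsd(payload: bytes) -> str:
    """Convert a webhook body into a DogStatsD counter datagram."""
    # Assumed payload shape: {"event": {"type": ..., "applicationId": ...}}
    event = json.loads(payload).get("event", {})
    event_type = event.get("type", "unknown")
    app_id = event.get("applicationId", "none")
    # DogStatsD datagram: <metric>:<value>|<type>|#<tag>:<value>,...
    # The metric name is hypothetical.
    return (
        "fusionauth.webhook.events:1|c"
        f"|#event_type:{event_type},application_id:{app_id}"
    )


def send(datagram: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to the (assumed) local agent address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Example body; the application ID is a placeholder.
body = b'{"event": {"type": "user.login.failed", "applicationId": "app-123"}}'
print(to_dogstatsd(body))
```

Calling `send(to_dogstatsd(body))` from whatever handler receives the webhook would then increment the counter, tagged by event type, so failed logins can be graphed separately from successes.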

theogravity commented 1 year ago

Thanks, this looks like it could work since we do deploy the Datadog agent.

mooreds commented 1 year ago

@bhalsey here's the issue I mentioned.

bhalsey commented 1 year ago

FusionAuth has switched to a lighter-weight HTTP server backend since the monitor guide was published. java-http does not have out-of-the-box instrumentation from the opentelemetry-javaagent.jar, so we do not get traces of requests made to FusionAuth.

robotdan commented 4 months ago

@bhalsey can we handle https://github.com/FusionAuth/fusionauth-issues/issues/2741 as part of this work?

robotdan commented 4 months ago

> FusionAuth has switched to a lighter-weight HTTP server backend since the monitor guide was published. java-http does not have out-of-the-box instrumentation from the opentelemetry-javaagent.jar, so we do not get traces of requests made to FusionAuth.

We publish a lot of metrics through the Prometheus endpoint around HTTP request rates, errors, and timings. Is this what you're looking for, or is there something that we want that is not available via the Prometheus metrics?

mooreds commented 4 months ago

I believe this is a different kind of telemetry, which includes span and trace information. https://opentelemetry.io/docs/concepts/signals/traces/

bhalsey commented 4 months ago

> I believe this is a different kind of telemetry, which includes span and trace information. https://opentelemetry.io/docs/concepts/signals/traces/

Correct. The image under https://opentelemetry.io/docs/concepts/observability-primer/#distributed-traces helps illustrate the value of traces. They can help identify the bottleneck in systems.
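To make the metrics-versus-traces distinction concrete, here is a toy tracer sketch using only the Python standard library. It is not the OpenTelemetry API and says nothing about how FusionAuth works internally; the span names (`login`, `load_user`, `check_password`) and the sleeps are hypothetical. Each span records a shared trace ID, its own ID, its parent, and a duration, which is what lets you see where time goes inside a single request.

```python
import contextvars
import secrets
import time
from contextlib import contextmanager

# The currently active span, so a nested span can find its parent.
_current = contextvars.ContextVar("current_span", default=None)
FINISHED = []  # stand-in for an exporter; spans land here when they end


@contextmanager
def span(name):
    parent = _current.get()
    record = {
        "name": name,
        # Child spans inherit the trace ID; a root span starts a new trace.
        "trace_id": parent["trace_id"] if parent else secrets.token_hex(16),
        "span_id": secrets.token_hex(8),
        "parent_id": parent["span_id"] if parent else None,
    }
    token = _current.set(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        _current.reset(token)
        FINISHED.append(record)


def slowest_child(spans, parent_name):
    """Return the slowest direct child of the named span."""
    parent = next(s for s in spans if s["name"] == parent_name)
    children = [s for s in spans if s["parent_id"] == parent["span_id"]]
    return max(children, key=lambda s: s["duration_ms"])


# Hypothetical login flow; the sleeps stand in for real work.
with span("login"):
    with span("load_user"):
        time.sleep(0.01)
    with span("check_password"):
        time.sleep(0.05)

print(slowest_child(FINISHED, "login")["name"])
```

An aggregate metric would only report that logins took about 60 ms; the parent/child structure is what points at the password check as the slow step.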

bhalsey commented 4 months ago

> @bhalsey can we handle #2741 as part of this work?

#2741 is scoped to usage of FusionAuth to help improve the product. This issue concerns OpenTelemetry tracing to help operators of FusionAuth identify bottlenecks and improve its performance.

dvictory commented 3 months ago

OTEL instrumentation support would be greatly appreciated. We need this too, to troubleshoot performance.

mooreds commented 3 months ago

@dvictory thanks for the comment! Please make sure to upvote the issue as well.

Comments are great for adding flavor or specific use cases, but we sort by number of upvotes to gauge community feature input.