HangfireIO / Hangfire

An easy way to perform background job processing in .NET and .NET Core applications. No Windows Service or separate process required
https://www.hangfire.io
Other
9.43k stars 1.7k forks source link

Add ActivitySource to start producer and consumer activities (spans) to support OpenTelemetry observability #2460

Open sgryphon opened 3 weeks ago

sgryphon commented 3 weeks ago

OpenTelemetry is a cross-platform open-source standard for distributed tracing, which allows you to collect and analyze data about the performance of your systems.

OpenTelemetry is now the default for new .NET applications: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/observability-with-otel

These changes add a Hangfire ActivitySource to start a Producer activity (span) when a job (either background or recurring) is created, and then a Consumer activity when it is processed.

It addresses part of the integration to Aspire, requested in https://github.com/HangfireIO/Hangfire/issues/2408

It also provides a built in core solution, compared to using filters discussed in https://github.com/HangfireIO/Hangfire/issues/2017

The creation context information is persisted with the job, so that the distributed activities can be correlated, even if processed on a different server.

No library user involvement or changes are needed except to register a listener for the ActivitySource, such as via OpenTelemetry. The Producer activity will pick up any existing context (such as from an ASP.NET request), or create a new one as needed.

The NetCore example has been updated to show how TraceId is correlated between job creation and execution. The PR also includes unit tests and a brief mention in the Readme.

Note: Some of the work has been based on the MassTransit implementation of ActivitySource.

sgryphon commented 2 weeks ago

@odinserj should I target this at the Main or Dev branch? Thanks.

odinserj commented 6 days ago

Hi @sgryphon! I like the idea, but the number of changes is really high. How does it compare with the following gist that's implemented as a job filter? https://gist.github.com/odinserj/06e18950b5bf3083a5aed0ed06d3d18a

sgryphon commented 6 days ago

How does it compare with the following gist that's implemented as a job filter?

Thanks for reviewing.

I am currently use a similar (but limited) filter, as discussed in https://github.com/HangfireIO/Hangfire/issues/2017

I do think that activity tracing, like metrics, is better as part of the core solution, turned on by default / with a standard name. I based some of that on MassTransit, where you just subscribe to a known event source; the .NET elements (e.g. HttpClient) are similar in that you can just easily enable them.

Decoupling from the flow is however a good idea, so happy with a different approach. My main thoughts are:

  1. Can the filter be a default system filter, i.e. turn on by default (with a fixed Activity Source name).
  2. Does it support both background jobs and recurring jobs.

Apart from that, some specific feedback on the gist:

If you can answer the first two items, I'd be happy to work on a revised solution using the filter.

sgryphon commented 5 days ago
  1. Can the filter be a default system filter, i.e. turn on by default (with a fixed Activity Source name).
  2. Does it support both background jobs and recurring jobs.

I found GlobalJobFilters.cs, which I think answers my first question.

I will work on using the alternative approach. It is nicer, as it neatly decouples the work. I also like putting the activity in the context (doesn't work for job parameters, which may be serialized to another machine, but I presume the start and end events run in the same process).

sgryphon commented 5 days ago

I've re-written the code as a global filter. I think it is much nicer to decouple like that -- and is shows the good architecture/structure of Hangfire that it is easy to use.

I used your filter code as a base, and then merged with my code (& applied my suggestions).

I added it as default to global filters, and then checked that the samples run okay (updated readme & samples for new location of the constant).

Some notes:

The one thing I haven't updated yet is the unit tests, as the previous code was built into Worker, etc so I had tests for those. I'll have a look at the tests for other filters and see if I can add anything valuable.

sgryphon commented 5 days ago

New unit tests, for the filter, now added.