From time to time we have to investigate situations in which customers complain about individual requests taking a long time (up to 100 ms) to pass through gorouter. While looking for ways to find out what causes such delays, we came across the built-in tracing functionality of Go. Unfortunately, the tracing is of limited use unless the code is instrumented to record specific trace points.
This PR proposes adding an initial set of trace points. Since we don't yet know what we are looking for, the trace points are a bit scattered. Please let me know if you think certain trace points are pointless or if there are other sections that should be traced.
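For readers unfamiliar with the annotation API, here is a minimal sketch of what a trace point looks like with the standard `runtime/trace` package. The functions `lookupRoute`, `handleRequest` and `traceRequest`, along with the host and backend values, are hypothetical stand-ins and not the code touched by this PR.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"runtime/trace"
)

// lookupRoute is a hypothetical stand-in for a route lookup; it is not gorouter code.
func lookupRoute(ctx context.Context, host string) string {
	// StartRegion runs now; the returned region's End is deferred.
	defer trace.StartRegion(ctx, "lookupRoute").End()
	routes := map[string]string{"app.example.com": "10.0.0.1:8080"}
	return routes[host]
}

// handleRequest groups the work for one request into a task and
// marks the lookup as a region inside that task.
func handleRequest(ctx context.Context, host string) string {
	ctx, task := trace.NewTask(ctx, "handleRequest")
	defer task.End()

	backend := lookupRoute(ctx, host)
	trace.Log(ctx, "route", backend) // attach a log event to the task
	return backend
}

// traceRequest records a trace of one request into memory and
// returns the chosen backend plus the size of the recorded trace.
func traceRequest(host string) (string, int, error) {
	var buf bytes.Buffer
	if err := trace.Start(&buf); err != nil {
		return "", 0, err
	}
	backend := handleRequest(context.Background(), host)
	trace.Stop() // blocks until all trace data has been written
	return backend, buf.Len(), nil
}

func main() {
	backend, n, err := traceRequest("app.example.com")
	if err != nil {
		fmt.Println("tracing failed:", err)
		return
	}
	fmt.Printf("backend=%s traceBytes=%d\n", backend, n)
}
```

Writing the buffer to a file instead would allow the result to be inspected with `go tool trace`, where tasks and regions show up under the user-defined annotations.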
Since we have benchmarks in the registry, I checked what the impact of the trace points would be:
Note: tracing was not enabled for these runs. The overhead comes solely from the trace points existing and being part of the execution path; they are not recording data.
Given that `BenchmarkRegisterWithConcurrentLookupWith100kRoutes-12` is the closest to what gorouter actually does, and the trace points only increase its time per register by ~3%, I don't expect any major impact from this change.
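To reproduce this kind of comparison in isolation, something along the following lines could be used. It is a rough sketch with a hypothetical, heavily simplified `registry` type and is not the benchmark from the registry package; absolute numbers will differ by machine and will not match the real benchmarks.

```go
package main

import (
	"context"
	"fmt"
	"runtime/trace"
	"sync"
	"testing"
)

// registry is a hypothetical, simplified route table guarded by a lock.
type registry struct {
	mu     sync.RWMutex
	routes map[string]string
}

// register stores a route without any trace point.
func (r *registry) register(uri, backend string) {
	r.mu.Lock()
	r.routes[uri] = backend
	r.mu.Unlock()
}

// registerTraced does the same work wrapped in a trace region.
func (r *registry) registerTraced(ctx context.Context, uri, backend string) {
	defer trace.StartRegion(ctx, "registry.register").End()
	r.register(uri, backend)
}

// compare benchmarks both variants with tracing disabled, so any difference
// reflects only the cost of the trace point being on the execution path.
func compare() (plain, traced testing.BenchmarkResult) {
	ctx := context.Background()
	plain = testing.Benchmark(func(b *testing.B) {
		r := &registry{routes: map[string]string{}}
		for i := 0; i < b.N; i++ {
			r.register("app.example.com", "10.0.0.1:8080")
		}
	})
	traced = testing.Benchmark(func(b *testing.B) {
		r := &registry{routes: map[string]string{}}
		for i := 0; i < b.N; i++ {
			r.registerTraced(ctx, "app.example.com", "10.0.0.1:8080")
		}
	})
	return plain, traced
}

func main() {
	plain, traced := compare()
	fmt.Printf("plain:  %d ns/op\ntraced: %d ns/op\n", plain.NsPerOp(), traced.NsPerOp())
}
```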
Once we have the opportunity to collect some traces on our production systems, we will have more insight into which regions should be traced and which are less interesting. We should then revisit the topic and adjust the trace points accordingly.
I will keep the branch around, but given the efforts to move towards OTel, I don't think it's a good idea to introduce Go-specific tracing at this point.
Backward Compatibility
Breaking Change? No