Particular / NServiceBus

Build, version, and monitor better microservices with the most powerful service platform for .NET
https://particular.net/nservicebus/
Other
2.08k stars 648 forks source link

Make Performance Counters completely optional #3386

Closed boblangley closed 8 years ago

boblangley commented 8 years ago

Performance counters should either be optional (perhaps as a separate package) or opt-out for users that do not use them.

MarcinHoppe commented 8 years ago

What would be the benefit of removing them?

I'd say anyone who runs a production system needs performance counters.

danielmarbach commented 8 years ago

I think for more high throughput perf counter should not be used but ETW instead. The problem is ETW means also OS lock-in

danielmarbach commented 8 years ago

https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/cross-platform-performance-and-eventing.md

andreasohlund commented 8 years ago

:+1: this has come up numerous times in the past. We would have to expose the "message pulled from queue" , message processed and message failed as events so that the external package could hook into them and provide the counter

andreasohlund commented 8 years ago

What would be the benefit of removing them?

They require installation and tends to get corrupted now and then

MarcinHoppe commented 8 years ago

@danielmarbach I think performance counters are good for endpoint level metrics. ETW should be good for message level metrics. If we want to give users visibility into what's happening to each message as it travels through the pipeline, ETW is the tool of choice. Separate issue?

@andreasohlund I think making performance counters optional will only make the installation issue worse than it currently is. I think the typical scenario would be that the user discovers that the endpoint runs slow. They go look for more data and discover there is no data (counters being optional will probably not be a first-time concern for most developers). Now they need to add a NuGet, rebuild, retest and redeploy (possibly modifying their deployment scripts to account for counter installation) to get more insight into behaviour of their endpoint. All in a potentially problematic situation of an endpoint running slow.

Does this make sense?

danielmarbach commented 8 years ago

For overall application level perf metrics performance counters are probably enough. If I recall correctly though ETW doesn't require any installation if you are running on later than Vista. But perf counters do require registering the perf counters. Which might be the killer argument to switch to ETW.

The previous chapter discussed performance counters which are excellent for tracking your application’s overall performance. What performance counters cannot do is provide any information on a specific event or operation in your application. For that, you need to log data per operation. If you include time stamps then you now have the ability to track performance in your program in an extremely detailed way. There are many logging libraries for .NET. There are some popular ones like log4net, as well as countless custom solutions. However, I strongly encourage you to use Event Tracing for Windows, which has a number of advantages:

  1. It is built into the operating system
  2. It is extremely fast and suitable for high-performance scenarios
  3. It has automatic buffering
  4. You can dynamically enable and disable consuming and producing events during runtime
  5. It has a highly selective filtering mechanism
  6. You can integrate events from multiple sources into one log stream for comprehensive analysis
  7. All OS and .NET subsystems generate ETW events
  8. Events are typed and ordinal-based instead of just strings

Source: http://www.writinghighperf.net/

MarcinHoppe commented 8 years ago

So it looks like we're in agreement into where ETW fits best. Do we file a separate issue to generate ETW events at message level? Do we have a single customer that voiced a desire to have this sort of visibility? Maybe we're the stakeholder here: this should probably be a good support / troubleshooting feature.

The ETW does not solve the problem of exposing endpoint metrics to monitoring tools such as SCOM or Nagios. Or does it?

andreasohlund commented 8 years ago

Is ETW or perf counters supported on donetcore? (this is all I could find http://forums.dotnetfoundation.org/t/monitoring-coreclr/820 => seems like no it doesn't)

@johnsimons did perf counters come up as a blocker when you ran the tool to check?

andreasohlund commented 8 years ago

Actually @danielmarbach already answered it

The problem is ETW means also OS lock-in

So we need to make perf counters etc optional if we want to support dotnetcore

Scooletz commented 8 years ago

If we wanted to keep a notion of performance counters and leave a possibility of use different implementations the strategy used by Singalr looks very good (definitions with properties for either noop or a real perf counter):

https://github.com/SignalR/SignalR/blob/fa864e866bea4737dbd888f993892694f2f6b1a6/src/Microsoft.AspNet.SignalR.Core/Infrastructure/PerformanceCounterManager.cs

MarcinHoppe commented 8 years ago

@andreasohlund .NET Core is a really good argument for making it pluggable. I'd still argue for logging a warning if no perf counters are set up but could be (i.e. on full .NET Framework on Windows).

I don't like the direction that .NET Core is taking here (i.e. ditch counters in favor of eventing), because it leaves all the work of integrating with external monitoring systems to the user. It probably also means settling on the lowest common denominator, which is SNMP.

@Scooletz The SignalR example is certainly one way of implementing it.

mauroservienti commented 8 years ago

We discussed a lot both SNMP and Metrics.NET as a possible replacement/addition to PerfCounters.

Scooletz commented 8 years ago

SNMP - that would be great as a number of tools being able to consume our statistics increases drastically. Have we considered HdrHistogram for gathering and aggregating stats in the process? That's what EventStore does and if we want to provide valid 99th, 99.9th & more percentile values.

boblangley commented 8 years ago

+1 for making this pluggable. NSB should be adaptable to what the enterprise is already using (WMI, SMTP, JMX, etc).

MarcinHoppe commented 8 years ago

:+1:

andreasohlund commented 8 years ago

Replaced and tracked by https://github.com/Particular/PlatformDevelopment/issues/747

robv8r commented 8 years ago

I realize that this thread is closed, but wanted to comment on the subject of Platform Lock-in with ETW.

I agree that "ETW" itself is only supported on Windows. However, the EventSource type that's used to publish events that can be consumed by ETW is fully supported in .Net Core.

@danielmarbach included this link earlier in this thread: https://github.com/dotnet/coreclr/blob/master/Documentation/coding-guidelines/cross-platform-performance-and-eventing.md

On the same page, the proposal is to continue using the .Net EventSource type in .Net Core.

Ideally, EventSource operates on Linux and OS X just like it does on Windows. Namely, there is no special registration of any kind that must occur. When an EventSource is initialized, it does everything necessary to register itself with the appropriate logging system (ETW, LTTng, DTrace), such that its events are stored by the logging system when configured to do so.

Additionally, .Net Core includes a new type that hasn't made it into the .Net Framework yet. If I understand things correctly, the EventCounter type is intended to be used like Performance Counters are used today. The advantage is that they can be used across all platforms. The EventCounter type is included in the Microsoft.Diagnostics.Tracing.EventSource.Redist NuGet Package for use in the .Net Framework 3.5 and above as well as Win8 and WPA81 PCLs.

You probably already know that I'm a big proponent of the EventSource and it's counterparts. Coming from a (primarily) Windows world, I tend to incorrectly use the terms ETW and EventSource interchangeably.

For clarification: