cwe1ss closed this issue 7 years ago
Christian, as of today there are no plans to do any major development in Semantic Logging codebase. On the other hand, EventFlow is under active development. The guidance on which one to choose can be summarized as follows:
Hope this helps!
@cwe1ss thanks for your question -- I was wondering if out-of-process is an important feature to you, if so, why?
Thank you very much for the great and quick response!!
@karolz-ms: The OMS agent doesn't support ETW events, does it?!
@qubitron:
We are pretty much all-in on Azure with our upcoming project and we use Service Fabric, OMS, Azure Diagnostics, etc. I'm also experimenting with Application Insights. However, I'm not really happy with what we have so far. Actually, I'm on a pretty long journey to find a good logging solution for our system.
My perfect scenario looks like this:
In-process logging has advantages as well, but it also pushes the burden of configuration, resilience, and buffering into each application. I don't want to redeploy/restart 50 applications when my Application Insights ID or similar settings change.
I'm happy to discuss this further should you be interested.
PS: I'm also working on the .NET implementation for opentracing.io - it's related to this topic as well!
@cwe1ss, indeed, looks like OMS agent does not support ETW events :-(
Having said that, thank you very much for the comprehensive reply and for sharing your logging journey with us. It is very helpful. We have been on our version of the journey here for a long time too. I am glad to say that a lot of what you have discovered matches very well with our thinking, specifically:
But I noticed one key difference between what you said and what we were targeting with EventFlow and other work here at Microsoft. We assume that applications running on a Fabric cluster are largely independently configured, and that includes diagnostics too. In other words, we are aiming for a microservice-style approach.
For example, if you have 50 applications, we think you will also have close to 50 distinct ApplicationInsights instrumentation keys.
This makes a cluster-wide configuration for diagnostics very unappealing--it couples the services in such a way that when you want to change something about diagnostics for a particular service, you are forced to change a cluster-wide configuration piece. How it is done does not matter that much--even if the process is convenient and fast, it still breaks the isolation of services. In fact, this necessity for cluster-wide configuration changes with the WAD agent was the #1 adoption blocker for Fabric for some teams here at Microsoft, and that (and a few other things) prompted the development of EventFlow in the first place.
So, bottom line, we thought that having diagnostic configuration be part of service configuration is actually a desirable thing.
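To make this concrete, here is a rough sketch of what per-service diagnostic configuration looks like with EventFlow: a config file (typically `eventFlowConfig.json`) deployed alongside each service, wiring inputs to outputs. The provider name and instrumentation key below are placeholders, and the exact schema may vary between EventFlow versions.

```json
{
  "inputs": [
    {
      "type": "EventSource",
      "sources": [
        { "providerName": "MyCompany-MyService" }
      ]
    }
  ],
  "outputs": [
    {
      "type": "ApplicationInsights",
      "instrumentationKey": "00000000-0000-0000-0000-000000000000"
    }
  ]
}
```

Because each service carries its own file like this, changing where one service sends its telemetry never requires touching cluster-wide configuration.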
I do agree with the fact that in-process logging brings the problems of resilience, buffering, having multiple outbound connections from all cluster nodes, etc. EventFlow is designed specifically to mitigate some of these problems. We are also discussing the possibility of taking the best of both in-process and out-of-process approaches, so that the process of sending data from the node is factored out into a separate "log forwarder" service. That would reduce the number of outbound connections and the buffering/retry/network handling logic. We do not have anything specific to share yet. This approach is a performance/resilience optimization though; it would not fundamentally change the principle of "diagnostic configuration is service configuration".
Please do share more thoughts, and thanks again. Karol
I'm happy to hear that we are on the same journey! :)
I see the following issues with what you've described regarding separate configurations:
The current Application Insights pricing just doesn't allow you to have separate instances for each service in a microservices architecture. The GA pricing is per instance AND per node. With 50 services on 10 servers, you would have to pay for 500 instances - currently $15 × 500 = $7,500 per month. And this doesn't yet include data volume. This is too expensive - at least for us.
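A quick back-of-the-envelope check of that figure (the service and server counts are the hypothetical numbers from this thread, and the per-node price is the GA price quoted above):

```python
services = 50        # one Application Insights instance per service
servers = 10         # nodes each instance reports telemetry from
price_per_node = 15  # USD per node per month (GA per-node pricing)

# Per-instance AND per-node billing: every (service, server)
# combination is billed as its own node.
billed_units = services * servers          # 500
monthly_cost = billed_units * price_per_node
print(monthly_cost)  # 7500 USD/month, before data-volume charges
```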
Application Insights accounts are too isolated. You need the "big picture" and reports that cover all accounts. Azure Application Insights Analytics would be nice for this, but it currently doesn't work across accounts. This means you must use OMS as well and the "Connector for OMS Log Analytics". This also means you have to pay twice for your data volume.
Even if a system consists of many small parts, logging/tracing should usually go into one central system. I don't think it's common that companies will log half of their services in Application Insights and the other half in e.g. New Relic. Requiring different instrumentation keys is just a necessary implementation detail of Application Insights.
Having to create a separate Application Insights account for each service is too much ceremony. In a microservices world, services come and go. Ideally, the logging system (e.g. Application Insights, Log Analytics) would just see a new process name and automatically detect this as a new service.
There's also different environments so you would quickly end up with hundreds of Application Insights accounts.
Most services share common reporting requirements. Environments typically consist of web-based frontend services, request-based backend services (web, queueing, ...), and background workers. I don't want to configure every Application Insights account with my required set of customizations.
So, after having told my "perfect solution for log creation", here's my "perfect solution for log processing/reporting" on Azure:
Log Analytics has come a long way lately, especially with the new custom dashboards feature. Making this the one true logging solution that combines infrastructure and application data would be awesome.
Obviously, this is more of a dream than a request... but one may dream! :smile:
Christian, thanks for a comprehensive reply and many good thoughts.
I agree that the cost of AI (or, in general, the diagnostic backend you use) is a big concern. As you pointed out, for AI specifically, having a separate instrumentation key per microservice is not cost effective, nor desirable from the perspective of correlating data across multiple iKeys (which is hard). This is not what I meant to suggest though :-) -- I had an instrumentation key per application in mind, i.e. per set of related services. Sorry if that did not come out clearly. I also agree that even a per-application iKey might be too granular.
That said, there is also a performance aspect that points in the opposite direction, towards partitioning your diagnostic data. Even with the AI Enterprise plan, the limits on how much data you can send (see the docs) are not very high. They may be lifted in the future, but whatever they are, at some scale you might hit the limit and be forced to partition. This thinking applies to whatever backend you use. Depending on the size of your VM scale set or Service Fabric cluster, the mix of applications running on it, and the capabilities/cost of your diagnostic backend(s), there is an optimal degree of reuse. It may typically be high, but I don't think we can assume a single backend per cluster.
Also, we need to think about the full set of scenarios that we need to cover for microservices on Azure. In the case of Service Fabric in particular, multiple instances of the same application, and multiple versions of the same application, may run side-by-side on the same cluster. In fact, some of our Fabric users use exactly this technique of instantiating the same application multiple times to ensure a high degree of separation between application instances tied to their large-scale customers. A common requirement in this case is to send a portion of the application logs to whatever the customer's diagnostic store might be. Obviously you do not want to mix the logs for these different customers, who might be competitors! So every instance has its own backend (plus the backend that is used by the application provider for their own purposes). Note that this also means the process name is not sufficient to distinguish between service processes running on behalf of different customers.
The way I read your "perfect solution for log processing/reporting" is that at the core you want low-ceremony, remote, easy-to-customize, and quickly applicable configuration of diagnostics for services. And you want some predefined templates to get you started quickly. If that is the goal, it makes perfect sense. I believe this can be achieved in various ways. It does not matter that much whether it is the service process or a dedicated daemon that sends the data out of the nodes. For the reasons listed above I still believe that, architecturally, it makes more sense if the diagnostic configuration is associated with individual services (deployed service instances). The challenge is then to have an admin experience inside OMS/AI that meets your goals. I think we can meet this challenge (well, gradually :-) ).
Oh, and we are working on merging OMS Log Analytics and the "Analytics" part of Application Insights. No details to share at this point, but this is a definitive, upper-management-demanded long-term goal.
For people who are reading this. I just found the following on the Application Insights pricing page:
> **What if I use the same node for multiple applications that I'm monitoring?**
>
> No problem. We only count the unique nodes sending telemetry data within your Azure subscription (billing account). For instance, if you have 5 separate websites running on the same physical server and each website is configured with Application Insights Enterprise (charged per node), then collectively these will count as one node.
>
> You can also have applications using Application Insights Basic (charged per GB) in the same Azure subscription, and this will not affect the node counting for the applications that use Application Insights Enterprise.
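A small sketch of that counting rule (the report data is hypothetical; the point is that per-node billing counts distinct nodes, not (node, application) pairs):

```python
# Telemetry reports as (node, application) pairs: five websites,
# all running on the same physical server.
reports = [
    ("server-1", "website-1"),
    ("server-1", "website-2"),
    ("server-1", "website-3"),
    ("server-1", "website-4"),
    ("server-1", "website-5"),
]

# Enterprise (per-node) billing counts unique nodes within the subscription.
billable_nodes = {node for node, _app in reports}
print(len(billable_nodes))  # 1 -- five websites on one server = one node
```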
Not sure when this was introduced but this is a very interesting and welcome change!
Hi, since Semantic Logging is also from Microsoft, I'm wondering what your guidance is on when we should use which.
It seems like the main difference is that Semantic Logging can be used out-of-process; however, there hasn't been much activity lately, and the new .NET 4.6 features (rich payload, ...) are not yet supported.