Closed NeoXtreem closed 4 years ago
Hi @NeoXtreem
Thank you very much for thorough analysis and we definitely would like to fix/provide a workaround for this or any other issue as soon as possible.
Regarding your analysis - I'm not sure I fully follow the part:
..., and it appears that
ElasticApmModule.Init
callsAgent.Instance
I see that the first line in ElasticApmModule.Init
is call to InitOnceForAllInstancesUnderLock
and InitOnceForAllInstancesUnderLock
in turn calls Agent.Setup
Could you please point where ElasticApmModule.Init
calls Agent.Instance
before indirect call to Agent.Setup
?
Unfortunately I don't see any workaround that won't require code changes but as a quick fix that we can try to release ASAP (i.e., well before the next planned release) is to make ElasticApmModule
use already existing Agent.Instance
. It might cause the Agent to use wrong configuration (for example env vars instead of Web.config) but at least the Agent won't crash the app.
@NeoXtreem thanks for reporting! I'd also have some quick questions.
Does this always happen in your app? Already on the 1. HTTP call into the application?
Do you maybe have a small reproducer?
@NeoXtreem Is it possible that you call Agent's API (using Agent.Tracer
) before ElasticApmModule.Init
is called, for example from Application_Start
?
@gregkalapos @SergeyKleyman Thank you for your replies, and I'm glad to see that this issue resulted in an improvement, although I have since found that the root cause of the particular issue I was experiencing was on a different branch (of my own fork) where I'd mistakenly introduced a bug by making a seemingly-innocuous change. I reverted it a couple of days ago as you can see here. As you'll see from that diff, I'd put the call to InitOnceForAllInstancesUnderLock
in line to where the variable isInitedByThisCall
was being used. Of course, the clue is in the variable name!
However, this leads me to a few things...
1. Unit testing
Perhaps there should be a unit test to protect against this?
2. Design
And the design seems a bit iffy to me. Basically, a call to the property Agent.Instance
is having a side effect that affects behaviour depending on whether a method (namely Agent.Setup
) is called before or after reading that property. This is generally bad practice as stated by Microsoft here:
It is a bad programming style to change the state of the object by using the
get
accessor. For example, the following accessor produces the side effect of changing the state of the object every time that the_number
field is accessed.private int _number; public int Number => _number++; // Don't do this
Of course, I recognise that in this case, the side effect is down to the use of Lazy
and its property IsValueCreated
which is exposed via Agent.IsInstanceCreated
. Although this seems inconsequential on the face of it, this logic effectively causes a side effect on the behaviour of Agent
properties and methods simply by calling other Agent
properties get
accessors.
If you really want to cause this side effect, then this should not surprise the programmer, and changing Instance
to a method such as GetInstance
, and Tracer
to a method called GetTracer
might be better here. At least then, someone changing Elastic APM source code can expect that GetInstance
might cause a side effect. And a client application calling GetTracer
can also expect the same. However, even then, I would be cautious about the naming as Get
on a method also implies it has no side effects. Perhaps it would be better to have an additional method called TryGetTracer
or even better GetTracerIfConfigured
that returns null if it hasn't yet been configured. There's a good use case for this which I've put in my reply to @SergeyKleyman's comment below.
3. Client use case
@NeoXtreem Is it possible that you call Agent's API (using
Agent.Tracer
) beforeElasticApmModule.Init
is called, for example fromApplication_Start
?
Notwithstanding all the above, it is in fact the case that I am calling Agent.Tracer
in my client application because it uses a logger (Serilog) which initialises at startup and can start logging exceptions before Elastic APM Agent has fully initialised. It logs exceptions using Agent.Tracer.CurrentTransaction.CaptureException
as advised here. However, this means that ApmAgent
is initialised from Agent
with _components
set to null
. This results in AgentComponents
setting its ConfigurationReader
property to an instance of EnvironmentConfigurationReader
. As a result, the Elastic APM config in my Web.config is not read because this requires FullFrameworkConfigReader
to be the ConfigurationReader
.
Therefore, the additional method of GetTracerIfConfigured
as suggested above would be highly useful in cases such as this, unless you know of an alternative method or workaround that should be used in such cases?
Thanks for letting us know @NeoXtreem.
Quick info on how/what I'd release:
We wanted to quickly push a patch to address this with the try-catch
added in #597 (PR already open), but given that this was specific to your fork and probably won't happen with the released package, we will skip that patch and just release #597 with the next minor release (which will be very soon).
I leave the issue open to discuss the rest.
I'm in the process of creating a PR for this, but here is some more info about the antipattern that is a stateful singleton that is being used in this case. As described at the aforementioned link:
A singleton is a convenient way for accessing the service from anywhere in the application code.
The model quickly falls apart when the service not only provides access to operations but also encapsulates state, which affects how other code behaves. Application configuration is a good example of this. In the best case, the configuration is read once at the application start and does not change for the entire lifetime of the application.
However, different configuration can cause a method to return different results although no visible dependencies have changed, i.e. the constructor and the method have been called with the same parameters. This can become an even bigger problem if the singleton state can change at runtime, either by rereading the configuration file or by programmatic manipulation. Such code can quickly become very difficult to reason with:
var before = new MyClass().CalculateResult(3, 2);// depends on Configuration.Instance RefreshConfiguration(); // modifies values in Configuration.Instance var after = new MyClass().CalculateResult(3, 2); // depends on Configuration.Instance
Without comments, an uninformed reader of the code above could not expect the values of
before
andafter
to be different, and could only explain it after looking into the implementation of the individual methods, which read and modify global state hidden inConfiguration
singleton.
I've asked a question on SO to get ideas on how to resolve this without DI, and one answer suggests preventing access (by way of throwing an exception) unless Agent.Setup
has been called. This makes sense as it means the Agent
requires initialisation before being used (standard logic I'd say), but this may be a breaking change for some users.
So, a short-term workaround may still be to add a mechanism to only get the instance if it has (or hasn't) been configured.
@NeoXtreem here are some thoughts on this from me.
Yeah, sure, we can add tests, in fact I just went through your PR and that was one thing I mentioned. We can of course also have 1 test combined with the IIS module.
So, the thing here is the following: since the agent is configurable, and it also encapsulates the tracer which knows about things like active transaction and spans across child tasks, there is a huge advantage that the Agent
is a singleton.
For example you can write Agent.Tracer.CurrentTransaction
in an ASP.NET Controller and it’ll give you the current transaction - already filled with all the request details, plus it also knows about all the configs (what log level to use, on which URL the APM Server listens, etc.)
So, that is the reason for the singleton instance.
However, we want the agent to be usable in any environment. In ASP.NET Core and in ASP.NET we initialize the agent - meaning we call Agent.Setup(config);
There are other cases, where only the Public Agent API is used from the agent - for example a console application - or in fact we heard it’s also used on Xbox (yeah, sounds very cool!). In this scenario users start transactions and spans manually.
There was a discussion back when we started, that in case you just use the Public API in an environment where the agent is not initialized by us (e.g. console application) - should be some configuration step needed? We usually default to zero configuration, which in this case mans you don’t need to call anything, the agent will automatically read config from environment variables and you can just start using Agent.Tracer.StartTransaction()
. So if you don’t call Agent.Setup()
we use the defaults, and this makes the life of users using the agent in those environments easier.
Now, in terms of whether we use an anti pattern or not - I’m totally open for a better approach to cover all this, but to be honest I’d rather have something that works fairly acceptable in any environment without major drawbacks.
If you really want to cause this side effect, then this should not surprise the programmer, and changing Instance to a method such as GetInstance, and Tracer to a method called GetTracer might be better here.
This is not a bad idea, my only issue with this is that for the “average case” this makes things complicated. Now you can say Agent.Tracer
, but then you’ll do something like var tracer = Agent. GetTracerIfConfigured(); if(tracer != null)
… Of course, this can be simplified, but what I try to say is that a plain property is the simplest, and that is what you need in most of the cases, except when you do something before the agent is initialized.
An average case to me is something like: you are in your controller method and you want to start a span in it, or you simply use the public Agent API in an environment where we don’t have auto instrumentation - things like this, these are the normal use cases of the API.
So, overall, yes, the method has 1 side effect, which is that if the agent is not initialized, it’ll initialize it with the defaults, that is true for the 1. call, for all the other calls this does not matter, furthermore this does not matter when you use the API in an environment where we configured the agent for you.
The questions is, does it worth adding the complications for every user in every use case?
Just by reading that part of your comment, I think the problem here is that the initialization logic does not allow you to replace the logger, which is definitely a problem, but I think this is more specific to the IIS module. Probably none of the suggestions above would really help with this, right? Because that way you would configure the agent and our configuration step would already fail - or be skipped.
The suggestion given in this answe was considered. As mentioned above, currently you can include the agent in any application and immediately start tracing things and the agent will by default read configs from environment variables. By throwing an exception we’d loose this.
Overall, to me the comment below your question on StackOverflow is the one that seems the most reasonable: if we offer a property to query if the agent is initialized you can use that in scenarios where you are potentially in a stage where the agent is not initialized (I’d say these are a small percentage of calls to the Agent) - and then use the agent as it is in all the other cases - and this is what your PR already does. We can also boost the experience here with some documentation (both in the agent docs and directly in the .cs files in the xmldocs.
I hope this helped with some additional details.
@NeoXtreem
1. Unit testing
Perhaps there should be a unit test to protect against this?
Introducing the change that was reverted by https://github.com/NeoXtreem/apm-agent-dotnet/commit/3835aa4a442b9ee007e2004080eb1f0e4c22948c#diff-5fd6f14d78670b54d69ae733569c0c11 on top of 1.1.2 will break almost all the tests in Elastic.Apm.AspNetFullFramework.Tests
since the sample application will return 500 on every request. I tested it in https://github.com/elastic/apm-agent-dotnet/pull/614 - and here are the results.
@NeoXtreem
1. Unit testing Perhaps there should be a unit test to protect against this?
Introducing the change that was reverted by NeoXtreem@3835aa4#diff-5fd6f14d78670b54d69ae733569c0c11 on top of 1.1.2 will break almost all the tests in
Elastic.Apm.AspNetFullFramework.Tests
since the sample application will return 500 on every request. I tested it in #614 - and here are the results.
That's good! So we can say this is covered :) Maybe not really "unit tests" but I think that is fine.
Maybe worth pointing out for @NeoXtreem that these tests won't run by default when you run all the unit tests from Visual Studio or Rider, the readme file here has all the details on how to turn those on.
@gregkalapos Thanks for your detailed response. Personally, I'm an advocate of conforming to well-established standards (by "well-established", virtually anything in Microsoft .NET documentation qualifies) such as the one I mentioned concerning get
accessors and side effects. The "average" case is subjective and it is difficult to foresee how the API may be used in the wild if there is more than one way to use it. For example, our use case (with the logger initialising and calling Agent.Tracer
before Elastic APM Agent calls Agent.Setup
) caused us to come a cropper and cost us over a day of dev time to pinpoint. Who's to say many others won't be using Elastic APM Agent this way?
To add some context, the setup of the logger is during a method called by Umbraco (the CMS we use) during Global.asax
that is called early, and trace logs happen frequently thereafter. Therefore, a solution such as GetTracerIfConfigured
would indeed work because it would be fine to not do anything if it returns null
. But crucially, GetTracerIfConfigured
would not cause Lazy
to be evaluated. The idea would be to check if a private bool has been set (it would be set during Setup
), and return null
if it hasn't.
All that being said, with respect to the "average" case, my PR #609 will at least provide a means to get around the default configuration for "non-average" cases, and your documentation should hopefully cover this.
But I would still propose to change the properties to methods to adhere to established standards and meet the expectations of all programmers (or at least not surprise them). I mean, how hard is it to call GetTracer()
instead of Tracer
? The body doesn't even need to change (it can remain Instance.Tracer
), but at least it will adhere to Microsoft guidelines (and programmers' expectations in terms of it not being a property that has side effects). And even if you don't agree to this, then at least Instance
could be changed to GetInstance()
(keep body the same) as this is not part of the public API, and it would at least make the internal code adhere to the expectations of contributors to Elastic APM Agent itself.
If you do agree to having GetTracer()
instead of Tracer
, but with the same body, then this is not a behavioural change; so, IsConfigured
(as per my PR) would still be required. Alternatively, you could have GetTracer()
instead of Tracer
(with the same body) and also GetTracerIfConfigured
which will return null
if a private bool is false
(it would be set to true
during Setup
).
Let me know your thoughts.
PS: Glad the Elastic.Apm.AspNetFullFramework.Tests
test the change I reverted as @SergeyKleyman mentioned. I wasn't aware there were tests that are not run by default.
Thanks for all the feedback here.
The initialization problem is solved here - changes on the API are not planned at this point, so we'd prefer to keep those.
Using the current version (1.1.2) of
Elastic.Apm.AspNetFullFramework
and following the configuration instructions here, I get following exception on startup:I've stepped into the Elastic APM code, and it appears that
ElasticApmModule.Init
callsAgent.Instance
which has the effect of causingLazy.IsValueCreated
to always return true thereafter.ElasticApmModule.Init
then continues on the next line to indirectly callAgent.Setup
which throws an exception ifLazy.IsValueCreated
returns true (which it does because of the previous line). Therefore, this is a showstopping bug as the Full Framework Elastic APM module simply cannot be used for ASP.NET with this initialisation logic.It appears @chrisvanderpennen experienced a similar problem on #244, and also @OlegUfaev here on #441, but that relates to ASP.NET Core and can easily be circumvented with a try-catch as @pburrows suggested here. But in my case, I'm using ASP.NET with .NET Framework, so I don't have that option as config is done through Web.config.
Please can this be fixed, and a workaround provided in the meantime?
Stack trace: