Closed gfraiteur closed 4 years ago
Attaching a repro. Extract the zip file and execute e97d0f14c66440ee775be26c1f9a34cf.dll
as a console app.
netcoreapp2.0.zip
Can you share the source for the repro? And, actually, a test that doesn't use PostSharp? If there's a breaking change here, it'd be helpful to see the most minimal repro possible, written just against the APIs and without extra tooling required.
it returns a different value depending on the state of the state machine
Can you elaborate on this? You mean you access ATMB.Task on the same instance multiple times, and it returns a different Task instance each time? That is definitely not expected. Or do you mean that you might get back different Task-derived types from different ATMB instances depending on circumstances? If so, that is expected.
Could you document which values are returned in which circumstances?
No, I do not think that's a good idea. From the perspective of a consumer of the API, it gets back a Task
; whether it's some special derived implementation is an implementation detail that we should be free to change and optimize in the future. The changes made for 2.1 were exactly because they yielded significant performance and diagnostic benefits, and if we'd previously documented exactly the concrete class returned, that wouldn't be possible. In general in .NET, it is not considered a breaking change to return a more derived type than the one explicitly specified as the return type, just as it's not considered breaking to return different implementations of an interface from a method declared to return that interface.
Thank you for your reply.
Can you share the source for the repro?
Done. But you won't able to build it without a PostSharp build that is still unpublished. OnMethodBoundary_CurrentTask.zip
Can you share ... a test that does not use PostSharp ?
No, I don't think it is possible to access the current state machine class in C# without PostSharp. We inject code that exposes ATMB.Task to the aspect and this aspect does pretty similar things than your causality tracker (we're doing async logging "the right way", with causality).
Can you elaborate on ... it returns a different value depending on the state of the state machine
Here is the result of the test on .NET Core 2.0 (when you run the compiled attach file):
dotnet --fx-version 2.0.0 C:\src\postsharp\Private\Test\SmallTests\bin\Aspects\SlimAdvices\OnMethodBoundary_CurrentTask\CS\NetCore\Debug\netcoreapp2.0\e97d0f14c66440ee775be26c1f9a34cf.dll
OnEntry Task = System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult]
OnSuspend Task = System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult]
OnResume Task = System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult]
OnSuspend Task = System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult]
OnResume Task = System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult]
OnExit Task = System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult]
Passed.
And here is the result on .NET Core 2.1:
C:\src\postsharp\Build\Testing\..\..\Private\Test\BuildTestRunner\bin\Debug\BuildTestRunner.exe dotnet --fx-version 2.1.0-rc1 C:\src\postsharp\Private\Test\SmallTests\bin\Aspects\SlimAdvices\OnMethodBoundary_CurrentTask\CS\NetCore\Debug\netcoreapp2.0\e97d0f14c66440ee775be26c1f9a34cf.dll
OnEntry Task = System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult]
OnSuspend Task = System.Threading.Tasks.Task`1[System.Threading.Tasks.VoidTaskResult]
OnResume Task = System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[System.Threading.Tasks.VoidTaskResult,BuildUnitTests.Aspects.SlimAdvices.OnMethodBoundary_CurrentTask.Program+<AsyncMethod>d__1]
Our test shows that, on .NET Core 2.1, when we call ATMB.Task before the first await, we get a simple Task (presumably the object returned by InitializeTaskAsPromise). But when we call ATMB.Task after the first await, we get an AsyncStateMachineBox.
No, I do not think that's a good idea.
I understand. We need to know if ATMB.Task is supposed to be idempotent. If yes, the behavior in 2.1 is a bug and we will wait for a fix. If not, we will implement a workaround. Note that ObjectIdForDebugger already implements a workaround and we would be able to use it if it were not internal.
when we call ATMB.Task before the first await
Ah, you're code rewriting the compiler-generated code to access the Task property before the first await? That is not a supported scheme / access pattern, and I would not expect the ATMB implementation to change to accommodate it.
Note that ObjectIdForDebugger already implements a workaround and we would be able to use it if it were not internal.
ObjectIdForDebugger implements a workaround to accommodate the case where the debugger is poking at things; debuggers often access private state in an unsupported fashion, and if they break, they break. But note that this workaround makes the implementation more expensive at runtime, and that's considered ok, because this would only manifest if you were interacting with these objects with the debugger, at which point perf doesn't matter.
The issue is that while ATMB is meant for use by the compiler-generated pattern (and isn't really meant for use other than that, which is why it's in the compiler namespace), some perf-sensitive code does use ATMB directly rather than using TaskCompletionSource, in order to have a struct-based source to avoid an extra allocation. In the compiler-generated use, we can lazily initialize the backing field to be the single object that's not only the Task but also contains the async method state machine, contains the execution context, caches the Action delegate, etc. But that also means this data structure is much bigger than a normal task, and so if you're just using ATMB directly in code, all of that extra size is unnecessary and useless cost. So if you access the Task property directly, we continue to just manufacture a normal Task.
If not, we will implement a workaround.
If you can, that's the best the approach.
We could explore alternatives to try to make this more bulletproof, but off the top of my head I don't have any great solutions, and if we did implement something, I don't think they would be backported to 2.1 unless this became a huge issue. I can think of two possible approaches to us "fixing" this:
Thank you for the detailed answer.
As a workaround, it seems enough for us to have an "ObjectId", so we'll add some field to the state machine and somehow initialize it. This will have some performance impact but this will only affect users of a feature that needs this ObjectId (such as logging) and, additionally, the performance overhead of PostSharp is much smaller than the desired effect, e.g. logging.
The workaround will be implemented in PostSharp 6.0.
Here is the impact on users:
Projects targeting .NET Standard or .NET Core built with an older version of PostSharp will be affected by the defect when running on the .NET Core 2.1 runtime. People will have to recompile their project using PostSharp 6.0 (possibly renewing their support subscription).
The only affected PostSharp feature is logging (using PostSharp.Patterns.Diagnostics) of async methods. ATMB.Task is not exposed to other PostSharp public APIes and therefore PostSharp users could not rely on this implementation detail for their own aspects.
In general in .NET, it is not considered a breaking change to return a more derived type than the one explicitly specified as the return type, just as it's not considered breaking to return different implementations of an interface from a method declared to return that interface.
I agree. But FWIW, this broke StreamJsonRpc which was using the .NET Standard-friendly subset of reflection APIs to find the Task<T>.Result
property on the task returned from an async method. Since .NET Core 2.1 returns a Task<T>
-derived type from async methods now, our reflection code fails to find the Result
property and it malfunctions, returning bad data to the RPC client.
We'll fix it on streamjsonrpc, of course, but I wanted to +1 that this was an unexpected breaking change, even if it is a calculated risk by your standards. And again, I agree with your standard.
I wanted to +1 that this was an unexpected breaking change
You're saying you consider it a breaking change in a method:
public Foo Bar() { ... }
when Foo isn't sealed to return a Foo-derived type when previously it was returning a concrete Foo? That's never been the bar.
Any time you use reflection, bets are off unless you are making the same decisions the compiler would have made. By that standard, ignoring inherited Result properties was a bug.
You're saying you consider it a breaking change in a method...
No. I'm saying it was a breaking change in that it literally broke our library.
But I was also trying to emphasize that I agree with your standard that you should continue to make the trade-offs you did for better performance. Your standard shouldn't consider it a breaking change because if people do the right thing, it won't break them. We obviously weren't doing the right thing here, and we'll fix that.
I just hope that such a change as this would never make it into a servicing release.
I'm saying it was a breaking change in that it literally broke our library.
Ok. That's of course possible with any change, any bug fix, any performance improvement, etc.
I was also trying to emphasize that I agree with your standard
Ok.
@stephentoub I'm dumbfounded that this failure only occurs on some machines when running on .NET Core 2.1. Locally, I see the test failure due to this Task<T>
-derived type all the time on Linux and Windows. But on appveyor and Travis CI, both which have .NET Core 2.1 installed, the test passes! The only thing I can imagine is that the Task<T>
-derived type only shows up under certain conditions I am not controlling for. Any ideas?
My goal here is to have our CI/PR validation catch such failures as the one that .NET Core 2.1 manifested in our tests. But so far, I haven't had luck since it only repros locally.
the Task
-derived type only shows up under certain conditions I am not controlling for. Any ideas?
This is from an async Task<T>
method? Its result in 2.1 will be derived iff it completes asynchronously. Any chance it's completing synchronously?
Thanks. At least some of the tests that fail locally have an await Task.Yield()
in the async method, so it seems they couldn't be completing synchronously. With the hint you gave though, I'm going to write a test that narrows it down to just the minimal that should reproduce it, and I'll log like crazy and see what I get on the CI machines.
So bizarre. So after adding logging, the tests all started failing. Then I removed the logging, and the tests are still failing on the CIs (both appveyor and Travis changed simultaneously). So weird. oh well.
Before .NET Core 2.1, AsyncTaskMethodBuilder.Task used to always return the same Task. Starting with .NET Core 2.1, it returns a different value depending on the state of the state machine. (PostSharp is currently relying on this behavior.)
Is that expected? Could you document which values are returned in which circumstances?