Open dotMorten opened 5 years ago
Thanks @dotMorten. This is likely to be expected, as Core is not currently ngened. We'll address before Core3 ships.
cc @brianrob @billwert
Baiting @benaadams @AndreyAkinshin if they have interest to look for low-hanging fruit here 😉
Also looking at allocations by running this parse method many many times, I see this hit over 40 times per parsing:
And here's some of the CPU analysis:
Loading the assemblies' custom attributes takes a lot of time. The more assemblies that are loaded, the slower it will take. If you create a simple hello world app that tests this, .NET Framework loads 5 assemblies but .NET Core 3.0 preview loads 23 assemblies.
There's a XamlSchemaContext
that could perhaps be reused since it's the class that loads all assemblies.
EDIT: If a XamlSchemaContext
is used, .NET Core 3.0 preview is faster than .NET Framework 4.7.2 (tested 64-bit)
I love seeing this kind of issue. Great to see the focus on fundamentals.
Re GetCustomAttributes, it was improved by @nickcraver in https://github.com/dotnet/coreclr/pull/20779 but that change would already be in preview 1.
@danmosemsft It's also 24 loaded assemblies (.NET Core 3.0 latest daily build) vs 5 loaded assemblies (.NET Framework 4.7.2).
If you patch System.Xaml to only check the first 5 returned assemblies (.NET Framework only loads 5 assemblies), then .NET Core 3.0 code executes faster.
So it seems the 5x number of assemblies is causing the slowdown. It has to execute more code.
Original .NET Core 3.0 daily build:
Name | Exc % | Exc | Inc % | Inc |
---|---|---|---|---|
module system.xaml <<system.xaml!XamlServices.Parse>> | 24.0 | 1,073 | 91.6 | 4,090 |
module system.private.corelib <<system.private.corelib!Attribute.GetCustomAttributes>> | 14.8 | 662 | 36.3 | 1,619 |
module coreclr <<coreclr!JIT_New>> | 10.3 | 462 | 10.6 | 472 |
module clrjit <<clrjit!CILJit::compileMethod>> | 5.4 | 239 | 5.4 | 243 |
module coreclr <<coreclr!JIT_NewArr1>> | 5.4 | 239 | 5.7 | 253 |
module coreclr <<coreclr!ModuleHandle::ResolveType>> | 5.2 | 234 | 5.2 | 234 |
module coreclr <<coreclr!MetaDataImport::GetCustomAttributeProps>> | 5.2 | 231 | 5.6 | 248 |
module system.private.corelib <<system.private.corelib!CustomAttributeData.GetCustomAttributesInternal>> | 4.9 | 219 | 22.9 | 1,023 |
module coreclr <<coreclr!Attribute::ParseAttributeArguments>> | 4.1 | 183 | 5.4 | 241 |
module coreclr <<coreclr!MetaDataImport::Enum>> | 3.4 | 152 | 4.0 | 178 |
module ntdll <<ntdll!LdrpDispatchUserCallTarget>> | 2.6 | 116 | 2.6 | 116 |
module coreclr <<coreclr!MetaDataImport::GetParentToken>> | 2.6 | 116 | 2.8 | 127 |
Finally had a chance to look into this. Really great to see folks engaged here.
I was able to reproduce the difference and captured traces of 10K iterations of the parse operation to compare.
As was mentioned above, the biggest issue that I see is significantly higher cost in and caused by Attribute.GetCustomAttributes. The higher cost shows up in both CPU usage in Attribute.GetCustomAttributes and in allocation helpers that are zeroing memory to support increased allocations in this path.
The biggest CPU usage differences are:
Breaking down the allocations that contribute to the additional allocation helper CPU (as well as memory cost), these are the top 3 allocators:
The first step here should be for this to be investigated from the reflection angle. @joshfree, can you get eyes on this from a reflection expert? I can share traces offline and help ramp someone up on the data.
Having taken some steps here already, it definitely seems like we can eliminate CustomAttributeRecord[]
in some way. These are allocated only to be iterated in the caller in all cases, but the way they're populated is non-trivial to change. I have a branch locally that tried to eliminate these but it creates some very odd null refs in ways I didn't readily understand (but someone else may quickly see why I'm an idiot there).
If you're curious, this is where they're generated for a great many attribute code paths.
@brianrob
Attribute.GetCustomAttributes: 560ms on Core vs. 265ms on Desktop = +111%
This is probably because .NET Core loads 24 assemblies but .NET Framework only loads 5 assemblies. The code gets custom attributes of all loaded assemblies.
@0xd4d, you're right - the iteration over all assemblies happens here: https://github.com/dotnet/wpf/blob/08d1133388dac9f033e16bb7309d0d2d409845a2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/XamlSchemaContext.cs#L1125
As @NickCraver points out, there is possibly some optimization that can happen in the reflection path, but it is also true that there may need to be some work done in Xaml parsing to either limit the number of assemblies that get processed, or cache the results. Without a clear understanding of the WPF technology, it's not clear to me what should be done here. @fabiant3, are you the right person to look at this?
cc @steveharter for Reflection
18k calls from Parse
-> XamlXmlReader.Initalize
-> NodeStreamSorter.ctor
-> StartNewNodeStreamWithSettingsPreamble
does become 14M calls to FilterCustomAttributeRecord
and that's 44% of the time for Parsing.
or cache the results.
XamlSchemaContext
has 10 ConcurrentDictionary
s (instance rather than static) so I wonder if it had that in mind?
The main cost for Parsing (64%) happens under System.Xaml.XamlSchemaContext.TryGetCompatibleXamlNamespace(String, out String)
where its still the same number of calls as Parse (18k in this case) and then begins to fan out.
Would it be valid to make those ConcurrentDictionaries
static
?
Sounds like the context is quite flexible on usage:
/// SchemaContexts can potentially be shared between multiple callers in an AppDomain, including
/// both full and partial trust callers. To be safe for sharing, the default implementation should
/// be idempotent and order-independent--i.e. functionally immutable.
Though I don't know if lookup should be unique in some way (e.g. different Xaml files having explicitly different contexts)
There are multiple things we can yet do in reflection and I'd like to get to them, but I second @benaadams's approach here on an actual solution that would make the differences really desired here. The problem with reflection performance optimizations is they hit a hard wall with contracts: they must return a new attribute set on each call - we cannot return the same instances (this would break all sorts of existing code).
The only way we can cache instances (after deciding it's valid to do so) is well above the layer those allocations happen at, e.g. in a lookup dictionary such as those in play here.
I just wanted to add here: in doing some benchmark down we discovered that while my optimizations went into 3.0 Preview 1, some steps backwards also landed in there. Details are in https://github.com/dotnet/coreclr/issues/21456 and a PR to fix that regression is in https://github.com/dotnet/coreclr/pull/21462. That should help out here, but only marginally compared to not actually calling this path more than once as proposed above.
@steveharter can you assist in the investigation from the coreclr/Reflection side of the house?
Sure I'll take a look asap at the reflection side
Moving this to 3.1. The discussion has targeted the initialization of XamlSchemaContext within XamlServices.Parse (et al.), which examines more assemblies than it did in NETFx. I don't think it's critical to fix this for 3.0, because
In short, the effect will be felt in practice only by apps that load lots of small XAML snippets. This isn't the mainstream case, and such apps have a workaround.
Details supporting these claims.
All apps load BAML, which uses a built-in SchemaContext. But relatively few apps load loose XAML via explicit calls to XamlServices.Parse et al. At least that's my impression from looking at apps that have come my way (largely through bug reports) - we don't have telemetry for this.
The 27% figure comes from parsing a very simple string (one XAML tag); a real app typically parses longer strings, and will see less of a difference. I tried strings with a few hundred tags, and the difference was not worth noting.
Most of the XamlServices methods call new XamlSchemaContext()
, but an app can avoid the initialization expense by using the one method that doesn't. Replace XamlServices.Parse(mystring)
with
StringReader stringReader = new StringReader(mystring);
using (XmlReader xmlReader = XmlReader.Create(stringReader))
{
XamlXmlReader xamlReader = new XamlXmlReader(xmlReader, CachedSchemaContext);
return XamlServices.Load(xamlReader);
}
where CachedSchemaContext
is a XamlSchemaContext that the app creates once and re-uses every time it calls Parse
. (Similarly for XamlServices.Load(stream)
et al.)
An app that does this is responsible for releasing the cached context when it's no longer needed, if that's relevant to the overall behavior. This is probably not an issue, except perhaps for apps with complex interaction with AppDomain, or similar advanced architecture.
I've tried building this idea into XamlServices itself. The early results are good, but it needs more investigation. For example, XamlServices can't know when to release the cached context, and I don't fully understand the consequences of keeping it alive "forever".
and I don't fully understand the consequences of keeping it alive "forever"
If thats the chosen path the cache might need weak references (making it even more expensive) to not block unloadable assemblies "forever", otherwise it would be a memory leak with apps having reloadable plugin architectures or are emitting generated code in collectible assemblies at runtime. We are doing the latter on desktop, I haven't used AssemblyLoadContext in Core yet so I don't know entirely the implications, but its something to keep in mind.
Changed the title to reflect the real issue.
executing HTTP requests in WPF command is much slower on .NET Core idk what causes this but it works fine in .NET Framework. (I'm using refit btw)
not sure if it's WPF problem or .NET Core problem or refit problem... https://github.com/dotnet/corefx/issues/23401
EDIT:
SOLVED. See this comment https://github.com/dotnet/corefx/issues/23401#issuecomment-561671803
For reference: @bugproof comment is unrelated to this issue or to WPF. (.NET Core implements HTTPRequest differently from .NET Framework.)
Just FYI it is issues like this that have caused us to stop our port to netcore. I had hoped performance would be better. I hope these issues continue to see activity and improvement.
@rolandh - We determined that this issue only affects apps that load lots of small XAML snippets using XamlServices.Parse. Is that your case? (We believe it's not all that common, but we don't have any hard evidence to either support or refute that belief.)
Please continue to open issues, or upvote existing issues, about specific perf concerns. Comments about "issues like this" are not really actionable, as no one knows what you're referring to specifically.
I haven't experienced any issues as we haven't ported to net core yet. I was just saying that seeing multiple issues like this has meant that we won't be considering porting ove from net 4.7 in the short term. I've seen quite a few benchmarks showing slow downs, I haven't seen any showing performance increases in WPF yet, when I do I'll look at porting. I thought that might be good information for MS on how to prioritise issues and development.
I know its not an actionable comment for you, but it might be for someone higher up who wants to see a higher percentage off people port their applications.
I was just saying that seeing multiple issues like this has meant that we won't be considering porting ove from net 4.7 in the short term. @rolandh can you articulate the multiple issues that you have been and concerned you moving to .NET Core besides this one? That will help us to priorities perf investigation.
@Lxiamail
Pretty much every major library's first iteration to netcore has had slowdowns. Sometimes dramatic, eg 10x slower with entity frameworks 2.1 release.
I see almost nothing showing performance benefits, only contrived small applications seem to benefit. Everything that is a large in production library seems to go backwards and introduce bugs.
There is no incentive for us to upgrade after seeing issues like this repeatedly occur after porting to netcore.
Here is a list of relevant cases I've seen when researching whether netcore is stable enough to upgrade yet.
https://github.com/dotnet/wpf/issues/1549 https://github.com/dotnet/wpf/issues/1748
https://supportcenter.devexpress.com/ticket/details/T824984/wpf-core-3-0-performance
This was nopcommerce after upgrading to netcore https://www.nopcommerce.com/en/boards/topic/54986/nopcommerce-410-high-memory-usage-and-how-weve-handled-it
EF Core large slow downs vs the older entity framework https://github.com/nopSolutions/nopCommerce/issues/3419
Just my 2 cents: I also ported a large WPF+EntityFramework app to core 3.0 and found that load times increased from <4 seconds for .Net Framework to 5+ seconds with .Net Core 3.0. This is with creating native images for both in release and debug builds. Runtime performance seemed largely unchanged so I abandoned this porting effort and will try this again with .Net 5 when it is released. There are numerous reports of slow-downs like in the post above and also this one: https://github.com/dotnet/wpf/issues/94
My impression is that some work is required to address all these already reported WPF + .Net Core performance issues in order to make it worthwhile to perform this migration for more complex applications with lots of assemblies and XAML resources.
Cheers Philip
@ajcvickers re EF feedback above.
I seemed to encounter the performance problem too; The same wpf code target .net core 3.1 and another .net 4.8, the latter memory usage is about 20MB while the former is 32MB+ almost 60MB at the startup of program; Read a same file as hex and show on a TextBox, the memory usage is above, still, net core cost more memory than net48 ; I felt confused
@laggage your .NET 48 is running as 32-bit and .NET Core is running as 64-bit
Does adding <RuntimeIdentifier>win-x86</RuntimeIdentifier>
to your .csproj help? (moving it to 32-bit)
@benaadams I tried your suggestion, but still, core wpf just cost more memory than net4.x wpf;
@SamBent This issue was milestoned to core 5.0 but I don't see any updates even though 5.0 is being released. Is there reason to believe these performance regressions were fixed and the people on this forum should evaluate this again in their applications.
The original issue - startup cost of XamlServices.Parse - is not yet addressed in 5.0. It only comes up if the app explicitly asks for it, so it's not a factor in load times of most apps.
A number of other issues have been mentioned here, most of them outside WPF's scope. You can follow the links to check on the status with the appropriate owners.
@SamBent In my case the problems may be DataTemplate instantiation via XamlReader.Parse which I assume internally calls XamlServices.Parse.
Will this issue be assigned to a new milestone or will WPF apps which make using of XAML parsing see permanent load time degradation if ported to .Net Core?
This was milestoned to 7.0 but the startup time of WPF apps running .net7.0 are still very slow compared to .net 4.8
Pls is there any updates on this from .net 8
This is a very old thread but can someone give any benchmarks for net 8 vs net 4.7/4.8 ?
This is a very old thread but can someone give any benchmarks for net 8 vs net 4.7/4.8 ?
The issue is still present. We recently migrated our net48 product to .net8. We have automated benchmarking after each build. We can clearly see how almost everything at startup is performing better except the xaml parsing. This negates all performance gains of the other parts of the code where thing are performing better. We end up with around 10% slower startup times, because wpf performance issues. Everyone is very disappointed because .net 8 promised performance but I hasn't been delivered in wpf apps.
Thank you for saving us an immense amount of effort for little gain.
I ran the benchmarks in the original post locally but I now compare .Net Framework 4.8 vs .Net 8.0 and here are my results:
BenchmarkDotNet v0.14.0, Windows 11 (10.0.22631.4037/23H2/2023Update/SunValley3)
AMD Ryzen 7 7745HX with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.7.24407.12
[Host] : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job-XTDWJG : .NET 8.0.7 (8.0.724.31311), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job-ELJHHU : .NET Framework 4.8.1 (4.8.9261.0), X64 RyuJIT VectorSize=256
Method | Runtime | Mean | Error | StdDev |
---|---|---|---|---|
Test1 | .NET 8.0 | 150.6 us | 3.01 us | 6.08 us |
Test1 | .NET Framework 4.8 | 183.2 us | 3.31 us | 4.54 us |
It looks like .Net 8.0 is faster than .Net Framework 4.8 for this specific benchmark on my machine. If others encounter any performance regression from .Net Framework, it would be great if they could share them and ideally include a repro.
Does the bug reproduce also in WPF for .NET Framework 4.8?: No
Problem description: I expected/hoped that with the XmlReader hopefully being
Span<T>
based. the XAML parser would be faster in .NET Core than the .NET Framework equivalent. However that's not what I'm seeing with this benchmark:Here are the results: