Closed rjpowers10 closed 1 year ago
@rjpowers10 have you tried to disable cache from the general site settings? I am guessing there is a cache issue of maybe too many cached object in memory?
@MikeAlhayek thanks for the suggestion but in both my examples (my application built on OC and a plain old OC blog) the cache setting is set to "from environment". ASPNETCORE_ENVIRONMENT
is set to 'Development' for both, so the dynamic cache should be disabled. I tried forcing the setting to 'Disabled' as a sanity check and the results are the same.
Have you tried to disable all sites except for the default one? This may help trying to understand what is causing the high usage. If the default site has normal memory usage, try to enable one of your blog sites and see if you can pin point the site. Then maybe start disabling one feature at a time to see if you can pin point the problem to a single feature. This may give us more clues.
Also, check your log file for any silent errors that would be causing memory consumption.
Not sure if this will help, but something for now. Hopefully, @jtkech or @sebastienros have more ideas here.
How much memory is on the system in total. Take a memory snapshot between two jumps and check the diff. Run in Release, the fact that I see VS might mean it is a debug session.
How much memory is on the system in total. Take a memory snapshot between two jumps and check the diff. Run in Release, the fact that I see VS might mean it is a debug session.
My first example (my app built on OC) is a release build. My second example, which was a plain old OC site with the blog recipe, was in debug. The plain old OC site might just be a red herring, I just found it curious that memory didn't seem to go down after a GC collect. May not be relevant at all to my actual scenario.
The machine has 16GB of memory. SQL Server is also running which takes about 2GB itself. With my monitor I see the memory getting down to 0 and then as expected, the system starts paging and my tests start timing out.
I have a memory diff but I'm having trouble interpreting it. Wondering if anything jumps out to anyone.
Sorted by count diff
Sorted by size diff
Sorted by inclusive size diff
Have you tried to disable all sites except for the default one? This may help trying to understand what is causing the high usage. If the default site has normal memory usage, try to enable one of your blog sites and see if you can pin point the site. Then maybe start disabling one feature at a time to see if you can pin point the problem to a single feature. This may give us more clues.
Also, check your log file for any silent errors that would be causing memory consumption.
Not sure if this will help, but something for now. Hopefully, @jtkech or @sebastienros have more ideas here.
I'm going to try limiting my test suite. Maybe there is one test in particular that exemplifies the problem.
RouteEndpoints in the diff would be explained by a tenant being loaded.
Coming back to your original description, the issue is that your test is taking 8.5Gb, can you try to do the same thing with ASPNETCORE_ENVIRONMENT in Production, not "Development", just to see if that has an impact.
Can you take a memory dump when the app is close to the end to see what is still kept in memory?
RouteEndpoints in the diff would be explained by a tenant being loaded.
Coming back to your original description, the issue is that your test is taking 8.5Gb, can you try to do the same thing with ASPNETCORE_ENVIRONMENT in Production, not "Development", just to see if that has an impact.
Can you take a memory dump when the app is close to the end to see what is still kept in memory?
I ran the test suite again, this time with ASPNETCORE_ENVIRONMENT set to 'Production'. It didn't seem to make a difference. The tests finished in 1 hour, 15 minutes. This is what it looked like at the end of the test run.
This is the memory dump at the conclusion of the run.
Sorted by count
Sorted by size
Sorted by inclusive size
Our memory issues are consistent with this as well. These are our worst offenders in terms of memory usage from our latest performance testing while trying (and failing miserably) to run with 1000 tenants.
000071c5d1578080 449841 25195716 System.Int32[]
000071c5d15de7f0 101131 25791704 System.Byte[]
000071c5daff7f50 585044 28082112 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Type, System.Private.CoreLib],[System.Object[], System.Private.CoreLib]]
000071c5d7d8fbd0 728944 29157760 Microsoft.AspNetCore.Routing.RouteValueDictionary
000071c5d56e17c0 572454 32057424 System.Collections.Generic.Dictionary`2+Enumerator[[System.String, System.Private.CoreLib],[System.String, System.Private.CoreLib]]
000071c5daff8000 138733 37735376 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Type, System.Private.CoreLib],[System.Object[], System.Private.CoreLib]][]
000071c5d14ab110 499016 45688856 System.Object[]
000071c5d1e0d050 31347 47360136 System.Reflection.CustomAttributeRecord[]
000071c5d3fc6db8 963130 78553616 System.Collections.Generic.KeyValuePair`2[[System.String, System.Private.CoreLib],[System.Object, System.Private.CoreLib]][]
000071c5d157d2e0 1208078 108278276 System.String
Total 16274388 objects
Time to do another pass at it!
I will try with my own "script" but I'd like to be able to repro something that matches your experience. @rjpowers10 how many tenants are you loading and what features do they use? Ideally if you could do the same experiment and confirm it with the smallest possible list of features that would help. @ShaneCourtrille thanks for the confirmation. I'd love to achieve a density of 1000 on 8GB of RAM, we were there at the beginning, but obviously more features and complexity have been added since then.
@sebastienros I'll try to help as much as I can though it may be difficult to repro my situation. We built an eCommerce platform on top of OC and wrote many custom features. Because I'm having trouble figuring out the offending code it's hard for me to say if it's an issue in my code or OC. My tests do a lot of the typical eCommerce things (sign in, register, view product, add to cart, checkout, order history, etc.).
I wonder if something is sticking around after each ViewResult so it's simply the fact that my tests are making a lot of requests. That's why I tried a simpler test with a plain old OC site and the blog recipe, but it's hard to say if that really replicated the issue.
List of OC features we have enabled:
@rjpowers10 no tenants?
@sebastienros Oh sorry, we do have two tenants, one for humans and one for automated tests. All the tests are hitting the same tenant. The test are also running serially. The tenant for humans usually isn't doing much at this time though, since the tests run overnight.
@rjpowers10 Out of curiosity.. have you taken a look at your strings? I'm doing some analysis and found an oddity where an estimated 80% of strings that are 46 bytes long are the value "mvc.1.0.view" being repeated over and over. What's really painful is that these strings (or at least the ones I've been able to sample since gcroot is such an expensive operation) don't have a gcroot but are just sort of sitting around.
@rjpowers10 Out of curiosity.. have you taken a look at your strings? I'm doing some analysis and found an oddity where an estimated 80% of strings that are 46 bytes long are the value "mvc.1.0.view" being repeated over and over. What's really painful is that these strings (or at least the ones I've been able to sample since gcroot is such an expensive operation) don't have a gcroot but are just sort of sitting around.
There are a ton of strings in my memory dump, although only four instances of "mvc.1.0.view".
EDIT: Whoops, read the memory dump wrong. I have over 260,000 occurrences of "mvc.1.0.view".
This might be a total red herring, but we have experienced increasing memory usage which can be reproduced by the following steps in vanilla OC, so I'd like to share my results. The main idea behind this was to create an idle tenant shutdown handler that would "shut down" tenants/free up resources for tenants that are not used for a predefined time period.
Test controller for the repro:
using Microsoft.AspNetCore.Mvc;
using OrchardCore.Environment.Shell;
namespace OrchardCore.Cms.Web.Controllers;
[Route("test")]
public class TestController : Controller
{
private readonly IShellHost _shellHost;
private readonly ShellSettings _shellSettings;
private static readonly HttpClient _httpClient = new();
public TestController(
IShellHost shellHost,
ShellSettings shellSettings)
{
_shellHost = shellHost;
_shellSettings = shellSettings;
}
public async Task<string> Index()
{
_httpClient.BaseAddress ??= new Uri(HttpContext.Request.Scheme + "://" + HttpContext.Request.Host);
for (int i = 0; i < 1000; i++)
{
await _shellHost.ReleaseShellContextAsync(_shellSettings);
await _httpClient.GetAsync(_shellSettings.RequestUrlPrefix, HttpContext.RequestAborted);
}
return "OK";
}
}
Repro steps:
Releases a shell so that a new one will be built for subsequent requests. Note: Can be used to free up resources after a given time of inactivity.
So we want to do this exactly.Before and after the ReCaptcha Users feature memory snapshot results:
When you are in Development (there is an env variable) and/or in debug moder under Visual Studio, the GC may not work as expected, I already saw this kind of memory increase. Try to build in Release (maybe not necessary), at least start the app from the command line and then look at the Task Manager.
Then it may depend on your machine to decide when you are under memory pressure or not.
@jtkech is right, the debugger will keep some variables "rooted" so you can inspect them and hence prevent them from being collected.
Use a tool like PerfView or dotnetMemory profiler to take snapshots of the memory from a release run that you can then analyze in these tools.
I could repro the same result with release mode and "ASPNETCORE_ENVIRONMENT": "Production"
@wAsnk Can you repro in a deployed environment? You can use dotnet dump collect to get a memory dump and then look at it locally in VS.
@wAsnk
When I will have time I will re-install PerfView and look at it more in depth, for now my Visual Studio doesn't allow me to do 2 concecutive memory snapshots.
but in the meantime I could repro by just looking at the memory usage, when doing a lot of tenant release the GC works as expected, in my case it stabilizes around 700 Mb (but in debug mode). But yes after having enabled the ReCaptcha.Users
the memory increases much more with the same test.
I will look at this asap.
Did you see the same behavior but by enabling another feature?
@ShaneCourtrille and others
About the "mvc.1.0.view"
strings, under the debugger by using the Make object ID
command I could see that all our RazorViewCompiledItem
that we add (e.g. one for each Liquid file) are all using the same string instance.
But yes, all DefaultRazorCompiledItem
compiled at build time are all using a different instance, but I'm afraid that here we can't do anything.
@wAsnk
ReCaptchaLoginFilter
is activated on each request and resolves transitively ReCaptchaClient
for which a typed HttpClient has been registered. So on each action request a typed HttpClient is built.
Resolving ReCaptchaClient
only on demand in ReCaptchaService
fixed the issue.
Note: Maybe still a memory problem with typed HttpClient but less critical if only built as needed.
Did you see the same behavior but by enabling another feature?
No, I couldn't repro with every other feature on.
Resolving ReCaptchaClient only on demand in ReCaptchaService fixed the issue.
Sounds good!
Note: Maybe still a memory problem with typed HttpClient but less critical if only built as needed.
This reminds me of this PR: https://github.com/dotnet/corefx/pull/19082, might have some useful insights there.
We need to use IHttpClientFactory
in ReCaptchaClient
instead of injecting the http client. And we might be able to make ReCaptchaClient
a singleton (and ReacptchaService by resolving _robotDetectors lazily?)
IHttpClientFactory
is already used indirectly, a typed HttpClient
is registered meaning that each time ReCaptchaClient
is resolved an HttpClient
is build using the IHttpClientFactory
.
services.AddHttpClient<ReCaptchaClient>()
.AddTransientHttpErrorPolicy(policy =>
policy.WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(0.5 * attempt)));
So the problem is because of an mvc action filter that resolves on each request ReCaptchaService
, then ReCaptchaClient
which triggers an HttpClent
building by the IHttpClientFactory
.
To fix the issue I only needed in ReCaptchaService
to resolve lazily ReCaptchaClient
as needed, I already did the change locally, but didn't commit it yet because maybe some other updates may be worth to do, for example maybe better to make ReCaptchaService
a scoped service, I will check.
For now the only thing I needed to update is only one line, I will commit it to show what I did.
Here what I did for now #14335
So I read the documentation for IHttpClientFactory
and it says it's good for short live HttpClient instances. It means they should be disposed after each usage. That would mean that your suggestion might be better then, as long as we only instantiate it when we need the request, and it's disposed.
But using a single static instance share by all tenants would also work if we don't set the base url and pass it on the Post call instead.
Where they talk about disposing or not HttpClient
and then recommend to use IHttpClientFactory
for example because of this.
However, the issue isn't really with HttpClient per se, but with the default constructor for HttpClient, because it creates a new concrete instance of HttpMessageHandler, which is the one that has sockets exhaustion and DNS changes issues mentioned above.
And then for example this
Manage the lifetime of HttpMessageHandler to avoid the mentioned problems/issues that can occur when managing HttpClient lifetimes yourself.
About sharing things accross tenants the problem is that they use non thread safe collections, and we can configure different tenants at the same time, that's why we isolate httpclients registrations per tenants and from the host.
Finally in #14335 I kept the typed client pattern that also uses IHttpClientFactory
.
For the sake of brevity, this guidance shows the most structured way to use IHttpClientFactory, which is to use Typed Clients (Service Agent pattern).
@rjpowers10
I tried your simpler test
I next tried a much simpler example. This next example is a plain old OC site (latest from the main branch) using the blog recipe. That's it. ASPNETCORE_ENVIRONMENT is set to "Development". I then wrote a small PowerShell script to ping the site over and over.
Here I don't think that's an issue, I got the same under the Diagnostic Tools of Visual Studio, the memory increases until it stabilizes around 766 Mb in my case (issexpress.exe 674 Mb), around 550 Mb in your screen shot.
I will think about your original issue where you saw w3wp using over 8.5 GB.
@rjpowers10
I tried your simpler test
I next tried a much simpler example. This next example is a plain old OC site (latest from the main branch) using the blog recipe. That's it. ASPNETCORE_ENVIRONMENT is set to "Development". I then wrote a small PowerShell script to ping the site over and over.
Here I don't think that's an issue, I got the same under the Diagnostic Tools of Visual Studio, the memory increases until it stabilizes around 766 Mb in my case (issexpress.exe 674 Mb), around 550 Mb in your screen shot.
I will think about your original issue where you saw w3wp using over 8.5 GB.
It's very possible it is something I'm doing in my code. I do use HttpClient
a lot to communicate with a REST API, but I'm always going through IHttpClientFactory
to get the client, so I think that's OK.
Ah okay, thanks for the info.
Yes that's good to use IHttpClientFactory
but same issue if you create an http client for example on each request, even if you don't need it.
Are you using HttpClient handlers?
@sebastienros I am still trying to understand this as it does not make sense to me.
string interning is managed even across multiple dlls. So even when every view generate a new dll, it should not matter. Here is what I did, created two separate projects in the same solutions
In Project 1 I added the following 2 classes
public class ModuleAClassA
{
public string First { get; set; } = "First";
}
public class ModuleAClassB
{
public string First { get; set; } = "First";
}
In project 2, I added the following 2 classes
public class ModuleBClassA
{
public string First { get; set; } = "First";
}
public class ModuleBClassB
{
public string First { get; set; } = "First";
}
Then from project 1 I ran the following
Console.WriteLine(object.ReferenceEquals(aa.First, ab.First));
Console.WriteLine(object.ReferenceEquals(ba.First, bb.First));
Console.WriteLine(object.ReferenceEquals(ba.First, ab.First));
Console.WriteLine(object.ReferenceEquals(aa.First, bb.First));
Console.WriteLine(object.ReferenceEquals(aa.First, "First"));
All 5 lines printed True
which proves to me that string interning is managed across all dlls. Unless the environment makes a difference like running on Windows vs running on Lunix. Just thought I would share this just in case you have more ideas that you like to share.
Maybe I found a way to remove the "mvc.1.0.view"
duplicates.
Under the debugger you can see that now they refer to the same instance ID.
And no longer in the Duplicate Strings
.
Did you do that by using a constant for the string?
No for now I only used "mvc.1.0.view"
litteral strings that normally are interned in only one.
We only add few compiled items and one per liquid file, most of compiled items are built by aspnetcore by using data loaded from assembly attributes.
What I did for all descriptors is to change its item by a new one using this literal for its Kind
property.
Maybe for perf it's better to use a const, there is already one defined in aspnetcore, so that it doesn't have to lookup in the intern pool.
Are you using HttpClient handlers?
I have a few DelegatingHandler
s and a Polly handler for resiliency.
Ah okay, thanks for the info.
Yes that's good to use
IHttpClientFactory
but same issue if you create an http client for example on each request, even if you don't need it.
I understand that creating an HttpClient
when I don't need it is wasteful, but the GC should still be able to clean this up, right? I think my real problem isn't that memory is high but that memory is unable to be reclaimed. I did find that I have a similar situation as the ReCaptcha module in my code. I have a filter invoked on every request, and in the constructor we inject a typed client. That is wasteful since the filter doesn't do anything on most requests. So I can understand that making that lazy could help, but at first glance that doesn't seem to explain why w3wp is taking over 8GB of memory.
but that memory is unable to be reclaimed
If the GC can't collect it it's because it's rooted, this would be a bug in Orchard or its dependencies, and we need to find it.
but that memory is unable to be reclaimed
If the GC can't collect it it's because it's rooted, this would be a bug in Orchard or its dependencies, and we need to find it.
I'm no expert in this area but I'll try to share some things. These may just be red herrings.
Looking at the memory diff, whether sorted by count diff or size diff, string
is the top of the list.
May or may not be relevant: many of these strings are sitting in the large object heap.
Here's where I start getting confused. Just sampling one of the string instances, the path to root is super long. Maybe this is fine, just seemed odd to me. Maybe this can point someone in the right direction.
1 of 5
2 of 5
3 of 5
4 of 5
5 of 5
@rjpowers10
So I can understand that making that lazy could help, but at first glance that doesn't seem to explain why w3wp is taking over 8GB of memory.
Same answer as @sebastienros but now we may have something more specific to search on.
Maybe around HttpClientFactory
or HttpMessageHandlerBuilder
, I will look at it but need to install tools, for now my diagnostic tool often gets stuck, even if I use the last preview of Visual Studio ;)
IHttpClientFactory
does seem a little suspicious to me. Here's an int array. I'm a little surprised to see it rooted by DefaultHttpClientFactory
.
Okay, maybe I found something but not so obvious to re-use PerfView so not fully sure.
@rjpowers10 First it seems that it works as expected if we don't release the shell, to repro I need to release the shell/tenant (so rebuilt on a next request) many times, so more like what @wAsnk did.
In your UI Tests are you often releasing the tenant? For example it happens when saving Admin Settings.
That said looks like the DefaultHttpClientFactory
reference the service provider of the current container (our shell context) and has a static TimerCallback
variable with a delegate that references itself.
No problem in a regular app because DefaultHttpClientFactory
is a singleton and the host container is not intended to be rebuilt, but in our case it is a tenant singleton where the container can be rebuilt.
So I did a diff where we can see that.
I will investigate more and let you know.
Nice find! My UI tests are often making changes to admin settings and enabling / disabling features so we can test different things, so yes, we are indirectly releasing the tenant many times throughout the test run.
Okay cool so maybe we found the source of the problem.
We may need to override the IHttpClientFactory
, maybe not so easy because of internal classes, will see.
@rjpowers10 for info
Okay I could override DefaultHttpClientFactory
, made it IDisposable
to nullify some fields, for example _services
, for now it fixes the issue if we set the options .SuppressHandlerScope
to true, otherwise still some memory increase (but much less).
I also started to prevent the accumulation of timer callbacks that cleanup the handlers after their lifetime, but the lifetime can be higher than the period we use to release the tenant, but I made some progress.
Applying the above in #14348 seems to fix the issue.
So if you can redo your tests by using this branch, when you will have time.
Not related to this issue and HttpClient
,
But I noticed the same kind of memory increase by doing may times a tenant Reload that not only Releases the tenant but also Reloads and then Rebuilds the Configuration. So we may need to do the same kind of things for the tenant reloading (not only releasing). I will look at it when I will have time.
Describe the bug
My scenario: We have a site built on Orchard Core 1.6. We have a pipeline that periodically deploys an instance to a test server and runs a suite of UI tests against it (the tests are using MSTest and Playwright, if it makes any difference). The site is running in IIS. The test suite runs for about 1 hour, and by the end of that run the w3wp process is consuming over 8.5 GB.
ASPNETCORE_ENVIRONMENT
is set to "Development".Here's a quick look at the memory usage over time. This is the memory of the entire machine, not just OC, but looking in Task Manager, I see w3wp using over 8.5 GB.
To Reproduce
I've been struggling for quite a while on pinpointing the root issue, with no success thus far. I've tried using the Visual Studio memory profiler but I've been struggling to make any sense of it.
I next tried a much simpler example. This next example is a plain old OC site (latest from the main branch) using the blog recipe. That's it.
ASPNETCORE_ENVIRONMENT
is set to "Development". I then wrote a small PowerShell script to ping the site over and over.I ran this script for a little while. Looking at the diagnostic tools in Visual Studio, I can see multiple garbage collector invocations (the yellow arrows), but the memory never seems to decrease. I was hoping to see a more jagged pattern in the memory usage.
At this point I'm looking for any ideas or advice. Thanks.