OrchardCMS / OrchardCore

Orchard Core is an open-source modular and multi-tenant application framework built with ASP.NET Core, and a content management system (CMS) built on top of that framework.
https://orchardcore.net
BSD 3-Clause "New" or "Revised" License
7.43k stars 2.4k forks source link

Struggling with high memory usage #14117

Closed rjpowers10 closed 1 year ago

rjpowers10 commented 1 year ago

Describe the bug

My scenario: We have a site built on Orchard Core 1.6. We have a pipeline that periodically deploys an instance to a test server and runs a suite of UI tests against it (the tests are using MSTest and Playwright, if it makes any difference). The site is running in IIS. The test suite runs for about 1 hour, and by the end of that run the w3wp process is consuming over 8.5 GB. ASPNETCORE_ENVIRONMENT is set to "Development".

Here's a quick look at the memory usage over time. This is the memory of the entire machine, not just OC, but looking in Task Manager, I see w3wp using over 8.5 GB.

image

To Reproduce

I've been struggling for quite a while on pinpointing the root issue, with no success thus far. I've tried using the Visual Studio memory profiler but I've been struggling to make any sense of it.

I next tried a much simpler example. This next example is a plain old OC site (latest from the main branch) using the blog recipe. That's it. ASPNETCORE_ENVIRONMENT is set to "Development". I then wrote a small PowerShell script to ping the site over and over.

function Invoke-Path($path) {
    Invoke-WebRequest -Uri "https://localhost:8001/${path}"
}

$Paths = @(
    '', 
    'categories', 
    'tags', 
    'Contents/ContentItems/40rg5j34w43er1a5dwg0j14cnr', 
    'login'
)

while (1) {
    $Paths | ForEach-Object -Process {
        Invoke-Path $_
    }
}

I ran this script for a little while. Looking at the diagnostic tools in Visual Studio, I can see multiple garbage collector invocations (the yellow arrows), but the memory never seems to decrease. I was hoping to see a more jagged pattern in the memory usage.

image

image

At this point I'm looking for any ideas or advice. Thanks.

MikeAlhayek commented 1 year ago

@rjpowers10 have you tried to disable cache from the general site settings? I am guessing there is a cache issue of maybe too many cached object in memory?

rjpowers10 commented 1 year ago

@MikeAlhayek thanks for the suggestion but in both my examples (my application built on OC and a plain old OC blog) the cache setting is set to "from environment". ASPNETCORE_ENVIRONMENT is set to 'Development' for both, so the dynamic cache should be disabled. I tried forcing the setting to 'Disabled' as a sanity check and the results are the same.

MikeAlhayek commented 1 year ago

Have you tried to disable all sites except for the default one? This may help trying to understand what is causing the high usage. If the default site has normal memory usage, try to enable one of your blog sites and see if you can pin point the site. Then maybe start disabling one feature at a time to see if you can pin point the problem to a single feature. This may give us more clues.

Also, check your log file for any silent errors that would be causing memory consumption.

Not sure if this will help, but something for now. Hopefully, @jtkech or @sebastienros have more ideas here.

sebastienros commented 1 year ago

How much memory is on the system in total. Take a memory snapshot between two jumps and check the diff. Run in Release, the fact that I see VS might mean it is a debug session.

rjpowers10 commented 1 year ago

How much memory is on the system in total. Take a memory snapshot between two jumps and check the diff. Run in Release, the fact that I see VS might mean it is a debug session.

My first example (my app built on OC) is a release build. My second example, which was a plain old OC site with the blog recipe, was in debug. The plain old OC site might just be a red herring, I just found it curious that memory didn't seem to go down after a GC collect. May not be relevant at all to my actual scenario.

The machine has 16GB of memory. SQL Server is also running which takes about 2GB itself. With my monitor I see the memory getting down to 0 and then as expected, the system starts paging and my tests start timing out.

I have a memory diff but I'm having trouble interpreting it. Wondering if anything jumps out to anyone.

Sorted by count diff image

Sorted by size diff image

Sorted by inclusive size diff image

rjpowers10 commented 1 year ago

Have you tried to disable all sites except for the default one? This may help trying to understand what is causing the high usage. If the default site has normal memory usage, try to enable one of your blog sites and see if you can pin point the site. Then maybe start disabling one feature at a time to see if you can pin point the problem to a single feature. This may give us more clues.

Also, check your log file for any silent errors that would be causing memory consumption.

Not sure if this will help, but something for now. Hopefully, @jtkech or @sebastienros have more ideas here.

I'm going to try limiting my test suite. Maybe there is one test in particular that exemplifies the problem.

sebastienros commented 1 year ago

RouteEndpoints in the diff would be explained by a tenant being loaded.

Coming back to your original description, the issue is that your test is taking 8.5Gb, can you try to do the same thing with ASPNETCORE_ENVIRONMENT in Production, not "Development", just to see if that has an impact.

Can you take a memory dump when the app is close to the end to see what is still kept in memory?

rjpowers10 commented 1 year ago

RouteEndpoints in the diff would be explained by a tenant being loaded.

Coming back to your original description, the issue is that your test is taking 8.5Gb, can you try to do the same thing with ASPNETCORE_ENVIRONMENT in Production, not "Development", just to see if that has an impact.

Can you take a memory dump when the app is close to the end to see what is still kept in memory?

I ran the test suite again, this time with ASPNETCORE_ENVIRONMENT set to 'Production'. It didn't seem to make a difference. The tests finished in 1 hour, 15 minutes. This is what it looked like at the end of the test run.

image

This is the memory dump at the conclusion of the run.

Sorted by count image

Sorted by size image

Sorted by inclusive size image

ShaneCourtrille commented 1 year ago

Our memory issues are consistent with this as well. These are our worst offenders in terms of memory usage from our latest performance testing while trying (and failing miserably) to run with 1000 tenants.

000071c5d1578080   449841     25195716 System.Int32[]
000071c5d15de7f0   101131     25791704 System.Byte[]
000071c5daff7f50   585044     28082112 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Type, System.Private.CoreLib],[System.Object[], System.Private.CoreLib]]
000071c5d7d8fbd0   728944     29157760 Microsoft.AspNetCore.Routing.RouteValueDictionary
000071c5d56e17c0   572454     32057424 System.Collections.Generic.Dictionary`2+Enumerator[[System.String, System.Private.CoreLib],[System.String, System.Private.CoreLib]]
000071c5daff8000   138733     37735376 System.Collections.Concurrent.ConcurrentDictionary`2+Node[[System.Type, System.Private.CoreLib],[System.Object[], System.Private.CoreLib]][]
000071c5d14ab110   499016     45688856 System.Object[]
000071c5d1e0d050    31347     47360136 System.Reflection.CustomAttributeRecord[]
000071c5d3fc6db8   963130     78553616 System.Collections.Generic.KeyValuePair`2[[System.String, System.Private.CoreLib],[System.Object, System.Private.CoreLib]][]
000071c5d157d2e0  1208078    108278276 System.String
Total 16274388 objects
sebastienros commented 1 year ago

Time to do another pass at it!

I will try with my own "script" but I'd like to be able to repro something that matches your experience. @rjpowers10 how many tenants are you loading and what features do they use? Ideally if you could do the same experiment and confirm it with the smallest possible list of features that would help. @ShaneCourtrille thanks for the confirmation. I'd love to achieve a density of 1000 on 8GB of RAM, we were there at the beginning, but obviously more features and complexity have been added since then.

rjpowers10 commented 1 year ago

@sebastienros I'll try to help as much as I can though it may be difficult to repro my situation. We built an eCommerce platform on top of OC and wrote many custom features. Because I'm having trouble figuring out the offending code it's hard for me to say if it's an issue in my code or OC. My tests do a lot of the typical eCommerce things (sign in, register, view product, add to cart, checkout, order history, etc.).

I wonder if something is sticking around after each ViewResult so it's simply the fact that my tests are making a lot of requests. That's why I tried a simpler test with a plain old OC site and the blog recipe, but it's hard to say if that really replicated the issue.

List of OC features we have enabled:

  1. OrchardCore.Admin
  2. OrchardCore.Alias
  3. OrchardCore.Autoroute
  4. OrchardCore.BackgroundTasks
  5. OrchardCore.ContentFields
  6. OrchardCore.ContentPreview
  7. OrchardCore.Contents
  8. OrchardCore.ContentTypes
  9. OrchardCore.CustomSettings
  10. OrchardCore.Deployment
  11. OrchardCore.Diagnostics
  12. OrchardCore.DynamicCache
  13. OrchardCore.Email
  14. OrchardCore.Features
  15. OrchardCore.Flows
  16. OrchardCore.Forms
  17. OrchardCore.Google.Analytics
  18. OrchardCore.HomeRoute
  19. OrchardCore.Html
  20. OrchardCore.Https
  21. OrchardCore.Indexing
  22. OrchardCore.Layers
  23. OrchardCore.Liquid
  24. OrchardCore.Lists
  25. OrchardCore.Localization
  26. OrchardCore.Markdown
  27. OrchardCore.Media
  28. OrchardCore.Media.Slugify
  29. OrchardCore.Menu
  30. OrchardCore.Navigation
  31. OrchardCore.Placements
  32. OrchardCore.Recipes
  33. OrchardCore.Resources
  34. OrchardCore.Roles
  35. OrchardCore.Rules
  36. OrchardCore.Scripting
  37. OrchardCore.Search.Lucene
  38. OrchardCore.Search.Lucene.Worker
  39. OrchardCore.Settings
  40. OrchardCore.Shortcodes
  41. OrchardCore.Taxonomies
  42. OrchardCore.Themes
  43. OrchardCore.Title
  44. OrchardCore.Users
  45. OrchardCore.Users.ChangeEmail
  46. OrchardCore.Users.Registration
  47. OrchardCore.Users.ResetPassword
  48. OrchardCore.Widgets
  49. OrchardCore.Workflows
  50. TheAdmin
sebastienros commented 1 year ago

@rjpowers10 no tenants?

rjpowers10 commented 1 year ago

@sebastienros Oh sorry, we do have two tenants, one for humans and one for automated tests. All the tests are hitting the same tenant. The test are also running serially. The tenant for humans usually isn't doing much at this time though, since the tests run overnight.

ShaneCourtrille commented 1 year ago

@rjpowers10 Out of curiosity.. have you taken a look at your strings? I'm doing some analysis and found an oddity where an estimated 80% of strings that are 46 bytes long are the value "mvc.1.0.view" being repeated over and over. What's really painful is that these strings (or at least the ones I've been able to sample since gcroot is such an expensive operation) don't have a gcroot but are just sort of sitting around.

rjpowers10 commented 1 year ago

@rjpowers10 Out of curiosity.. have you taken a look at your strings? I'm doing some analysis and found an oddity where an estimated 80% of strings that are 46 bytes long are the value "mvc.1.0.view" being repeated over and over. What's really painful is that these strings (or at least the ones I've been able to sample since gcroot is such an expensive operation) don't have a gcroot but are just sort of sitting around.

There are a ton of strings in my memory dump, although only four instances of "mvc.1.0.view".

EDIT: Whoops, read the memory dump wrong. I have over 260,000 occurrences of "mvc.1.0.view".

wAsnk commented 1 year ago

This might be a total red herring, but we have experienced increasing memory usage which can be reproduced by the following steps in vanilla OC, so I'd like to share my results. The main idea behind this was to create an idle tenant shutdown handler that would "shut down" tenants/free up resources for tenants that are not used for a predefined time period.

Test controller for the repro:

using Microsoft.AspNetCore.Mvc;
using OrchardCore.Environment.Shell;

namespace OrchardCore.Cms.Web.Controllers;

[Route("test")]
public class TestController : Controller
{
    private readonly IShellHost _shellHost;
    private readonly ShellSettings _shellSettings;

    private static readonly HttpClient _httpClient = new();

    public TestController(
        IShellHost shellHost,
        ShellSettings shellSettings)
    {
        _shellHost = shellHost;
        _shellSettings = shellSettings;
    }

    public async Task<string> Index()
    {
        _httpClient.BaseAddress ??= new Uri(HttpContext.Request.Scheme + "://" + HttpContext.Request.Host);

        for (int i = 0; i < 1000; i++)
        {
            await _shellHost.ReleaseShellContextAsync(_shellSettings);

            await _httpClient.GetAsync(_shellSettings.RequestUrlPrefix, HttpContext.RequestAborted);
        }

        return "OK";
    }
}

Repro steps:

  1. Use the OC repository.
  2. Setup default tenant with Software as service recipe
  3. Add a controller like the one above which uses the IShellHost.ReleaseShellContextAsync function which summary says: Releases a shell so that a new one will be built for subsequent requests. Note: Can be used to free up resources after a given time of inactivity. So we want to do this exactly.
  4. Call /test URL, notice that after a while GC kicks in and the memory usage will maximize around ~500MB and never go above. image
  5. Now enable ReCaptcha Users feature.
  6. Call /test, notice how the memory usage goes up until the call is aborted (if not aborted it will use all memory), GC can't control the memory usage even after 10 minutes: image

Before and after the ReCaptcha Users feature memory snapshot results:

  1. Count Diff image
  2. Size Diff image
  3. Inclusive Size Diff image
  4. Insights image
jtkech commented 1 year ago

When you are in Development (there is an env variable) and/or in debug moder under Visual Studio, the GC may not work as expected, I already saw this kind of memory increase. Try to build in Release (maybe not necessary), at least start the app from the command line and then look at the Task Manager.

Then it may depend on your machine to decide when you are under memory pressure or not.

sebastienros commented 1 year ago

@jtkech is right, the debugger will keep some variables "rooted" so you can inspect them and hence prevent them from being collected.

Use a tool like PerfView or dotnetMemory profiler to take snapshots of the memory from a release run that you can then analyze in these tools.

wAsnk commented 1 year ago

I could repro the same result with release mode and "ASPNETCORE_ENVIRONMENT": "Production"

ShaneCourtrille commented 1 year ago

@wAsnk Can you repro in a deployed environment? You can use dotnet dump collect to get a memory dump and then look at it locally in VS.

jtkech commented 1 year ago

@wAsnk

When I will have time I will re-install PerfView and look at it more in depth, for now my Visual Studio doesn't allow me to do 2 concecutive memory snapshots.

but in the meantime I could repro by just looking at the memory usage, when doing a lot of tenant release the GC works as expected, in my case it stabilizes around 700 Mb (but in debug mode). But yes after having enabled the ReCaptcha.Users the memory increases much more with the same test.

I will look at this asap.

Did you see the same behavior but by enabling another feature?


@ShaneCourtrille and others

About the "mvc.1.0.view" strings, under the debugger by using the Make object ID command I could see that all our RazorViewCompiledItem that we add (e.g. one for each Liquid file) are all using the same string instance.

But yes, all DefaultRazorCompiledItem compiled at build time are all using a different instance, but I'm afraid that here we can't do anything.

jtkech commented 1 year ago

@wAsnk

ReCaptchaLoginFilter is activated on each request and resolves transitively ReCaptchaClient for which a typed HttpClient has been registered. So on each action request a typed HttpClient is built.

Resolving ReCaptchaClient only on demand in ReCaptchaService fixed the issue.

Note: Maybe still a memory problem with typed HttpClient but less critical if only built as needed.

wAsnk commented 1 year ago

Did you see the same behavior but by enabling another feature?

No, I couldn't repro with every other feature on.

Resolving ReCaptchaClient only on demand in ReCaptchaService fixed the issue.

Sounds good!

BenedekFarkas commented 1 year ago

Note: Maybe still a memory problem with typed HttpClient but less critical if only built as needed.

This reminds me of this PR: https://github.com/dotnet/corefx/pull/19082, might have some useful insights there.

sebastienros commented 1 year ago

We need to use IHttpClientFactory in ReCaptchaClient instead of injecting the http client. And we might be able to make ReCaptchaClient a singleton (and ReacptchaService by resolving _robotDetectors lazily?)

jtkech commented 1 year ago

IHttpClientFactory is already used indirectly, a typed HttpClient is registered meaning that each time ReCaptchaClient is resolved an HttpClient is build using the IHttpClientFactory.

https://github.com/dotnet/runtime/blob/1ff7fbb4c61f08d36afe0e113a676a0fb332ffbc/src/libraries/Microsoft.Extensions.Http/src/DependencyInjection/HttpClientBuilderExtensions.cs#L545C17-L545C35

services.AddHttpClient<ReCaptchaClient>()
    .AddTransientHttpErrorPolicy(policy =>
        policy.WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(0.5 * attempt)));

So the problem is because of an mvc action filter that resolves on each request ReCaptchaService, then ReCaptchaClient which triggers an HttpClent building by the IHttpClientFactory.

To fix the issue I only needed in ReCaptchaService to resolve lazily ReCaptchaClient as needed, I already did the change locally, but didn't commit it yet because maybe some other updates may be worth to do, for example maybe better to make ReCaptchaService a scoped service, I will check.

For now the only thing I needed to update is only one line, I will commit it to show what I did.

jtkech commented 1 year ago

Here what I did for now #14335

sebastienros commented 1 year ago

So I read the documentation for IHttpClientFactory and it says it's good for short live HttpClient instances. It means they should be disposed after each usage. That would mean that your suggestion might be better then, as long as we only instantiate it when we need the request, and it's disposed.

But using a single static instance share by all tenants would also work if we don't set the base url and pass it on the Post call instead.

jtkech commented 1 year ago

Looking at https://learn.microsoft.com/en-us/dotnet/architecture/microservices/implement-resilient-applications/use-httpclientfactory-to-implement-resilient-http-requests

Where they talk about disposing or not HttpClient and then recommend to use IHttpClientFactory for example because of this.

However, the issue isn't really with HttpClient per se, but with the default constructor for HttpClient, because it creates a new concrete instance of HttpMessageHandler, which is the one that has sockets exhaustion and DNS changes issues mentioned above.

And then for example this

Manage the lifetime of HttpMessageHandler to avoid the mentioned problems/issues that can occur when managing HttpClient lifetimes yourself.

About sharing things accross tenants the problem is that they use non thread safe collections, and we can configure different tenants at the same time, that's why we isolate httpclients registrations per tenants and from the host.

jtkech commented 1 year ago

Finally in #14335 I kept the typed client pattern that also uses IHttpClientFactory.

For the sake of brevity, this guidance shows the most structured way to use IHttpClientFactory, which is to use Typed Clients (Service Agent pattern).

jtkech commented 1 year ago

@rjpowers10

I tried your simpler test

I next tried a much simpler example. This next example is a plain old OC site (latest from the main branch) using the blog recipe. That's it. ASPNETCORE_ENVIRONMENT is set to "Development". I then wrote a small PowerShell script to ping the site over and over.

Here I don't think that's an issue, I got the same under the Diagnostic Tools of Visual Studio, the memory increases until it stabilizes around 766 Mb in my case (issexpress.exe 674 Mb), around 550 Mb in your screen shot.

I will think about your original issue where you saw w3wp using over 8.5 GB.

rjpowers10 commented 1 year ago

@rjpowers10

I tried your simpler test

I next tried a much simpler example. This next example is a plain old OC site (latest from the main branch) using the blog recipe. That's it. ASPNETCORE_ENVIRONMENT is set to "Development". I then wrote a small PowerShell script to ping the site over and over.

Here I don't think that's an issue, I got the same under the Diagnostic Tools of Visual Studio, the memory increases until it stabilizes around 766 Mb in my case (issexpress.exe 674 Mb), around 550 Mb in your screen shot.

I will think about your original issue where you saw w3wp using over 8.5 GB.

It's very possible it is something I'm doing in my code. I do use HttpClient a lot to communicate with a REST API, but I'm always going through IHttpClientFactory to get the client, so I think that's OK.

jtkech commented 1 year ago

Ah okay, thanks for the info.

Yes that's good to use IHttpClientFactory but same issue if you create an http client for example on each request, even if you don't need it.

jtkech commented 1 year ago

Are you using HttpClient handlers?

MikeAlhayek commented 1 year ago

@sebastienros I am still trying to understand this as it does not make sense to me.

image

string interning is managed even across multiple dlls. So even when every view generate a new dll, it should not matter. Here is what I did, created two separate projects in the same solutions

In Project 1 I added the following 2 classes

public class ModuleAClassA
{
    public string First { get; set; } = "First";
}

public class ModuleAClassB
{
    public string First { get; set; } = "First";
}

In project 2, I added the following 2 classes

public class ModuleBClassA
{
    public string First { get; set; } = "First";
}

public class ModuleBClassB
{
    public string First { get; set; } = "First";
}

Then from project 1 I ran the following

Console.WriteLine(object.ReferenceEquals(aa.First, ab.First));
Console.WriteLine(object.ReferenceEquals(ba.First, bb.First));
Console.WriteLine(object.ReferenceEquals(ba.First, ab.First));
Console.WriteLine(object.ReferenceEquals(aa.First, bb.First));
Console.WriteLine(object.ReferenceEquals(aa.First, "First"));

All 5 lines printed True which proves to me that string interning is managed across all dlls. Unless the environment makes a difference like running on Windows vs running on Lunix. Just thought I would share this just in case you have more ideas that you like to share.

jtkech commented 1 year ago

Maybe I found a way to remove the "mvc.1.0.view" duplicates.

Under the debugger you can see that now they refer to the same instance ID.

Razor Strings Debug

And no longer in the Duplicate Strings.

Razor Strings Memory

MikeAlhayek commented 1 year ago

Did you do that by using a constant for the string?

jtkech commented 1 year ago

No for now I only used "mvc.1.0.view" litteral strings that normally are interned in only one.

We only add few compiled items and one per liquid file, most of compiled items are built by aspnetcore by using data loaded from assembly attributes.

What I did for all descriptors is to change its item by a new one using this literal for its Kind property.

Maybe for perf it's better to use a const, there is already one defined in aspnetcore, so that it doesn't have to lookup in the intern pool.

rjpowers10 commented 1 year ago

Are you using HttpClient handlers?

I have a few DelegatingHandlers and a Polly handler for resiliency.

rjpowers10 commented 1 year ago

Ah okay, thanks for the info.

Yes that's good to use IHttpClientFactory but same issue if you create an http client for example on each request, even if you don't need it.

I understand that creating an HttpClient when I don't need it is wasteful, but the GC should still be able to clean this up, right? I think my real problem isn't that memory is high but that memory is unable to be reclaimed. I did find that I have a similar situation as the ReCaptcha module in my code. I have a filter invoked on every request, and in the constructor we inject a typed client. That is wasteful since the filter doesn't do anything on most requests. So I can understand that making that lazy could help, but at first glance that doesn't seem to explain why w3wp is taking over 8GB of memory.

sebastienros commented 1 year ago

but that memory is unable to be reclaimed

If the GC can't collect it it's because it's rooted, this would be a bug in Orchard or its dependencies, and we need to find it.

rjpowers10 commented 1 year ago

but that memory is unable to be reclaimed

If the GC can't collect it it's because it's rooted, this would be a bug in Orchard or its dependencies, and we need to find it.

I'm no expert in this area but I'll try to share some things. These may just be red herrings.

Looking at the memory diff, whether sorted by count diff or size diff, string is the top of the list. image

May or may not be relevant: many of these strings are sitting in the large object heap. image

Here's where I start getting confused. Just sampling one of the string instances, the path to root is super long. Maybe this is fine, just seemed odd to me. Maybe this can point someone in the right direction.

1 of 5 image

2 of 5 image

3 of 5 image

4 of 5 image

5 of 5 image

jtkech commented 1 year ago

@rjpowers10

So I can understand that making that lazy could help, but at first glance that doesn't seem to explain why w3wp is taking over 8GB of memory.

Same answer as @sebastienros but now we may have something more specific to search on.

Maybe around HttpClientFactory or HttpMessageHandlerBuilder, I will look at it but need to install tools, for now my diagnostic tool often gets stuck, even if I use the last preview of Visual Studio ;)

rjpowers10 commented 1 year ago

IHttpClientFactory does seem a little suspicious to me. Here's an int array. I'm a little surprised to see it rooted by DefaultHttpClientFactory.

image

jtkech commented 1 year ago

https://github.com/dotnet/runtime/blob/048978abe4cd48222547f3fb8062667cf357904b/src/libraries/Microsoft.Extensions.Http/src/DefaultHttpClientFactory.cs#L21C1-L22C1

jtkech commented 1 year ago

Okay, maybe I found something but not so obvious to re-use PerfView so not fully sure.

@rjpowers10 First it seems that it works as expected if we don't release the shell, to repro I need to release the shell/tenant (so rebuilt on a next request) many times, so more like what @wAsnk did.

In your UI Tests are you often releasing the tenant? For example it happens when saving Admin Settings.


That said looks like the DefaultHttpClientFactory reference the service provider of the current container (our shell context) and has a static TimerCallback variable with a delegate that references itself.

No problem in a regular app because DefaultHttpClientFactory is a singleton and the host container is not intended to be rebuilt, but in our case it is a tenant singleton where the container can be rebuilt.

So I did a diff where we can see that.

HttpClientFactory

I will investigate more and let you know.

rjpowers10 commented 1 year ago

Nice find! My UI tests are often making changes to admin settings and enabling / disabling features so we can test different things, so yes, we are indirectly releasing the tenant many times throughout the test run.

jtkech commented 1 year ago

Okay cool so maybe we found the source of the problem.

We may need to override the IHttpClientFactory, maybe not so easy because of internal classes, will see.

jtkech commented 1 year ago

@rjpowers10 for info

Okay I could override DefaultHttpClientFactory, made it IDisposable to nullify some fields, for example _services, for now it fixes the issue if we set the options .SuppressHandlerScope to true, otherwise still some memory increase (but much less).

I also started to prevent the accumulation of timer callbacks that cleanup the handlers after their lifetime, but the lifetime can be higher than the period we use to release the tenant, but I made some progress.

jtkech commented 1 year ago

Applying the above in #14348 seems to fix the issue.

So if you can redo your tests by using this branch, when you will have time.


Not related to this issue and HttpClient,

But I noticed the same kind of memory increase by doing may times a tenant Reload that not only Releases the tenant but also Reloads and then Rebuilds the Configuration. So we may need to do the same kind of things for the tenant reloading (not only releasing). I will look at it when I will have time.