dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Reduce CoreCLR memory footprint #8718

Open kvochko opened 7 years ago

kvochko commented 7 years ago

We are currently working on minimizing the memory footprint of CoreCLR. Our common scenario is a mobile Xamarin.Forms application for Tizen; test cases can be found here. One of the areas we are looking at is the garbage collector. We have tried several things and found that, for example, setting g_bLowMemoryFromHost to 1 and the ephemeral budget to a fixed small value reduces the peak private dirty memory by up to ~1200K.


Application         | Original peak      | Modified peak      | Difference
                    | Private_Dirty, Kb  | Private_Dirty, Kb  |

HelloWorld          | 14744              | 13588              | 1156
ApplicationStoreUI  | 14816              | 14668              | 148
GalleryUI           | 46456              | 46264              | 192
SNSUI               | 21872              | 21724              | 148
Calculator          | 17944              | 17588              | 356

Application         | Original startup   | Modified startup   | Difference
                    | CPU time (jiffies) | CPU time (jiffies) |

HelloWorld          | 135                | 146                | 11
ApplicationStoreUI  | 192                | 192                | 0
GalleryUI           | 251                | 259                | 8
SNSUI               | 447                | 452                | 5
Calculator          | 290                | 295                | 5

As discussed previously, exposing knobs like these as runtime configuration options is not desirable. What would be the best way to implement such changes? There seems to be a concept of performance scenarios, but it's not clear whether that code is used anywhere, and if so, how to use it; it's also still a question whether it is the right place to implement such changes. Any help is greatly appreciated.

@Maoni0 @swgillespie CC @ruben-ayrapetyan @gbalykov @Dmitri-Botcharnikov

ruben-ayrapetyan commented 7 years ago

Related issue: dotnet/runtime#7694

Maoni0 commented 7 years ago

@kvochko thanks for the data. I have the following observations:

1) All your non-Hello World scenarios showed at best 2% improvement for peak private dirty memory. The Helloworld one showed 8% improvement. Since the changes only affect when GC runs, it means this Helloworld scenario actually triggered at least one GC. I presume this is a UI Helloworld in Xamarin.Forms.

2) Almost all your scenarios showed regression in CPU usage (I guess this "jiffies" is some sort of time unit that's equivalent to a certain # of ms specific to your system) and the degree of regression is larger than that of peak private dirty memory.

If we only have these 2 data points it's hard to say if this change is worth doing at all. Our perf investigation doesn't stop at this point though - usually the next step is to figure out where the perf improvement/regression comes from and see how that can show up in real world scenarios. Have you looked at Helloworld more closely to see why it showed a much bigger change than other scenarios?

kvochko commented 7 years ago

@Maoni0 while it is true that the 2% improvement is relatively small, we are continuing to tune the garbage collector for our scenario, and it's possible that there will be other changes like these. Right now we would just like to know in what form these changes can be submitted to the mainline when improvements are good enough.

Maoni0 commented 7 years ago

This is how I envision the changes being reflected: we have a few different levels that you can use to say how much you care about each aspect - memory footprint, throughput or latency. Levels 1 through 4 look like the following:

Level | Optimization goals                       | Latency characteristics
1     | memory footprint                         | pauses can be long and more frequent
2     | throughput                               | pauses are unpredictable and not very frequent, and might be long
3     | a balance between pauses and throughput  | pauses are more predictable and more frequent; the longest pauses are shorter than at level 2
4     | short pauses                             | pauses are more predictable and more frequent; the longest pauses are shorter than at level 3

You can specify the level you want via an application config. So for example for your scenario if memory footprint is what you care about the most you can specify level 1.
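A sketch of what such an application config might look like, assuming a runtimeconfig.json-style file. The property name `System.GC.LatencyLevel` is purely illustrative, since the final knob name was still being designed at this point in the thread:

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.LatencyLevel": 1
    }
  }
}
```

Here level 1 would request the memory-footprint-optimized tuning described in the levels above.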

kvochko commented 7 years ago

@Maoni0 Thanks! I've measured private dirty memory used only by the managed heap, and it seems that the 1156Kb gain in HelloWorld is coming from elsewhere.

Here are the latest memory gains for all apps (peak private dirty for GC only), with the following changes relative to master:

Application         | Original peak      | Modified peak      | Difference
                    | Private_Dirty, Kb  | Private_Dirty, Kb  |

HelloWorld          | 364                | 216                | 148
Puzzle              | 684                | 468                | 216
ApplicationStoreUI  | 628                | 372                | 256
GalleryUI           | 516                | 336                | 180
AppCommon           | 660                | 516                | 144
SNSUI               | 2156               | 1624               | 532
Calculator          | 1024               | 660                | 364

Performance degradation:

Application         | Original startup   | Modified startup   | Difference
                    | CPU time (jiffies) | CPU time (jiffies) |

HelloWorld          | 135                | 143                | 8
ApplicationStoreUI  | 192                | 219                | 27
GalleryUI           | 251                | 251                | 0
SNSUI               | 447                | 458                | 11
Calculator          | 290                | 299                | 9

Maoni0 commented 7 years ago

@kvochko, thanks for the data. @swgillespie will be working with you on this. We need to do the following:

  1. Set different sets of tuning parameters according to latency levels - things like g_bLowMemoryFromHost might be removed completely on core as there's no hosting;

  2. Run a lot more tests with more data collected so we understand the perf characteristics better - unfortunately on Linux it's a lot harder to get perf data, so we can start by running perf tests on Windows;

We also need to clearly document what the goal for each latency level is - for example, for the memory footprint level I think we'd definitely care about steady state perf too, not just startup.

ruben-ayrapetyan commented 7 years ago

CC @lemmaa @egavrin

gbalykov commented 7 years ago

We have continued our research and obtained the following results.

Results with forced compaction (gcForceCompact) and LOH compaction (GCLOHCompact) enabled (measured on a set of Tizen Xamarin GUI applications).

Private_Dirty memory used only by the managed heap:

Application         | Original peak      | Modified peak      | Difference
                    | Private_Dirty, Kb  | Private_Dirty, Kb  |

HelloWorld          | 300                | 284                | 16 (5.33%)
BoxViewClock        | 1388               | 1284               | 104 (7.49%)
Puzzle              | 940                | 780                | 160 (17.02%)
ApplicationStoreUI  | 668                | 548                | 120 (17.96%)
GalleryUI           | 564                | 444                | 120 (21.28%)
AppCommon           | 708                | 612                | 96 (13.56%)
SNSUI               | 2428               | 1715               | 712 (29.32%)
Calculator          | 1064               | 1008               | 56 (5.26%)

Average reduction of GC heap - 14.65%
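As a quick sanity check, the quoted average can be reproduced from the per-application reduction percentages in the table above:

```shell
# Average the per-app Private_Dirty reduction percentages quoted above.
awk 'BEGIN {
  n = split("5.33 7.49 17.02 17.96 21.28 13.56 29.32 5.26", p, " ")
  for (i = 1; i <= n; i++) sum += p[i]
  printf "%.2f\n", sum / n   # prints 14.65
}'
```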

Startup time difference is:

Application         | Original startup   | Modified startup   | Difference
                    | wall-clock time, s | wall-clock time, s |

HelloWorld          | 0.94               | 0.97               | 0.03 (3.19%)
BoxViewClock        | 1.11               | 1.12               | 0.01 (0.90%)
Puzzle              | 1.36               | 1.36               | 0
ApplicationStoreUI  | 1.42               | 1.42               | 0
GalleryUI           | 1.30               | 1.31               | 0.01 (0.77%)
AppCommon           | 1.72               | 1.73               | 0.01 (0.58%)
SNSUI               | 3.96               | 4.01               | 0.05 (1.25%)
Calculator          | 2.34               | 2.34               | 0

Average increase of startup time - 0.83%

Why are the gcForceCompact and GCLOHCompact options marked as UNSUPPORTED? Are they planned to be removed in the future?

Currently we are concentrating on memory consumption for Tizen, and we could add two GC optimization levels: "default", which would be the same as now, and "memory footprint", which we would tune according to our investigations on Tizen.

What do you think about the results?
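For anyone wanting to reproduce this locally, the two knobs mentioned above can be enabled through environment variables following the usual COMPlus_ convention. This is a sketch: these are unsupported internal knobs, and exact names or behavior may differ across runtime versions; the app name is hypothetical.

```shell
# Enable forced compaction on every GC and compaction of the large object heap.
# Unsupported internal knobs; behavior may vary between releases.
export COMPlus_gcForceCompact=1
export COMPlus_GCLOHCompact=1

# Then launch the app from the same shell (MyTizenApp.dll is a placeholder):
# dotnet MyTizenApp.dll
```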

gbalykov commented 7 years ago

We have continued our research further and obtained the following results for the next set of GC options (measured on a set of Tizen Xamarin GUI applications):

Private_Dirty memory used only by the managed heap:

Application         | Original peak      | Modified peak      | Difference
                    | Private_Dirty, Kb  | Private_Dirty, Kb  |

HelloWorld          | 372                | 292                | 80 (21.51%)
BoxViewClock        | 1372               | 744                | 628 (45.77%)
Puzzle              | 868                | 572                | 296 (34.10%)
ApplicationStoreUI  | 636                | 420                | 216 (33.96%)
GalleryUI           | 532                | 360                | 172 (32.33%)
AppCommon           | 700                | 552                | 148 (21.14%)
SNSUI               | 2172               | 1624               | 548 (25.23%)
Calculator          | 1032               | 696                | 336 (32.56%)

Average reduction of GC heap - 30.83%

Startup time difference (startup time measurement was done with the new start point):

Application         | Original startup   | Modified startup   | Difference
                    | wall-clock time, s | wall-clock time, s |

HelloWorld          | 1.6095             | 1.6185             | 0.0090 (0.56%)
BoxViewClock        | 1.7677             | 1.7777             | 0.0100 (0.57%)
Puzzle              | 2.0000             | 2.0000             | 0
ApplicationStoreUI  | 2.0620             | 2.0771             | 0.0151 (0.73%)
GalleryUI           | 1.9640             | 1.9699             | 0.0059 (0.30%)
AppCommon           | 2.3904             | 2.4123             | 0.0219 (0.92%)
SNSUI               | 4.6215             | 4.7529             | 0.1314 (2.84%)
Calculator          | 2.9938             | 3.0470             | 0.0532 (1.78%)

Average increase of startup time - 0.96%

@Maoni0 @swgillespie Is it possible to add "memory footprint" option with this configuration?

swgillespie commented 7 years ago

@gbalykov level 1 of Maoni's proposed GC configuration would be optimizing for memory footprint, if that's what you're asking. I started a branch the other day with some work towards this feature: https://github.com/swgillespie/coreclr/tree/feature-gc-latency-levels. (very bare bones)

Setting LOH compaction + force compaction is potentially dangerous for steady-state perf - we'll need to do a lot of performance testing and tuning to arrive at the correct internal "knobs" that we need to be turning to get better memory footprints without compromising too heavily on the other aspects of runtime perf.

At the moment, our infrastructure for answering these sorts of questions (running perf tests and collecting data) is lacking a little. I'm working on making this better as I write this, so I'm looking forward to being able to use it to test this feature.

Maoni0 commented 7 years ago

@gbalykov would you please collect some ETW events for your perf runs? It would be really helpful to actually see the GC characteristics. The instructions are here.

gbalykov commented 7 years ago

@Maoni0 @swgillespie We agree that a general-case "memory footprint" GC optimization level should be tested and tuned on a wide spectrum of applications. Our current goal is more specific - memory consumption of Tizen GUI applications - so we would like to be able to specify a GC configuration tuned for this specific scenario. How could this be implemented in CoreCLR?

SNSUI.trace.zip is the result of ./perfcollect collect SNSUI -gconly for the SNSUI Tizen Xamarin application. We were able to open the result with the Trace Compass tool.

Also, here are the results for two GC-heavy GUI applications that we have tested:

Private_Dirty memory used only by the managed heap:

Application                  | Original peak      | Modified peak      | Difference
                             | Private_Dirty, Kb  | Private_Dirty, Kb  |

org.tizen.example.gc1.Tizen  | 4188               | 4044               | 144 (3.44%)
org.tizen.example.gc2.Tizen  | 237104             | 125844             | 111260 (46.92%)

Startup time difference:

Application                  | Original startup   | Modified startup   | Difference
                             | wall-clock time, s | wall-clock time, s |

org.tizen.example.gc1.Tizen  | 7.236              | 8.639              | 1.403 (19.39%)
org.tizen.example.gc2.Tizen  | 75.304             | 87.563             | 12.259 (16.28%)

Currently we consider these results as potentially fully satisfactory for the Tizen GUI profile.

gbalykov commented 7 years ago

We have refined the way memory consumption is measured for the managed heap, and the results were updated: https://github.com/dotnet/coreclr/issues/13292#issuecomment-324055884

Maoni0 commented 7 years ago

@brianrob is what perfcollect collected supposed to be viewable with PerfView? I just opened the trace @gbalykov mentioned in this comment, and PerfView just gives me some incomplete events with totally bogus info, e.g. there are GCStart but no GCEnd events. For a GCStart it gives me this as its fields:

Process(1981834595) (1981834595) ThreadID="543,451,503" Count="828,400,494" Reason="41" Depth="1,043,341,628" Type="2031616" ClrInstanceID="0" ClientSequenceNumber="154,618,822,656" |

I'll take a look at the PR.

brianrob commented 7 years ago

@Maoni0, I'm not sure what's up and why PerfView doesn't like this data file. I will look, but it will take me some time to get to this.

In the meantime, you may have better luck opening the trace on Linux and looking at the individual events. You can do so by unzipping the file and then running babeltrace SNSUI.trace/lttngTrace | grep GC | more. The results when opening on Linux look more sane to me.

Maoni0 commented 7 years ago

thanks @brianrob

Maoni0 commented 7 years ago

@gbalykov we definitely don't want to call something "unstable perf" when it is not explainable. Could you please explain your perf goal here? The startup throughput does take a significant hit (this is not unstable - you are just aiming for a different perf goal). Also, do you not care about your steady state at all? I haven't seen any steady-state data. I can see some of these apps naturally only have a startup phase, like Calculator. What about Puzzle or GalleryUI? I would think users would want to keep these running for a while and switch between different apps during that process.

gbalykov commented 7 years ago

@Maoni0 Our perf goal is to not slow down Tizen Xamarin GUI applications noticeably.

Among .NET GUI applications that can be launched on Tizen or Linux, we know of only Tizen Xamarin GUI applications. They do almost all of their work during startup; that's why we measured startup time. The GC GUI benchmarks (which we used to emulate GC-heavy applications) also do all of their work during startup, i.e. startup time is the total execution time of the GC benchmark. We assume that steady-state behaviour will fall between the simple startup-only GUI applications and the GC-heavy emulation applications.

So, unfortunately, we do not have actual GUI applications that do significant work after startup and could therefore be used to measure steady state. Could you please suggest a Linux benchmark or application for measuring steady-state performance?

gbalykov commented 7 years ago

@Maoni0 Since, according to our assumption, steady-state behaviour will fall between the simple startup-only GUI applications and the GC-heavy emulation applications, we have removed the _unstable_perf suffix; now it is just latency_level_small_memory_footprint. What do you think about that?

A straightforward measurement of steady state is certainly more desirable, and we will do that when an appropriate application becomes available.

Maoni0 commented 7 years ago

@gbalykov I must apologize that I haven't been very responsive as I really just haven't had much time to think about this. I am very cautious because this is part of the public surface which means we'd need to support it for a long time. I'm actually quite swamped this week as well but I am hoping I can spend time on this next week. Thanks for your patience!

Maoni0 commented 6 years ago

dotnet/coreclr#13625

Ettery commented 6 years ago

How does the GC/CoreCLR view the machine specs when running in a Docker container? Does every runtime think it has use of all the host resources? I ask because I'm running a number of simple APIs (microservices) on Docker on Ubuntu (ASP.NET Core 1.1.2), and each container is consuming 500-900 MB of memory. It's not a powerful server - 16 GB/4 cores at the moment. CPU usage is under 5%, but I'm hitting memory limits. I've not been able to find a solution yet. I can tell Docker to limit the memory available to each container, but what I see is that they don't seem to use less memory; they just start using swap space, which will hurt their performance.

janvorli commented 6 years ago

@Ettery .NET Core reads process memory limits from the corresponding cgroups, but there was a bug that caused this not to work properly in Docker. This bug is fixed in the upcoming 2.0.2 release and, in addition, CPU count limits will be honored too.
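For context, the limit the runtime reads comes from the cgroup Docker creates for the container. A minimal sketch (the image tag and MyApp.dll are illustrative, not from this thread):

```shell
# Cap the container at 256 MB of memory (and no extra swap);
# after the 2.0.2 fix the runtime reads this cgroup limit and
# sizes the GC heap accordingly.
docker run -m 256m --memory-swap 256m microsoft/dotnet:2.0-runtime dotnet MyApp.dll
```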

Ettery commented 6 years ago

@janvorli - Interesting, thank you! I look forward to that. In the meantime, <ServerGarbageCollection>false</ServerGarbageCollection> seems to be working for me; I don't have high concurrency at the moment.

Dan-True commented 6 years ago

Is this still being investigated? I just made an entirely new ASP.NET project using 'dotnet new vue', and after the first request to the backend, memory use was already 103.7 MB - which seems like a lot to listen on a single port and host a single endpoint.

Is this ASP.NET or the CLR eating up that memory?

Is there anything I can do to help this investigation/fix/feature? I haven't meddled in the dotnet repo before, but would love to help if I can.

jkotas commented 6 years ago

Is this ASP.NET or the CLR eating up that memory?

You would need to profile to get the answer. The memory use of a basic ASP.NET app (dotnet new web) after the first request is <15 MB, so the remaining ~85 MB is likely the result of whatever the vue template is doing.

You can try turning off the server garbage collector using <ServerGarbageCollection>false</ServerGarbageCollection> as suggested above and see whether it makes a difference.
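For reference, the property above goes into the project file; it switches the app to workstation GC, which trades some throughput for a smaller heap (the project file name is illustrative):

```xml
<!-- MyApp.csproj: opt out of server GC to reduce memory footprint -->
<PropertyGroup>
  <ServerGarbageCollection>false</ServerGarbageCollection>
</PropertyGroup>
```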

dstj commented 5 years ago

@janvorli mentioned a fix in 2.0.2, but as of 2.1.402, I can confirm that setting <ServerGarbageCollection>false</ServerGarbageCollection> can still have a huge impact on memory consumption.

Here's a graph from my .NET Core app (.NET Core + Angular) running on a Heroku dyno. It's limited to 512 MB, and I constantly hit the limit before setting ServerGarbageCollection to false. It has no traffic whatsoever since it's a staging app; I was the only one hitting the site.

[memory usage graph]

theberserker commented 5 years ago

For us this is a big issue under Docker on a Linux host (Debian 9), using the images dotnet:2.1-sdk (builder) and dotnet:2.1-aspnetcore-runtime (runner). We saw memory consumption of a fairly simple MVC app as high as 50 GB within a few days of running the service! After setting ServerGarbageCollection=false it went down to the area of 60 MB. Additionally, I was unable to reproduce the issue locally on a Windows development machine without Docker. There was a similar issue in the past, https://github.com/aspnet/aspnet-docker/issues/300, but that was supposed to be fixed already. Is this known and tracked somewhere under a better issue? What additional info would you need to check this in more detail?

janvorli commented 5 years ago

cc: @Maoni0

Aniel commented 5 years ago

Could this be related to https://github.com/aspnet/AspNetCore/issues/3409?

effyteva commented 5 years ago

(quoting theberserker's comment above)

+1 for this, we suffer from the same issue on 2.2

Rakiah commented 5 years ago

We also have the same problem. Are there any solutions to this other than <ServerGarbageCollection>false</ServerGarbageCollection>?

Maoni0 commented 5 years ago

Could you please give us more detail on your container setup? What limits did you set on your container?

Rakiah commented 5 years ago

I've been using .NET Core 2.1.6 and running dotnet watch run *.csproj inside my container. I tried setting no limits, and each of my containers reached 700 MB. After adding a 150 MB mem_limit, the microservices never manage to fully build as everything is so slow; each request raises memory usage by 1 MB, and at some point (roughly 15 minutes later) it went down to 250 MB (which is still extremely high for an almost blank API project).

Maoni0 commented 5 years ago

I'm not sure what to expect for "an almost blank API project", but it's odd that you could even go up to 250 MB in a container with mem_limit set to only 150 MB.

We'll need to collect some data to see what's going on. Would it be possible to collect a trace with the instructions mentioned here? You can pass in the -gccollectonly command-line arg.

Rakiah commented 5 years ago

Actually, it is correctly limited to 150 MB and never rises to 250 MB; I said that for the first case. When limiting to 150 MB, it literally takes ages for the command dotnet watch run to complete (which looks very much like slowness caused by lack of memory).

I'll collect the required data today and will post it here.

Rakiah commented 5 years ago

Following the procedure for collecting a trace doesn't seem to work; it says that my .NET Core process is not running, while running ps aux shows all of the dotnet commands running:

Starting post-processing. This may take some time.

zero-sized file (perf.data), nothing to do!
Generating native image symbol files
zero-sized file (perf.data), nothing to do!
libcoreclr.so not found in perf data. Please verify that your .NET Core process is running and consuming CPU.
Saving native symbols
zero-sized file (perf.data), nothing to do!
...FINISHED
Exporting perf.data file
...FINISHED
Compressing trace files
...FINISHED
Cleaning up artifacts
...FINISHED

Trace saved to sampleTrace.trace.zip
root@d15928a78650:/src/Authentication/Authentication# 
root@d15928a78650:/src/Authentication/Authentication# ps aux   
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4276   796 ?        Ss   15:23   0:00 /bin/sh -c dotnet watch run --urls=http://+:80
root         6  0.1  0.1 3137464 57796 ?       SLl  15:23   0:00 dotnet watch run --urls=http://+:80
root        24  1.3  0.0 4837284 40952 ?       SLl  15:23   0:07 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/DotnetTools/dotnet-watch/2.2.0/tools/netcoreapp2.2/any/dotnet-watch.dll
root        84  0.4  0.1 3950672 93664 ?       SLl  15:23   0:02 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /nologo /nodemode
root       101  0.3  0.1 4155492 88064 ?       SLl  15:23   0:02 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /nologo /nodemode
root       120  0.3  0.1 4081744 85356 ?       SLl  15:23   0:02 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /nologo /nodemode
root       142  0.3  0.1 4016208 85084 ?       SLl  15:23   0:02 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /nologo /nodemode
root       162  0.3  0.1 4081744 83640 ?       SLl  15:23   0:02 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /nologo /nodemode
root       183  0.3  0.1 4221012 86564 ?       SLl  15:23   0:02 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /nologo /nodemode
root       202  0.4  0.1 4024404 92676 ?       SLl  15:23   0:02 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /nologo /nodemode
root       225  0.3  0.1 4024404 87204 ?       SLl  15:23   0:02 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /nologo /nodemode
root       335  0.1  0.1 3334352 85608 ?       SLl  15:23   0:01 /usr/share/dotnet/dotnet run --urls=http://+:80
root       468  0.3  0.1 3740624 81952 ?       SLl  15:23   0:01 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /usr/share/dotnet/sdk/2.2.105/MSBuild.dll /nologo /nodemode
root       505  1.8  0.6 20673588 347108 ?     SLl  15:23   0:10 /usr/share/dotnet/dotnet /usr/share/dotnet/sdk/2.2.105/Roslyn/bincore/VBCSCompiler.dll -pipename:root.F.Id9n6W13pgGW8gTi+C2KZt
root       613  972  0.1 21465172 93096 ?      SLl  15:23  90:47 dotnet exec /src/Authentication/Authentication/bin/container/Debug/netcoreapp2.1/Authentication.dll --urls=http://+:80
root      2248  1.2  0.0 902460  6260 ?        Ssl  15:28   0:03 lttng-sessiond --daemonize
root      2258  0.0  0.0  82964   560 ?        S    15:28   0:00 lttng-runas    --daemonize
root      2279  0.0  0.0 539296  5872 ?        Sl   15:28   0:00 lttng-consumerd  -u --consumerd-cmd-sock /var/run/lttng/ustconsumerd64/command --consumerd-err-sock /var/run/lttng/ustconsumer
root      2281  0.0  0.0  80268   540 ?        S    15:28   0:00 lttng-runas      -u --consumerd-cmd-sock /var/run/lttng/ustconsumerd64/command --consumerd-err-sock /var/run/lttng/ustconsumer
root      3272  0.1  0.0  18184  3364 pts/0    Ss   15:32   0:00 /bin/bash
root      3586  0.0  0.0  36632  2872 pts/0    R+   15:33   0:00 ps aux

Rakiah commented 5 years ago

It seems that if I limit the container to only half a CPU core, the memory usage drops to a stable 300 MB (which is still extremely high).

Rakiah commented 5 years ago

I've tried reducing it down to 0.1 CPU, but now it takes half an hour to boot up. Also, the improvement in memory usage is not that big, since it only drops to 250 MB; this doesn't seem to be a good solution.

NinoFloris commented 5 years ago

@Rakiah Could you try it with a preview .NET Core 3.0 SDK container? There have been some decent wins around memory usage in 3.0.

Also, I'm curious why you need to build/watch a project in a container that you then also give a memory limit. The scenario these low memory limits were optimized for is mainly production usage: a plain runtime container (no dotnet SDK) that just boots your published directory's entrypoint dll. Do you have a specific reason why you cannot go with the normal publish scenario?
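A sketch of that publish-based setup, in contrast to running `dotnet watch` inside an SDK image. The image tags and project name are illustrative, not from this thread:

```dockerfile
# Build stage: the SDK image compiles and publishes the app
FROM microsoft/dotnet:2.1-sdk AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -o /app

# Runtime stage: the small aspnetcore-runtime image just runs the published dll
FROM microsoft/dotnet:2.1-aspnetcore-runtime
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "MyService.dll"]
```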

Rakiah commented 5 years ago

I just tried the memory limit to mitigate the 700 MB RAM usage.

Let me explain my case: we're running a microservices architecture, and my developers run on machines with 16 GB of RAM. If they boot up our current 10 (but growing) microservices, it takes 8 GB of RAM just for this to work, and they also run Kafka and a Unity frontend. This is already a problem and will be a bigger problem soon enough. I'll try with the .NET Core 3.0 SDK container tomorrow as I have to leave now.

Also, I've just run dotnet new web using the .NET Core 2.1 SDK for Linux and added this Dockerfile:

FROM microsoft/dotnet:sdk

WORKDIR /vsdbg
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
            unzip \
    && rm -rf /var/lib/apt/lists/* \
    && curl -sSL https://aka.ms/getvsdbgsh \
        | bash /dev/stdin -v latest -l /vsdbg

ENV DOTNET_USE_POLLING_FILE_WATCHER 1

WORKDIR /src

ENTRYPOINT dotnet watch run --urls=http://+:80

Building it with docker build -t test . and running it with docker run -v "$(pwd):/src" test, I get a service using 180 MB of RAM.

Maoni0 commented 5 years ago

@tommcdon would it be possible to have someone from your team look at the problem of collecting traces on Linux that @Rakiah is encountering?

Rakiah commented 5 years ago

It doesn't seem very trivial to go from 2.1.6 to 3.0.0-preview4, @NinoFloris; I get errors at bootup for my code. However, for the short time that the container is running (before any of my code runs, basically), it rises to 280 MB of RAM.

Maoni0 commented 5 years ago

Also looping in @noahfalk for the perfcollect issue above.

noahfalk commented 5 years ago

Following procedure for collecting trace doesn't seem to work...

@Rakiah Very sorry to see you have been having trouble with this. Could you share that sampleTrace.trace.zip file that was generated with us? There is a logfile inside which might have useful diagnostic information in it about why the collection had problems.

Despite the ominous-sounding zero-size warnings, the zip may also have trace data in it that @Maoni0 could use. The perfcollect script runs two collection tools in parallel: perf and LTTng. Perf collects sampled call stacks that are useful for CPU investigations, whereas LTTng collects runtime instrumentation, such as events produced by the GC. Even if the perf portion has been lost for whatever reason, the LTTng portion might still have very useful GC information in it.

mangod9 commented 3 years ago

Hi @kvochko, @Rakiah - this issue seems to have stalled on collecting logs on Linux. Is there anything actionable as part of this? It would also be ideal to check the behavior on .NET 5. Thanks.

code99 commented 2 years ago

Hi @noahfalk and @msftgits - has this been addressed in .net 6?

noahfalk commented 2 years ago

I was only helping with the perfcollect tooling issues. You probably want to ask the GC folks like @mangod9 or @Maoni0.

code99 commented 2 years ago

Thank you, @noahfalk.

Hi @mangod9 and @Maoni0 - Can you please confirm the status of this issue in .net 6? Thanks.

majid-sharif-soleimani commented 2 years ago

I have written a project that fetches some data from a database, processes it, and sends a message to a server. It runs fine on my laptop (which runs Windows 10), where the project consumes less than 200 megabytes of memory. However, when I deploy it on a t3.small EC2 machine using docker and docker-compose, it consumes more than 1.5 GB of RAM and makes the machine unresponsive. Can anyone help with this?