Roslyn syntax tree caching strategy causes memory pressure on large solutions

panopticoncentral commented 4 years ago

Roslyn keeps the following syntax trees in memory:

1) Files that are smaller than 4k
2) Files that contain global attributes
3) Files whose syntax tree has not yet been retrieved by anyone

This works well for responsiveness in smaller solutions, but when you scale solutions up to hundreds of projects, syntax trees can consume a big chunk of the managed heap (~20-30%). It looks like new heuristics would improve scalability. I was toying with the following general idea:

All syntax trees are created as recoverable trees
However, any trees that would not have been recoverable trees in the past (i.e. files < 4k or files with global attributes) will not write themselves out to disk after a read
When a recoverable tree is created, it adds itself to a global (maybe?) table
When a recoverable tree is written to disk, it removes itself from the table
If, while adding a tree to the table, we see that the table size has gone over a threshold (say, 1024 trees), we pick some number (say, 256 trees) and tell them to write themselves out to disk.
Additionally, if we notice we are in a low memory situation, we simply go through and tell every tree in the table to write itself out to disk.

In other words, we keep an eye on how many syntax trees are sitting in memory and, when that number grows too large, we start actively pushing them out to disk. Trees that wouldn’t have been written out in the past will stay in memory until things get large, so we shouldn’t regress any existing perf scenarios.
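
To make the idea concrete, here is a rough, language-neutral sketch in Python. It is not Roslyn code: `TreeTable`, `RecoverableTree`, `pinned`, `on_read`, and `on_low_memory` are all hypothetical names, "writing to disk" is reduced to flipping a flag, and evicting the oldest trees first is an assumption, since the proposal above leaves open which trees get picked.

```python
class TreeTable:
    """Hypothetical table of recoverable trees that are currently in memory."""

    def __init__(self, threshold=1024, evict_count=256):
        self.threshold = threshold      # max trees before we start evicting
        self.evict_count = evict_count  # how many trees to push out at once
        self._trees = {}                # dict keeps insertion order (oldest first)

    def add(self, tree):
        self._trees[tree] = None
        if len(self._trees) > self.threshold:
            self._evict(self.evict_count)

    def remove(self, tree):
        self._trees.pop(tree, None)

    def _evict(self, count):
        # Assumption: evict the oldest trees; the proposal only says "some number".
        for tree in list(self._trees)[:count]:
            tree.write_to_disk()

    def on_low_memory(self):
        # Low-memory situation: tell every tracked tree to write itself out.
        for tree in list(self._trees):
            tree.write_to_disk()

    def __len__(self):
        return len(self._trees)


class RecoverableTree:
    """Stand-in for a recoverable syntax tree (not Roslyn's actual type)."""

    def __init__(self, name, table, pinned=False):
        self.name = name
        # pinned: the tree would have stayed in memory under the old rules
        # (file < 4k or file with global attributes).
        self.pinned = pinned
        self.in_memory = True
        self._table = table
        table.add(self)  # every new tree registers itself

    def write_to_disk(self):
        # Real code would serialize the tree; here we only flip a flag.
        if self.in_memory:
            self.in_memory = False
            self._table.remove(self)

    def on_read(self):
        # Only trees that were recoverable under the old rules
        # write themselves out after a read.
        if not self.pinned:
            self.write_to_disk()
```

With `TreeTable(threshold=4, evict_count=2)`, creating a fifth tree pushes the two oldest out and leaves three in the table, while a `pinned` tree survives `on_read()` and only leaves memory through threshold eviction or `on_low_memory()`.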

panopticoncentral commented 4 years ago

It also seems like I'm seeing more recoverable trees show up in the dumps when we run out of memory, so I'm not sure if something changed around recoverable trees recently.

CyrusNajmabadi commented 4 years ago

Closing out as resolved well enough by OOP. We can re-examine this if we still see high overhead in-proc. Note: our future plans for OOP are that we don't try to have these caches, and instead rely only on VM in 64-bit processes to handle these cases.

dotnet / roslyn

Roslyn syntax tree caching strategy causes memory pressure on large solutions #40300