dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.2k stars 1.57k forks source link

Analyzer plugins should be automatically terminated if they use more than 5GB of memory or more than ??X seconds of CPU #48654

Open jacob314 opened 2 years ago

jacob314 commented 2 years ago

The popular dart_code_metrics analyzer plugin sometimes uses more than 10GB of memory badly degrading the entire Dart IDE user experience. When an analyzer plugin behaves badly, users have no idea that a plugin caused the issue and all they can tell is that the Dart analyzer is not performant.

To mitigate this, we should have the analyzer automatically kill plugins that use too much memory or too much CPU and surface a warning to the user explaining what happened. Plugins already run in their own isolate so this could be achieved today using the Dart VM service or achieved by adding an additional Dart VM API to directly limit memory-usage on a per isolate basis (similar to existing --old_gen_heap_size flag).

Ideally the analysis server would also emit an message that IDEs can display to explain to the user what went wrong. "Analyzer plugin dart_code_metrics was terminated as it used more than XGB of memory". You can increase its maximum memory usage by editing your analysis_options.yaml file or notify the plugin author"

Related issue: https://github.com/flutter/flutter/issues/95092#issuecomment-1075469757

incendial commented 2 years ago

@jacob314 hi, any chance that instead of killing the plugin, we can somehow fix the memory leak? Coz I was trying to get some help and have literally zero response. And in the discussion https://github.com/flutter/flutter/issues/95092 it's pretty clear that it's not a problem on a plugin side (especially, since there is no public api to handle the cache).

srawlins commented 2 years ago

@jacob314 is there a Dart API which allows seeing the memory and CPU of an isolate?

srawlins commented 2 years ago

@keertip you mentioned using an API internally to cap the memory of all isolates (or all isolate groups?). I think it was a VM flag, --old-gen-heap-size. What is the behavior when an isolate tries to use more? Does the isolate throw, or does the whole Dart process crash? I cannot find documentation for this flag.

srawlins commented 2 years ago

I found in the VM's source, "Max size of old gen heap size in MB, or 0 for unlimited, e.g: --old_gen_heap_size=1024 allows up to 1024MB old gen heap" and another flag, --abort-on-oom, "Abort if memory allocation fails - use only with --old-gen-heap-size". And this source:

void IsolateGroup::CreateHeap(bool is_vm_isolate,
                              bool is_service_or_kernel_isolate) {
  Heap::Init(this, is_vm_isolate,
             is_vm_isolate
                 ? 0  // New gen size 0; VM isolate should only allocate in old.
                 : FLAG_new_gen_semi_max_size * MBInWords,
             (is_service_or_kernel_isolate ? kDefaultMaxOldGenHeapSize
                                           : FLAG_old_gen_heap_size) *
                 MBInWords);

  is_vm_isolate_heap_ = is_vm_isolate;
}
srawlins commented 2 years ago

The analysis_server itself should be subject to this ceiling as well, right? I don't know why, in the case of the analysis_server using an unexpected amount of memory, we also wouldn't want to crash eagerly and restart.

bwilkerson commented 2 years ago

The analysis_server itself should be subject to this ceiling as well, right?

If a single plugin is shutdown then the UX is diminished, but still functional. If the main isolate is shutdown then the user has no support. Assuming that the main isolate uses memory proportional to the number of lines of code, then putting a ceiling on its memory usage would be equivalent to saying that we only support code bases up to a certain number of lines of code. I'm not sure we want to do that.

srawlins commented 2 years ago

If the user's entire Dart IDE experience can be badly degraded by a process using too much memory, then don we not today only support code bases up to a certain number of lines of code?

I don't know why a plugin should not be allowed to badly degrade the IDE experience, but the analysis server should. The symptoms and solutions are the same, aren't they?

  1. Today, if a plugin uses too much memory, the user's IDE experience degrades, and they must investigate or learn to restart analysis server. We think that is not a good UX, and capping a plugin's memory use is a better solution.
  2. Today if the analysis server uses too much memory, the user's IDE experience degrades, and they must investigate or learn to restart analysis server. Is this UX OK?
bwilkerson commented 2 years ago

If the user's entire Dart IDE experience can be badly degraded by a process using too much memory, then don we not today only support code bases up to a certain number of lines of code?

I think there's an important difference between a gracefully degrading experience and no experience. The question I was answering was whether the main isolate should be subject to being killed just like the proposal to kill plugin isolates that use too much memory. If one plugin is killed, the the main isolate, and any other plugins, can continue to run. If the main isolate is killed then all of the plugins are also killed.

If we want the server to be treated on equal footing, then we need to make most of it be a plugin on equal footing (run in a separate isolate so that killing it doesn't kill other plugins). But as it's currently architected, the server isn't a plugin, so it isn't equal.

For what it's worth, I'm not entirely sure that I like the proposal to kill plugins that are using too much memory. But I will point out that the server currently has support for automatically restarting plugins that crash, and we could potentially extend that to plugins that are killed for using too much memory.

... if a plugin uses too much memory ... We think that is not a good UX ... ... if the analysis server uses too much memory ... Is this UX OK?

I'm not claiming that it's OK for the analysis server to use too much memory and degrade the UX. I'm only claiming that killing the main isolate is more catastrophic than killing a plugin's isolate, and as a result that I'm not in favor of killing the main isolate.

I'll also point out that we have more control over the behavior of the analysis server than we do over the behavior of plugins. If the analysis server is using too much memory, then we need to fix it. If a plugin is using too much memory, then we have less control.

The symptoms and solutions are the same, aren't they?

The symptoms are the same, but as it's currently architected I don't believe that the solutions are the same.

I don't know why a plugin should not be allowed to badly degrade the IDE experience, but the analysis server should.

I'm not claiming that the analysis server should be allowed to degrade the IDE experience. I actually believe that it should be held to a higher standard than plugins.

But, the analysis server is responsible for supporting the basic semantics of the language. Any diagnostics related to issues that prevent the code from being run have to come from the analysis server. The basic services that depend only on the core language semantics (such as navigation and code completion) should all come from the analysis server. While I believe that the functionality provided by plugins can provide significant value to users, I believe that the impact on users of loosing the fundamental functionality provided by the server is bigger than the impact of loosing the functionality provided by plugins.

Based on that, it seems reasonable to say that we should be even more reluctant to kill the analysis server when its performance is degraded than we are to kill a plugin for the same reason.

jacob314 commented 2 years ago

Interesting discussion. Given that there is evidence that the core Dart analysis server can leak a significant amount of memory, I now think we should kill the analysis server as well when it exceeds a memory usage threshold. It would still be ideal to kill plugins when they exceed their memory usage threshold so that we only have to kill the full analysis server when the issue is not specific to plugins. https://github.com/dart-lang/sdk/issues/48876