dart-lang / webdev

A CLI for Dart web development.
https://pub.dev/packages/webdev

Improve the performance of populating the Dart frames #993

Open devoncarew opened 4 years ago

devoncarew commented 4 years ago

This is a meta issue to track improving the performance of populating the Dart frames. This happens when the JS VM pauses, and is likely where we spend most of our time when the user is stepping through code (see also https://github.com/dart-lang/webdev/issues/975). Some general thoughts:

measurement

We should add logging so we have better visibility into the number of calls we're making to the Chrome backend - to see whether there are places where we can reduce the number of calls, increase parallelism, or avoid some calls entirely.

Somewhat related to https://github.com/dart-lang/webdev/issues/977.

Optimize calls to chrome

Currently, when building the Dart frames, we make many async calls per frame and per variable, blocking on each call to complete - all of the work happens in series. We should instead explore parallelizing the work: gathering data for all n frames in parallel and sending it back to the IDE when complete (see the sketch below).

If it's possible to avoid some calls altogether, we should do that.
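To make the difference concrete, here's a minimal Dart sketch - with placeholder types rather than the real dwds classes - of the serial approach versus a Future.wait-based parallel one:

```dart
// Placeholder types standing in for the real dwds / Chrome protocol classes.
class JsFrame {}

class DartFrame {}

/// Hypothetical per-frame conversion; in dwds this hides one or more
/// round trips to the Chrome backend.
Future<DartFrame> frameFor(JsFrame jsFrame) async => DartFrame();

/// Current shape of the work: each frame is awaited before the next one
/// starts, so n frames cost roughly n round-trip latencies end to end.
Future<List<DartFrame>> buildFramesSerially(List<JsFrame> jsFrames) async {
  final frames = <DartFrame>[];
  for (final jsFrame in jsFrames) {
    frames.add(await frameFor(jsFrame));
  }
  return frames;
}

/// Parallel alternative: start every conversion up front and wait on all of
/// them, so total latency is bounded by the slowest frame rather than the sum.
Future<List<DartFrame>> buildFramesInParallel(List<JsFrame> jsFrames) =>
    Future.wait(jsFrames.map(frameFor));
```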

compiler metadata

As part of the IPL work, the compiler will start generating more metadata that package:dwds can use. We should make use of that when available to eliminate some calls to the backend.

package:sse

We should investigate whether, and by how much, SSE is slower than WebSockets. Does it create a connection per request? Does that have a high cost?

In addition, package:sse may serialize each call, not issuing a subsequent call before the first one completes, to ensure results are not received out of order. If true, we may want to relax this - if we have 100 calls to make to a backend, we want to issue them in parallel as much as possible rather than one at a time.
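As an illustration of why strict serialization shouldn't be required for correctness, requests can be tagged with an id and dispatched concurrently, with each response matched back to its caller regardless of arrival order. This is only a sketch of the idea, not how package:sse is actually implemented:

```dart
import 'dart:async';

/// Illustrative request multiplexer: send requests immediately, route
/// responses back by id, so out-of-order delivery is harmless.
class RequestMultiplexer {
  RequestMultiplexer(this._send);

  final void Function(Map<String, Object?> message) _send;
  final _pending = <int, Completer<Object?>>{};
  int _nextId = 0;

  /// Issue a request without waiting for earlier requests to finish.
  Future<Object?> request(Map<String, Object?> params) {
    final id = _nextId++;
    final completer = Completer<Object?>();
    _pending[id] = completer;
    _send({'id': id, ...params});
    return completer.future;
  }

  /// Responses may arrive in any order; the id routes each one home.
  void onResponse(Map<String, Object?> message) {
    _pending.remove(message['id'] as int)?.complete(message['result']);
  }
}
```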

cc @grouma @annagrin

devoncarew commented 4 years ago

I'm going to work on logging more data, but from the time we get a pause event, to when package:dwds has populated frame info:

1996ms (first pause)
19ms (step)
17ms (step)
15ms (step)

This is for the flutter gallery app running through flutter run -d chrome. The stepping is significantly slower than the times above would indicate (5-10 seconds to step), so the time to populate frame info may not be a gating factor.

devoncarew commented 4 years ago

For package:sse, it looks like https://github.com/dart-lang/sse/pull/11 serializes events from the SSE server - we should see if this rate limits debugger traffic.

DanTup commented 4 years ago

Is there an easy way to capture the traffic between dwds and Chrome when running Flutter web apps? Today I'm seeing 50-60 second waits when hitting breakpoints using Chrome locally. I'm sure it was never this bad previously; I've tried rolling back both Flutter and Dart-Code a fair bit and am still seeing it, but I'm not sure where the time is going. The app is a very simple single file.

Edit: Seems to have gotten much worse somewhere between Flutter v1.14.6 and v1.16.3:

Flutter v1.12.13+hotfix.9: 23 seconds
[2:10:59 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [ +709 ms] Creating text!
[2:11:22 PM] [Observatory] [Info] [Flutter (Chrome)] <== {"jsonrpc":"2.0","method":"streamNotify","params":{"streamId":"Debug","event":{"type":"Event","kind":"PauseBreakpoint"

Flutter v1.13.9: 24 seconds
[2:16:39 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [ +713 ms] Creating text!
[2:17:03 PM] [Observatory] [Info] [Flutter (Chrome)] <== {"jsonrpc":"2.0","method":"streamNotify","params":{"streamId":"Debug","event":{"type":"Event","kind":"PauseBreakpoint"

Flutter v1.14.6: 24 seconds
[2:19:36 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== Creating text!
[2:20:00 PM] [Observatory] [Info] [Flutter (Chrome)] <== {"jsonrpc":"2.0","method":"streamNotify","params":{"streamId":"Debug","event":{"type":"Event","kind":"PauseBreakpoint"

Flutter v1.15.22: (broken)

Flutter v1.16.3: 50 seconds
[2:27:57 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [ +525 ms] Creating text!
[2:28:47 PM] [Observatory] [Info] [Flutter (Chrome)] <== {"jsonrpc":"2.0","method":"streamNotify","params":{"streamId":"Debug","event":{"type":"Event","kind":"PauseBreakpoint"

I don't know if it's related to this issue, but I'll try to instrument it and see where the time is going. These timings seem significantly higher than the other numbers above.

DanTup commented 4 years ago

Seems like my issue is the same as this one - the time is spent fetching frames and variables:

[3:06:08 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [        ] Pause handler: Calling dartFramesFor
[3:06:08 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [        ] calling _frameFor and _variablesFor for 421 frames
[3:07:05 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [+57171 ms] Pause handler: dartFramesFor done!

I tried making the requests all parallel to see if it helped, but it never completed when I did that; it wasn't clear why.

FWIW, I'm not sure there's any SSE involved here - I think the connection to Chrome is a WebSocket?

DanTup commented 4 years ago

Btw, in the debug-adapter-protocol we can tell the debugger we've paused without supplying the call stack/variables, which are requested separately/asynchronously. This means you can step quickly even without the frames/variables being available. If you step before they're complete, they're just discarded (possibly cancelled if the DA supports it).

It would likely require a change to the VM Service Protocol, but if this kind of data will always be significantly slower than the pause itself, maybe it's worth considering - it could make stepping feel significantly snappier.

devoncarew commented 4 years ago

Is there an easy way to capture the traffic between dwds and Chrome when running Flutter web apps? Today I'm seeing 50-60 second waits when hitting breakpoints using Chrome locally. I'm sure it was never this bad previously; I've tried rolling back both Flutter and Dart-Code a fair bit and am still seeing it, but I'm not sure where the time is going. The app is a very simple single file.

I've made local changes to the package in the past to write traffic to a well-named file on disk. We should likely generalize this so that developers can turn on logging when necessary. We'd need something that could be enabled from package:dwds and that flutter_tools could toggle (or just a const in package:dwds to log traffic, if the dev has access to the source).
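For example, something along these lines - the names here are purely illustrative, not an existing dwds API - would let a front end like flutter_tools flip a flag and have the wire traffic mirrored to a file:

```dart
import 'dart:io';

/// Illustrative only: a toggle a front end could enable, plus a sink that the
/// protocol send/receive paths write through.
bool logWireTraffic = false;
IOSink? _trafficSink;

void enableTrafficLogging(String path) {
  logWireTraffic = true;
  _trafficSink = File(path).openWrite();
}

/// Hypothetical hook called wherever messages are sent to or received from
/// Chrome and the VM service client.
void recordTraffic(String direction, String message) {
  if (!logWireTraffic) return;
  _trafficSink?.writeln(
      '[${DateTime.now().toIso8601String()}] $direction $message');
}
```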

devoncarew commented 4 years ago

Btw, in the debug-adapter-protocol we can tell the debugger we've paused without supplying the call stack/variables, which are requested separately/asynchronously. This means you can step quickly even without the frames/variables being available. If you step before they're complete, they're just discarded (possibly cancelled if the DA supports it).

I think this maps ~well to the VM service protocol? When you get a pause event from it, it only includes the top frame. If you want more frames, you request that as a separate call (getStack()).

DanTup commented 4 years ago

I think this maps ~well to the VM service protocol? When you get a pause event from it, it only includes the top frame.

Oh yeah! In that case, I think we're holding up the pause event unnecessarily by fetching all frames and variables before sending it?

grouma commented 4 years ago

Today I'm seeing 50-60 second waits when hitting breakpoints using chrome locally.

Is this with just Chrome debugging or using Dart DevTools to debug? I want to understand if we are hitting the Chrome slow stepping issue.

FWIW, I'm not sure there's any SSE involved here, I think the connection to Chrome is a WebSocket?

It depends. If you are using the Dart Debug Extension then it uses SSE to communicate with Chrome. Otherwise, Chrome is spun up with the remote debug port open and a websocket is used for local communication.

I think this maps ~well to the VM service protocol? When you get a pause event from it, it only includes the top frame. If you want more frames, you request that as a separate call (getStack()).

That looks to be the case. We compute all frames, even though according to the documentation only the top frame is required. We could update the logic here to only compute the top frame and save the expensive work for getStack.
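Roughly - with made-up helper names rather than the actual dwds code - the pause handler could hold on to the raw JS frames and defer the per-frame work until getStack is called:

```dart
// Placeholder types standing in for the real dwds / vm_service classes.
class JsFrame {}

class DartFrame {}

/// Hypothetical per-frame conversion involving Chrome round trips.
Future<DartFrame> frameFor(JsFrame jsFrame) async => DartFrame();

/// JS frames captured at the most recent pause, kept so a later getStack
/// call can be answered without recomputing anything at pause time.
List<JsFrame> _pausedJsFrames = [];

/// On pause: remember the raw frames and convert only the top one.
Future<DartFrame?> onPaused(List<JsFrame> jsFrames) async {
  _pausedJsFrames = jsFrames;
  return jsFrames.isEmpty ? null : await frameFor(jsFrames.first);
}

/// On getStack: do the expensive work for the full stack, only when asked.
Future<List<DartFrame>> getStack() =>
    Future.wait(_pausedJsFrames.map(frameFor));
```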

DanTup commented 4 years ago

Is this with just Chrome debugging or using Dart DevTools to debug?

This is using VS Code launching a Flutter web app using Chrome (so dwds is interacting with the Chrome debugger over the debugger protocol). There's no Dart DevTools, and the Chrome DevTools are not open/visible.

The whole time is spent inside the pause handler, calling frameFor and variablesFor:

[3:06:08 PM] [Flutter (Chrome)] <== [        ] Pause handler: Calling dartFramesFor
[3:06:08 PM] [Flutter (Chrome)] <== [        ] calling _frameFor and _variablesFor for 421 frames
[3:07:05 PM] [Flutter (Chrome)] <== [+57171 ms] Pause handler: dartFramesFor done!

Otherwise, Chrome is spun up with the remote debug port open and a websocket is used for local communication.

Yep, that's the situation here.

We could update the logic here and only compute the top frame. We can save the expensive work for getStack.

I believe this would make a huge difference. While iterating on my tests locally, I rigged it like this and it was a lot faster:

[Screenshot: 2020-05-12 at 17:31:51]

It's not a complete fix, as it doesn't handle this in getStack - it's good enough to pass a lot of my integration tests though 🙂

grouma commented 4 years ago

I can take a look at this later this week. The proper fix isn't too complex either. We'll just need to keep state around so we can properly supply getStack.

grouma commented 4 years ago

I took a look at this today and changed the logic so that only the top frame is calculated when returning the paused event here.

However, it appears that whenever we return a pause event there is a corresponding getStack call. Thus we are just shuffling the work around and there really isn't a performance improvement.

I believe this would make a huge difference. While iterating on my tests locally, I rigged it like this and it was a lot faster:

I don't think this is a fair comparison because you are reducing the overall work. The getStack response is cached and the full work is never done.

Is there a scenario where we send a paused event but there isn't a corresponding getStack call? If not I'll probably hold off on this PR.

devoncarew commented 4 years ago

It's good to know that we're not calculating the stack twice - that would have been the performance win.

I think it's still worth returning just the top frame first - that'll let clients display the paused location and issue an async call to populate the rest of the stack. Given how long the stack currently takes to populate, that would likely be a good responsiveness win.

DanTup commented 4 years ago

However, it appears that whenever we return a pause event there is a corresponding getStack call. Thus we are just shuffling the work around and there really isn't a performance improvement.

Ugh... My assumption was that VS Code would pause the debugger immediately, and the getStack call was then async to populate the call stack. This would result in much faster stepping. However, I'm now remembering that VS Code will jump to the first frame that's user code, so maybe it does await the stack results ☹️

I'll do some testing and see what I can find.

annagrin commented 4 years ago

Do I understand correctly that the following will help the situation as a longer term solution:

  • separate callstack from variables: create a new VM API getStackOnly to return the stack with references to frame objects.
  • add bulk getObjects API (for example, get all variable values for a frame, or all values for fields of an object)
  • cache what's possible (we could have a lossy cache to avoid storing the world)

devoncarew commented 4 years ago
  • separate callstack from variables: create a new VM API getStackOnly to return the stack with references to frame objects.

This could certainly help. We could have the paused event return info about the number of frames on the stack, and give the getStack() call some optional params for either not populating variable info, or allowing a client to page through the actual frames (only return info for frame index xx and the next 10 frames) - see the sketch at the end of this comment.

It's likely worth trying to improve performance with the existing API first, in order to minimize changes to existing clients, but this is definitely a useful option to consider.

  • add bulk getObjects API (for example, get all variable values for a frame, or all values for fields of an object)

Clients may be able to parallelize their requests here? We only ever need to show variables for one frame at a time, so we may not pay a huge penalty for getting info about a frame's variables. I think the larger cost is package:dwds calculating all the variable info for all the frames ahead of time?

  • cache what's possible (we could have a lossy cache to avoid storing the world)

I think this would be great to investigate, in addition to seeing if there are ways to make fewer Chrome debug protocol calls to collect the same info. I'll share some sample wire protocol traffic in the hope that it'll spark some ideas.
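To illustrate the paging idea mentioned above, here's a sketch of what a paged stack request might look like; the parameter names are invented and are not part of today's VM Service Protocol:

```dart
import 'dart:math' as math;

// Placeholders for the vm_service Frame/Stack types.
class Frame {}

class Stack {
  Stack(this.frames, this.totalFrames);
  final List<Frame> frames;
  // Reporting the total lets a client offer "load more frames" on demand.
  final int totalFrames;
}

/// Invented, illustrative shape of a paged request: only the frames from
/// startFrame to startFrame + frameCount are materialized, and variable
/// info could be skipped entirely unless explicitly requested.
Future<Stack> getStackPage(
  List<Frame> allFrames, {
  int startFrame = 0,
  int frameCount = 10,
}) async {
  final end = math.min(startFrame + frameCount, allFrames.length);
  return Stack(allFrames.sublist(startFrame, end), allFrames.length);
}
```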

devoncarew commented 4 years ago

Attaching a log of VM service traffic to and from package:dwds, and of Chrome debugger traffic sent and received by dwds:

dwds_log_2.txt

DanTup commented 4 years ago

Ok, I did some more testing:

However, it appears that whenever we return a pause event there is a corresponding getStack call. Thus we are just shuffling the work around and there really isn't a performance improvement.

VS Code does wait for getStack to complete before breaking. However, it only requests the first 20 frames initially, then further batches of 20 if the user clicks "load more frames" (so the full stack is not required to show we've paused).

VS Code does not wait for variables (although the VM service returns these with the stack, the debug-adapter-protocol requests them separately for a given frame).

So, for @annagrin's suggestions:

separate callstack from variables: create a new VM API getStackOnly to return the stack with references to frame objects.

If we're adding a new function here, supporting startFrame + totalFrames (and having the proxy only prepare the frames requested) would significantly help too (my example above fetched 421 frames, but VS Code only ever showed 20 unless I expanded more).

add bulk getObjects API (for example, get all variable values for a frame)

A single request to get all variables for a frame would also be great - VS Code only requests variables for the selected frame (a rough sketch follows below).

I think with those two changes, for VS Code we'd go from building 421 frames and 421 sets of variables, to building 20 frames and 1 set of variables, with no change at all in what's visible to the user (and the UI would show we've paused even without any variables being provided).
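For concreteness, a sketch of what that bulk variable lookup could look like - the types and names here are hypothetical, not the real dwds or vm_service API; the point is one round trip per frame instead of one per variable:

```dart
// Hypothetical variable reference/value types, not the real vm_service ones.
class VariableRef {
  VariableRef(this.name, this.objectId);
  final String name;
  final String objectId;
}

class VariableValue {
  VariableValue(this.name, this.value);
  final String name;
  final Object? value;
}

/// Today's rough shape: one backend round trip per variable.
Future<VariableValue> lookupOne(VariableRef ref) async =>
    VariableValue(ref.name, /* value fetched from Chrome */ null);

/// Invented bulk API: resolve every variable of the selected frame in a
/// single request, so per-call overhead is paid once per frame.
Future<List<VariableValue>> lookupAllForFrame(List<VariableRef> refs) async {
  // A real implementation would send one batched Chrome protocol message;
  // here we only simulate the shape of the result.
  return [for (final ref in refs) VariableValue(ref.name, null)];
}
```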

Edit: The difference is significant if only waiting for 20 frames and no variables - 57s to 236ms for the frame building:

[9:10:52 AM] [Flutter (Chrome)] <== [+4801 ms] Pause handler: Calling dartFramesFor
[9:11:50 AM] [Flutter (Chrome)] <== [+57159 ms] Pause handler: dartFramesFor done!

[9:13:49 AM] [Flutter (Chrome)] <== [+12424 ms] Pause handler: Calling dartFramesFor (only returning 20)
[9:13:49 AM] [Flutter (Chrome)] <== [ +236 ms] Pause handler: dartFramesFor done!

DanTup commented 4 years ago

create a new VM API getStackOnly to return the stack with references to frame objects

To build the stack in VS Code we do need a few fields from Frame (.code.name and .location) - would they be included in these references?

devoncarew commented 4 years ago

To update here: I investigated caching the calculated ClassDefs (each requires a round trip to Chrome to calculate). This does not appear to be a win, as:

I'm going to remove the todo to investigate this, as I don't consider it a worthwhile improvement.

annagrin commented 4 years ago

I wonder if having the compiler store types and symbols in the metadata would help here. I plan to do this later to improve evaluation; I need to think it through.