devoncarew opened 4 years ago
I'm going to work on logging more data, but here's what I'm seeing so far for the time from when we get a pause event to when package:dwds has populated the frame info:
1996ms (first pause)
19ms (step)
17ms (step)
15ms (step)
This is for the flutter gallery app running through flutter run -d chrome. The stepping is significantly slower than the times above would indicate (5-10 seconds to step), so the time to populate frame info may not be a gating factor.
For package:sse, it looks like https://github.com/dart-lang/sse/pull/11 serializes events from the SSE server - we should see if this rate limits debugger traffic.
Is there an easy way to capture the traffic between dwds and Chrome when running Flutter web apps? Today I'm seeing 50-60 second waits when hitting breakpoints using Chrome locally. I'm sure it was never this bad previously; I've tried rolling back both Flutter and Dart-Code a fair bit and am still seeing it, so I'm not sure where the time is going. The app is a very simple single file.
Edit: Seems to have gotten much worse somewhere between Flutter v1.14.6 and v1.16.3:
Flutter v1.12.13+hotfix.9: 23 seconds
[2:10:59 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [ +709 ms] Creating text!
[2:11:22 PM] [Observatory] [Info] [Flutter (Chrome)] <== {"jsonrpc":"2.0","method":"streamNotify","params":{"streamId":"Debug","event":{"type":"Event","kind":"PauseBreakpoint"
Flutter v1.13.9: 24 seconds
[2:16:39 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [ +713 ms] Creating text!
[2:17:03 PM] [Observatory] [Info] [Flutter (Chrome)] <== {"jsonrpc":"2.0","method":"streamNotify","params":{"streamId":"Debug","event":{"type":"Event","kind":"PauseBreakpoint"
Flutter v1.14.6: 24 seconds
[2:19:36 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== Creating text!
[2:20:00 PM] [Observatory] [Info] [Flutter (Chrome)] <== {"jsonrpc":"2.0","method":"streamNotify","params":{"streamId":"Debug","event":{"type":"Event","kind":"PauseBreakpoint"
Flutter v1.15.22: (broken)
Flutter v1.16.3: 50 seconds
[2:27:57 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [ +525 ms] Creating text!
[2:28:47 PM] [Observatory] [Info] [Flutter (Chrome)] <== {"jsonrpc":"2.0","method":"streamNotify","params":{"streamId":"Debug","event":{"type":"Event","kind":"PauseBreakpoint"
I don't know if it's related to this issue, but I'll try to instrument it and see where the time is going. These timings seem significantly higher than the other numbers above.
Seems like my issue is the same as this one; the time is spent fetching frames+variables:
[3:06:08 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [ ] Pause handler: Calling dartFramesFor
[3:06:08 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [ ] calling _frameFor and _variablesFor for 421 frames
[3:07:05 PM] [FlutterRun] [Info] [Flutter (Chrome)] <== [+57171 ms] Pause handler: dartFramesFor done!
I tried to make the requests all parallel to see if it helped, but it never completed when I did that; it wasn't clear why.
FWIW, I'm not sure there's any SSE involved here; I think the connection to Chrome is a WebSocket?
Btw, in the debug-adapter-protocol we can tell the debugger we've paused without supplying the call stack/variables, which are requested separately/asynchronously. This means you can step quickly even without the frames/variables being available. If you step before they're complete, they're just discarded (possibly cancelled if the DA supports it).
It would likely require a change to the VM Service Protocol, but if this kind of data will always be significantly slower than the pause, maybe it's worth considering - it could make stepping feel significantly snappier.
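To make the flow concrete, here's a rough sketch (in Python, with illustrative values - these are not captured messages) of the DAP exchange being described: the adapter announces the pause with a `stopped` event that carries no frames, and the client fetches frames separately afterwards.

```python
# Hypothetical message shapes following the Debug Adapter Protocol spec.

def stopped_event(thread_id, reason="breakpoint"):
    # Sent by the debug adapter as soon as execution pauses.
    # Note: no stack frames or variables are included.
    return {
        "type": "event",
        "event": "stopped",
        "body": {"reason": reason, "threadId": thread_id},
    }

def stack_trace_request(seq, thread_id, start_frame, levels):
    # Issued separately (and asynchronously) by the client; if the user
    # steps before this completes, the client can simply discard the result.
    return {
        "type": "request",
        "seq": seq,
        "command": "stackTrace",
        "arguments": {"threadId": thread_id, "startFrame": start_frame, "levels": levels},
    }

# The UI can show "paused" from the stopped event alone:
paused = stopped_event(thread_id=1)
frames_req = stack_trace_request(seq=2, thread_id=1, start_frame=0, levels=20)
```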
Is there an easy way to capture the traffic between dwds and Chrome when running Flutter web apps? Today I'm seeing 50-60 second waits when hitting breakpoints using Chrome locally. I'm sure it was never this bad previously; I've tried rolling back both Flutter and Dart-Code a fair bit and am still seeing it, so I'm not sure where the time is going. The app is a very simple single file.
I've made local changes to the package in the past to write traffic to a well-named file on disk. We should likely generalize something so that developers can turn on logging when necessary. We'd need something that could be enabled from package:dwds, and that flutter_tools could toggle. (Or, just a const in package:dwds to log traffic if the dev had access to the source)
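As a sketch of what that toggle could look like (names here are hypothetical, and the real implementation would be a Dart const in package:dwds rather than this Python):

```python
import time

LOG_WIRE_TRAFFIC = True             # hypothetical toggle (would default to off)
LOG_FILE = "dwds_wire_traffic.log"  # hypothetical well-known file name

def log_traffic(direction, payload, path=LOG_FILE):
    # Append one wire message to the log file when the toggle is on.
    if not LOG_WIRE_TRAFFIC:
        return
    with open(path, "a", encoding="utf-8") as f:
        f.write("%.3f %s %s\n" % (time.time(), direction, payload))

log_traffic("==>", '{"method":"Debugger.resume"}')
```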
Btw, in the debug-adapter-protocol we can tell the debugger we've paused without supplying the call stack/variables, which are requested separately/asynchronously. This means you can step quickly even without the frames/variables being available. If you step before they're complete, they're just discarded (possibly cancelled if the DA supports it).
I think this maps ~well to the VM service protocol? When you get a pause event from it, it only includes the top frame. If you want more frames, you request them with a separate call (getStack()).
I think this maps ~well to the VM service protocol? When you get a pause event from it, it only includes the top frame.
Oh yeah! In that case, I think we're unnecessarily holding up the pause event here by fetching all frames and variables before sending it?
Today I'm seeing 50-60 second waits when hitting breakpoints using chrome locally.
Is this with just Chrome debugging or using Dart DevTools to debug? I want to understand if we are hitting the Chrome slow stepping issue.
FWIW, I'm not sure there's any SSE involved here; I think the connection to Chrome is a WebSocket?
It depends. If you are using the Dart Debug Extension then it uses SSE to communicate with Chrome. Otherwise, Chrome is spun up with the remote debug port open and a websocket is used for local communication.
I think this maps ~well to the VM service protocol? When you get a pause event from it, it only includes the top frame. If you want more frames, you request them with a separate call (getStack()).
That looks to be the case. We compute all frames even though, according to the documentation, only the top frame is required. We could update the logic here to compute only the top frame and save the expensive work for getStack.
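A minimal sketch of that shape (in Python; the frame-building function stands in for dwds's per-frame Chrome round trips and is not a real dwds API): build only the top frame for the pause event, and defer - then cache - the full stack until getStack is called.

```python
class PausedIsolate:
    def __init__(self, js_frames):
        self._js_frames = js_frames
        self._full_stack = None  # computed lazily, then cached

    def _build_frame(self, js_frame):
        # Stand-in for the expensive _frameFor/_variablesFor work,
        # each of which involves round trips to Chrome.
        return {"name": js_frame["name"], "variables": list(js_frame["scope"])}

    def pause_event(self):
        # Cheap: only the top frame, per the VM service docs.
        return {
            "kind": "PauseBreakpoint",
            "topFrame": self._build_frame(self._js_frames[0]),
        }

    def get_stack(self):
        # The expensive work happens here, once, on demand.
        if self._full_stack is None:
            self._full_stack = [self._build_frame(f) for f in self._js_frames]
        return self._full_stack

isolate = PausedIsolate([{"name": "f%d" % i, "scope": ["a", "b"]} for i in range(421)])
event = isolate.pause_event()  # fast: touches one frame
stack = isolate.get_stack()    # slow path, deferred and cached
```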
Is this with just Chrome debugging or using Dart DevTools to debug?
This is using VS Code launching a Flutter web app using Chrome (so dwds is interacting with the Chrome debugger over the debugger protocol). There's no Dart DevTools, and the Chrome DevTools are not open/visible.
The whole time is spent inside the pause handler, calling frameFor and variablesFor:
[3:06:08 PM] [Flutter (Chrome)] <== [ ] Pause handler: Calling dartFramesFor
[3:06:08 PM] [Flutter (Chrome)] <== [ ] calling _frameFor and _variablesFor for 421 frames
[3:07:05 PM] [Flutter (Chrome)] <== [+57171 ms] Pause handler: dartFramesFor done!
Otherwise, Chrome is spun up with the remote debug port open and a websocket is used for local communication.
Yep, that's the situation here.
We could update the logic here and only compute the top frame. We can save the expensive work for getStack.
I believe this would make a huge difference. While iterating on my tests locally, I rigged it like this and it was a lot faster:
It's not a complete fix, as it doesn't handle this in getStack - it's good enough to pass a lot of my integration tests though 🙂
I can take a look at this later this week. The proper fix isn't too complex either. We'll just need to keep state around so we can properly supply getStack.
I took a look at this today and changed the logic so that only the top frame is calculated when returning the paused event here.
However, it appears that whenever we return a pause event there is a corresponding getStack call. Thus we are just shuffling the work around and there really isn't a performance improvement.
I believe this would make a huge difference. While iterating on my tests locally, I rigged it like this and it was a lot faster:
I don't think this is a fair comparison because you are reducing the overall work. The getStack response is cached and the full work is never done.
Is there a scenario where we send a paused event but there isn't a corresponding getStack call? If not, I'll probably hold off on this PR.
It's good to know that we're not calculating the stack twice - that would have been the performance win.
I think it's still worth returning the top stack first - that'll let clients display the paused location and issue an async call for populating the rest of the stack. Given how long the stack currently takes to populate, that will likely have a good responsiveness win.
However, it appears that whenever we return a pause event there is a corresponding getStack call. Thus we are just shuffling the work around and there really isn't a performance improvement.
Ugh... My assumption was that VS Code would pause the debugger immediately, and the getStack call was then async to populate the call stack. This would result in much faster stepping. However, I'm now remembering that VS Code will jump to the first frame that's user code, so maybe it does await the stack results ☹️
I'll do some testing and see if I can find anything.
Do I understand correctly that the following will help the situation as a longer term solution:
- separate callstack from variables: create a new VM API getStackOnly to return the stack with references to frame objects.
This could certainly help. We could have the paused event return info about the number of frames on the stack, and have the getStack() call take some optional params for either not populating variable info, or allowing a client to page through the actual frames (only return info for frame index xx and the next 10 frames).
Likely useful to try and improve performance with the existing API, in order to mitigate changes to existing clients, but definitely a useful option to consider.
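For illustration, a paged getStack could look something like this (Python sketch; startFrame/count/include_variables are hypothetical parameter names, not part of the current protocol):

```python
def get_stack(frames, start_frame=0, count=None, include_variables=True):
    # Return a window of frames, optionally without the expensive variable info.
    end = len(frames) if count is None else min(start_frame + count, len(frames))
    window = frames[start_frame:end]
    if not include_variables:
        window = [{k: v for k, v in f.items() if k != "variables"} for f in window]
    return {"totalFrames": len(frames), "frames": window}

# 421 frames on the JS side, but the client only needs the first 20 to render:
all_frames = [{"index": i, "variables": {"x": i}} for i in range(421)]
page = get_stack(all_frames, start_frame=0, count=20, include_variables=False)
```

With totalFrames in the response, a client knows more frames exist without the proxy ever preparing them.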
- add bulk getObjects API (for example, get all variable values for a frame, or all values for fields of an object)
Clients may be able to parallelize their requests here? We only ever need to show variables for one frame at a time, so clients may not pay a huge penalty for fetching frame variables on demand. I think the larger cost is package:dwds calculating all the variable info for all the frames ahead of time?
- cache what's possible (we could have a lossy cache to avoid storing the world)
I think this would be great to investigate, in addition to seeing if there are ways to have fewer chrome debug protocol calls to collect the same info. I'll share some sample wire protocol traffic in hopes that that'll spark some ideas.
Attaching a log of vm service traffic to and from package:dwds, and chrome debugger traffic sent and received from dwds:
Ok, I did some more testing:
However, it appears that whenever we return a pause event there is a corresponding getStack call. Thus we are just shuffling the work around and there really isn't a performance improvement.
VS Code does wait for getStack to complete before breaking. However, it initially requests only a small batch of frames, then batches of 20 if the user clicks "load more frames" (so the full stack is not required to show we've paused).
VS Code does not wait for variables (although in the VM we get these in the stack, the debugger-adapter-protocol requests them separately for a given frame).
So, for @annagrin's suggestions:
separate callstack from variables: create a new VM API getStackOnly to return the stack with references to frame objects.
If we're adding a new function here, supporting startFrame + totalFrames (and have the proxy only prepare those requested) would significantly help too (my example above fetched 421 frames, but VS Code only ever showed 20 unless I expand more).
add bulk getObjects API (for example, get all variable values for a frame)
A single request to get all variables for a frame would also be great. VS Code only requests variables for the selected frame.
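A sketch of that bulk-fetch idea (Python; getObjects and fetch_object are hypothetical stand-ins, not a real API): one request returns all variable values for the selected frame instead of one round trip per variable.

```python
def fetch_object(object_id):
    # Stand-in for one round trip to the backend for a single object.
    return {"id": object_id, "value": "value-of-%s" % object_id}

def get_objects(object_ids):
    # The proposed bulk API: one request/response carrying every value,
    # replacing len(object_ids) individual round trips.
    return [fetch_object(oid) for oid in object_ids]

# Only the selected frame's variables are ever shown, so one bulk call suffices:
selected_frame = {"index": 0, "variableIds": ["obj1", "obj2", "obj3"]}
variables = get_objects(selected_frame["variableIds"])
```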
I think with those two changes, for VS Code we'd go from building 421 frames and 421 sets of variables, to building 20 frames and 1 set of variables, with no change at all in what's visible to the user (and the UI would show we've paused even without any variables being provided).
Edit: The difference is significant if only waiting for 20 frames and no variables - 57s to 236ms for the frame building:
[9:10:52 AM] [Flutter (Chrome)] <== [+4801 ms] Pause handler: Calling dartFramesFor
[9:11:50 AM] [Flutter (Chrome)] <== [+57159 ms] Pause handler: dartFramesFor done!
[9:13:49 AM] [Flutter (Chrome)] <== [+12424 ms] Pause handler: Calling dartFramesFor (only returning 20)
[9:13:49 AM] [Flutter (Chrome)] <== [ +236 ms] Pause handler: dartFramesFor done!
create a new VM API getStackOnly to return the stack with references to frame objects
To build the stack in VS Code we do need a few fields from Frame (.code.name and .location) - would they be included in these references?
To update here: I investigated caching the calculated ClassDefs (each requires a round trip to Chrome to calculate). This does not appear to be a win, as:
- we don't get useful values back in the RemoteObject.className field, which I'd been intending to use to know when we can cache a ClassDef
- we do get values in the RemoteObject.objectId field, but that will only let us cache the ClassDef per object, not per object type, so will yield minimal benefit, at the expense of having to maintain caching code
I'm going to remove a todo: to investigate, but don't consider this a worthwhile improvement.
I wonder if storing types and symbols in the metadata by the compiler will help here. I plan to do this later to improve evaluation, need to think this through.
This is a meta issue to track improving the performance of populating the Dart frames. This happens when the JS VM pauses, and is likely the main place we're spending time on when the user is stepping through code (see also https://github.com/dart-lang/webdev/issues/975). Some general thoughts:
Measurement
We should add logging to be better aware of the number of calls we are making to the Chrome backend - to see if there are places where we can reduce the number of calls, increase the parallelism, or avoid some calls altogether.
Somewhat related to https://github.com/dart-lang/webdev/issues/977.
Optimize calls to Chrome
Currently, when building the Dart frames, we make many async calls per frame and per variable, blocking on each call to complete - all the work is done in series. We should instead explore parallelizing the work: gathering data for all n frames in parallel, and sending the data back to the IDE when complete.
If it's possible to avoid some calls altogether, we should do that.
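To illustrate the serial-vs-parallel difference, a small asyncio sketch (the simulated round trip stands in for a Chrome backend call; none of this is dwds code): with n frames and one round trip of latency each, the serial version costs roughly n round trips of wall time, while the parallel version costs roughly one.

```python
import asyncio

async def build_frame(i):
    await asyncio.sleep(0.01)  # simulated Chrome backend round trip
    return {"index": i}

async def build_serial(n):
    # Current behaviour: block on each call; wall time ~ n round trips.
    return [await build_frame(i) for i in range(n)]

async def build_parallel(n):
    # Proposed behaviour: all calls in flight at once; wall time ~ 1 round trip.
    return list(await asyncio.gather(*(build_frame(i) for i in range(n))))

frames = asyncio.run(build_parallel(20))
```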
Compiler metadata
As part of the IPL work, the compiler will start generating more metadata that package:dwds can use. We should make use of that when available to eliminate some calls to the backend.
package:sse
We should investigate if and how much slower SSE is than websockets. Does it create a connection per request? Does this have a high cost?
In addition, package:sse may serialize each call, not issuing a subsequent call until the previous one completes. This ensures results are not received out of order. If true, we may want to relax this - if we have 100 calls to a backend, we want to issue them in parallel as much as possible, rather than one at a time.
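A toy model of that serialization cost (this models the behaviour; it is not package:sse code): a queue that starts the next request only after the previous one completes preserves ordering but pays linear latency, while issuing the calls concurrently pays roughly one round trip in total.

```python
import asyncio
import time

async def round_trip(i):
    await asyncio.sleep(0.005)  # simulated network round-trip latency
    return i

async def serialized(n):
    # Next call waits for the previous one: ordered, but n round trips of latency.
    results = []
    for i in range(n):
        results.append(await round_trip(i))
    return results

async def concurrent(n):
    # All requests in flight at once; gather() still returns results in order.
    return list(await asyncio.gather(*(round_trip(i) for i in range(n))))

start = time.perf_counter()
serial_results = asyncio.run(serialized(20))
serial_time = time.perf_counter() - start

start = time.perf_counter()
concurrent_results = asyncio.run(concurrent(20))
concurrent_time = time.perf_counter() - start
```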
cc @grouma @annagrin