dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.2k stars 1.57k forks source link

Observatory UI hang/debug crash #20049

Closed leafpetersen closed 9 years ago

leafpetersen commented 10 years ago

N.B. The directions below are for using a current bleeding edge enlistment, but I can reproduce the hang in Release mode with the current editor/dartium dev release (1.6.0.3 dev ia32 linux). I haven't tried the debug version of dev release (c.f. dartbug.com/20038).

With a current bleeding edge dartium enlistment (dartium r177891, dart r38218) on linux:

Built as follows:

./dart/tools/dartium/build.py --mode=Debug ./dart/tools/dartium/build.py --mode=Release

1) run a release build as follows (replacing the path to webbindingstest.html with the appropriate path to the attached dart program):

./out/Release/chrome --user-data-dir=/tmp/chrome0/ file:///usr/local/google/home/leafp/projects/shared/dart-code/WebBindingsTest/web/webbindingstest.html &

2) Open dev tools (Ctrl-Shift-J), and click on the observatory tab. I see the following error message in the terminal:

[30955:30955:0714/160729:ERROR:CONSOLE(291)] "Refused to execute inline script because it violates the following Content Security Policy directive: "script-src chrome://resources 'self' https://www.google.com 'unsafe-eval'". Either the 'unsafe-inline' keyword, a hash ('sha256-...'), or a nonce ('nonce-...') is required to enable inline execution. ", source: chrome-devtools://devtools/bundled/Observatory/index_devtools.html (291)

3) When the first file stops loading, reload it (using the chrome reload page button)

At this point, the Observatory UI becomes permanently mostly unresponsive. Hitting Refresh (in the observatory pane) goes off forever, and clicking most of the links in the observatory pane results in no navigation.

Run a debug build as follows :

./out/Debug/chrome --user-data-dir=/tmp/chrome0/ file:///usr/local/google/home/leafp/projects/shared/dart-code/WebBindingsTest/web/webbindingstest.html &

When chrome starts up, start dev tools (Ctrl-Shift-J), and click on the observatory tag. On my machine I see some output in the terminal, followed by an aw snap VM crash, with the terminal output and stack trace as shown in the attached text file.  


Attachments: observatory-crasher.tgz (31.43 KB) observatory-crasher-debug-stacktrace.txt (5.67 KB)

johnmccutchan commented 10 years ago

Jacob,

It looks like the allow cross origin requests for Observatory in DevTools was lost in a merge again?

Leaf,

I have an idea about the debug mode crasher and will look at it this morning.

John


cc @jacob314.

johnmccutchan commented 10 years ago

https://codereview.chromium.org/391943003/ should fix the debug mode crash.

johnmccutchan commented 10 years ago

Added this to the 1.6 milestone. Removed Priority-Unassigned label. Added Priority-Critical, Accepted labels.

johnmccutchan commented 10 years ago

Debug mode crash CL has landed.

leafpetersen commented 10 years ago

I don't see the immediate debug crash anymore, thanks, but I still get a debug crash with a couple of more steps (it's possible that this is just a debug manifestation of the release freeze described above though). To reproduce with current TOT dartium (blink r178150, dart r38241):

1) Run dartium as before: ./out/Debug/chrome --user-data-dir=/tmp/chrome0/ file:///usr/local/google/home/leafp/projects/shared/dart-code/WebBindingsTest/web/webbindingstest.html &

2) Open dev tools and open observatory tab

3) Reload the page using the chrome reload button

4) Click the observatory "refresh" button, and wait for the crash.

Stack trace attached.


Attachment: observatory-crasher-debug-stacktrace-2.txt (6.45 KB)

johnmccutchan commented 10 years ago

Hi,

I'm not sure this new crash is directly related to Observatory. I can see from your log that Observatory has successfully sent and received multiple messages:

[2102:2102:0715/095616:INFO:CONSOLE(10958)] "INFO: 2014-07-15 09:56:16.816: Registering new VM callbacks", source: chrome-devtools://devtools/bundled/Observatory/index_devtools.html_bootstrap.dart.js (10958) [2102:2102:0715/095616:INFO:CONSOLE(10958)] "INFO: 2014-07-15 09:56:16.834: Installing page: Instance of 'by'", source: chrome-devtools://devtools/bundled/Observatory/index_devtools.html_bootstrap.dart.js (10958) [2102:2102:0715/095617:INFO:CONSOLE(10958)] "INFO: 2014-07-15 09:56:17.670: New isolate 'isolates/169914991'", source: chrome-devtools://devtools/bundled/Observatory/index_devtools.html_bootstrap.dart.js (10958) [2102:2102:0715/095618:INFO:CONSOLE(10958)] "INFO: 2014-07-15 09:56:18.087: Viewing object of 'VM'", source: chrome-devtools://devtools/bundled/Observatory/index_devtools.html_bootstrap.dart.js (10958)

Clicking the 'Refresh' button just sends another message. The crash then happens inside WebCore. Could you help me understand what's going on there?

johnmccutchan commented 10 years ago

Removed the owner. Added NeedsInfo label.

leafpetersen commented 10 years ago

I did some gdb and printf debugging on this.

The crash is happening because the messageNotifyCallback function in DartController.cpp is being run in a situation where the scriptExecutionContext contains invalid data. It looks to me like something is freeing the scriptExecutionContext prematurely. The m_client field of the destinationContext object in that method contains 0xcdcdcdcdcdcdcdcd, which I think means that the object has been de-allocated. I don't know who is responsible for managing this object, but it seems to come from the VM at some point. Here's what I see in tracing this a bit more.

The scriptExecutionContext for this isolate is set initially from the createServiceIsolateCallback function in DartController.cpp, which creates an isolate using the callback data passed in to it. This function seems to only be used as an argument to the Dart_Initialize callback into the VM, so I'm presuming that it is being called from somewhere in the VM, but I haven't gone back to gdb to trace this.

When createServiceIsolateCallback is called, the callback data is a valid pointer to an ExecutionContext or one of its subclasses. The m_client field of the ExecutionContext is a reasonable looking value at this point. A number of calls to messageNotifyCallback with this scriptExecutionContext are made, and the data in scriptExecutionContext continues to look valid.

However, after the reload of the page, the next call to messageNotifyCallback pulls out the same scriptExecutionContext pointer as before, but now the data in it (m_client) has been overwritten with 0xcdcdcdcdcdcdcdcd, suggesting to me that it has been freed.

I don't really know how all of this is wired together - so I'm not sure what the life-cycle management of these objects is supposed to be. It looks like whoever is managing the callback data with which the serviceIsolate is being created is freeing the callback data prematurely. Does this help? I can do some more tracing in gdb to try to figure out where this object is being freed, but thought that I'd try tossing it back over the wall first to see if this rang any bells. gdb stack trace from the crash attached.


Attachment: observatory-crasher-debug-stacktrace-3.txt (7.36 KB)

leafpetersen commented 10 years ago

So a bit more info. It looks to me like the VM is stashing a single service isolate in service.cc (serviceisolate) which doesn't ever seem to be cleared. This service isolate contains a pointer to the current document. I believe that when the page is reloaded (via the chrome reload button) the old document object will become invalid, but the service isolate keeps holding onto it, and tries to use it to post messages, which results in this crash. It seems like either this isolate should not have an execution context, or else maybe the service isolate should be re-created on page refresh?

johnmccutchan commented 10 years ago

I think there might be a life cycle issue with browser page refreshes. I'm confused because the serviceisolate in service.cc doesn't keep any pointers to the document though.

When the page refreshes is the VM shutdown or is it just the current document that gets cleared?

johnmccutchan commented 10 years ago

I think the code that might be keeping the stale reference might be in core/inspector/InspectorDartAgent.*

Jacob- it would be great to hear from you on this because you wrote that code. Have we missed notifying the dart agent that the page has refreshed?

johnmccutchan commented 10 years ago

Looking into this more, it seems like the service isolate's callback data is holding a reference to the old page's DOM data. We need a signal from Dartium that the service isolate should be recreated - or - a different way of getting at the DartDOMData structure for the service isolate.

johnmccutchan commented 10 years ago

The remaining crash occurs in messageNotifyCallback- when we grab the DartDOMData's ExecutionContext:

ExecutionContext* destinationContext = domData->scriptExecutionContext(); destinationContext->postTask(adoptPtr(new MessageNotifyTask(domData->threadSafeIsolateWrapper())));

After the page reloads the original ExecutionContext (aka Document) has been replaced and we are holding on to a bad pointer.

Vijay,

Do you know how the life cycle works here? I can change the service isolate's execution context but Siva and I can't figure out the correct place to hook in.

Thanks, John


cc @vsmenon. cc @rmacnak-google.

leafpetersen commented 10 years ago

I've been looking at the life cycle code a little bit in the context of dartbug.com/20013, dunno if I know more or less than you do now. The document goes away via

DocumentLoad::createWriterFor ->  LocalFrame::setDOMWindow ->    DartController::clearWindowShell

There looks to be a debugger callback in clearWindowShell that is maybe trying to deal with a similar issue? In any case, that at least might tell you where to hook in to clear out the old context, but not sure where to hook in to get the new context.

jacob314 commented 10 years ago

We need to stop having the observatory isolate be associated with a particular DartDOMData or create a new observatory each time a page is loaded. Making the observatory not be associated with a particular DartDOMData seems like the right solution but I haven't looked into how much work that would be at this point. Presumably if we supported background workers in Dart it would be trivial.

johnmccutchan commented 10 years ago

Thanks Jacob.

The service isolate doesn't need to be attached to any specific DartDOMData. We just need to be able to post a task. Is it possible (from messageNotifyCallback) to query for the current execution context of the page?

Background worker sounds like the correct long term fix.

John

jacob314 commented 10 years ago

There might be more than one page in the render process. We can pick "a" page which should be all you need.

vsmenon commented 10 years ago

John: as you probably figured out, the document and dart isolate are gone after a page refresh. And, there are multiple documents / windows / pages in a process.

What is the lifecycle of the observatory isolate? One per page or one per process? If the latter, do we want to keep it around for the lifetime of the process?

What tasks does the observatory need to post? Does it have to be a document / page on the other side?

johnmccutchan commented 10 years ago

@vsm:

The Observatory isolate lives as long as the VM does. In Dartium it is lazily brought up the first time Observatory is opened.

I need to post a MessageNotifyTask to the Observatory isolate (this is triggered by a call to Dart_Post from InspectorDartAgent (core/inspector)).