TritonDataCenter / mdb_v8

postmortem debugging for Node.js and other V8-based programs
Mozilla Public License 2.0
240 stars 18 forks source link

would like dcmd for heuristically finding back references #107

Closed davepacheco closed 5 years ago

davepacheco commented 6 years ago

While debugging issues related to memory usage (including memory leaks), it's common to look at a particular object and want to know what other references exist for that object. findjsobjects -r can help with this, though it's extremely time-consuming. (On large core files -- and issues related to memory usage often result in large core files -- each iteration can take tens of minutes, and you often want to make dozens of iterations.) It also doesn't currently find objects referenced via closures -- see #105.

In the meantime, I discovered that I could find back references reasonably reliably using a combination of ugrep and scouring nearby memory. It basically works like this: I'd find an interesting representative object from findjsobjects and look for references:

> 400bc4e08189::findjsobjects -r
400bc4e08189 is not referred to by a known object.

But using ugrep, I'd find that it was referred to elsewhere:

> 400bc4e08189::ugrep
400bc4e08c18

Then I'd try to summarize the region of memory ahead of that reference like this:

> 400bc4e08c18-0t64,0t9=p8+n | ::eval ".=nn; ./p; *.::v8type"

0x400bc4e08bd8: 0x7a9cfc104101  
0x7a9cfc104101: FixedArray

0x400bc4e08be0: 0x4b79a3904121  
0x4b79a3904121: Oddball: "undefined"

0x400bc4e08be8: 0x497ed1e05141  
0x497ed1e05141: Map

0x400bc4e08bf0: 0x900000000     
0x900000000: SMI: value = 9

0x400bc4e08bf8: 0x79785ffef4b9  
0x79785ffef4b9: JSFunction

0x400bc4e08c00: 0x6b8ee5b34791  
0x6b8ee5b34791: FixedArray

0x400bc4e08c08: 0               
0x0: SMI: value = 0

0x400bc4e08c10: 0x4b79a3906b71  
0x4b79a3906b71: JSGlobalObject

0x400bc4e08c18: 0x400bc4e08189  
0x400bc4e08189: JSFunction

Observing that virtually every V8 heap object has as its first word a pointer to a Map, I'd walk back from my pointer to the first address that points to a Map. In this case, that's 400bc4e08be8. To get the heap object's address, we have to take addr|1 -- 400bc4e08be9 in this case. Then I'd try to print that out, and sure enough:

> 0x400bc4e08be9::v8print
400bc4e08be9 FixedArray {
    400bc4e08be9 FixedArrayBase {
        400bc4e08be9 HeapObject < Object  {
            400bc4e08be8 map = 497ed1e05141 (Map)
        }
        400bc4e08bf0 length = 900000000 (SMI: value = 9)
    }
    400bc4e08bf8 data = 79785ffef4b9 (JSFunction)
}

It's a FixedArray, and our original value is one of its elements:

> 0x400bc4e08be9::v8array
79785ffef4b9
6b8ee5b34791
0
4b79a3906b71
400bc4e08189      <------ our original pointer
400bc4e081d1
400bc4e081f1
4b79a3904161
400bc4e08211

I used this approach iteratively to reconstruct a tree that included 4 JSObjects, 1 JSArray, 6 closures, and a handful of FixedArrays that were used as intermediate values (e.g., for arrays or closure contexts). It was tedious and time-consuming, especially when it branched (i.e., there was more than one reference), but it was pretty reliable, and it can be done automatically quite quickly.

I'd propose a couple of dcmds like:

ADDR::jsfindrefs [-s SIZE]
ADDR::v8findrefs [-s SIZE]

which would essentially follow this heuristic:

I'm envisioning that v8findrefs would iterate just once and report the V8 heap objects that immediately refer to the given address, while jsfindrefs would keep following each path until it finds the next JS-level object along each path. Either of these could take an option to repeat the process a number of times.

As for an appropriate default for SIZE: the maximum you'd have to walk back for fixed-size objects is the size of the largest fixed-size object that we know about. Fortunately, we have that information. For strings and arrays, you might have to walk back arbitrarily further than that, since those are arbitrary-sized objects. But since we can always stop when we find the first Map, and false positives can usually be eliminated with the check above, there's no reason not to set this limit fairly high (e.g., even a few hundred KB).

Although this was mostly useful here because we don't yet have #105, and the performance issues associated with findjsobjects -r could also be addressed with the export-to-sqlite path, I think these dcmds would still be useful tools to have.

I will add a detailed analysis for the original problem to MORAY-455, in case that's helpful.

davepacheco commented 6 years ago

I just discovered that mdb has a private function called mdb_whatis_register that allows dmods to plug into the ::whatis dcmd. That may be useful, too.

davepacheco commented 6 years ago

I implemented the behavior I described for ::v8findrefs as a new dcmd called ::v8whatis under #111. I've prototyped the change for ::jsfindrefs -- I've only tried it on a synthetic program, but it seems quite promising. For reference, the branch is here: https://github.com/davepacheco/mdb_v8/tree/dev-jsfindrefs

At this point, it still needs some cleanup, documentation, and tests.

davepacheco commented 6 years ago

I've put up a change for ::jsfindrefs here: https://cr.joyent.us/#/c/3691

Some notes for reviewers:

I have a copy at the the same GitHub branch as before, so you can see the docs for the new dcmd here: https://github.com/davepacheco/mdb_v8/blob/dev-jsfindrefs/docs/usage.md#jsfindrefs

davepacheco commented 6 years ago

PS3 of the above change passes the runtests_node tests:

0.10.48 sunos x86: success
0.10.48 sunos x64: success
0.12.17 sunos x86: success
0.12.17 sunos x64: success
4.8.4   sunos x86: success
4.8.4   sunos x64: success
6.11.2  sunos x86: success
6.11.2  sunos x64: success
davepacheco commented 5 years ago

I've updated the Gerrit CR in response to code review feedback. I'm rerunning the test suite on the same Node versions as before.

davepacheco commented 5 years ago

Here are the test results from PS4:

$ time ./tools/runtests_node run $HOME/node-versions
...
node v6.11.2 ia32
Configuration:
    SRC:                         /home/dap/mdb_v8
    Output directory:            /var/tmp/catest.31582
    Temp directory:              /var/tmp/catest.31582_tmpfiles
    Keep successful test output: false
    Found 8 test(s) to run

===================================================

Executing test test/standalone/tst.arrays.js ... success.
Executing test test/standalone/tst.jsclosure.js ... success.
Executing test test/standalone/tst.jsfindrefs.js ... success.
Executing test test/standalone/tst.postmortem_basic.js ... success.
Executing test test/standalone/tst.postmortem_details.js ... success.
Executing test test/standalone/tst.postmortem_findjsobjects.js ... success.
Executing test test/standalone/tst.postmortem_jsstack.js ... FAILED.
>>> failure details in /var/tmp/catest.31582/failure.0

Executing test test/standalone/tst.v8whatis.js ... success.

===================================================

Results:
    Tests passed:    7/ 8
    Tests failed:    1/ 8

===================================================
Cleaning up output from successful tests ... done.
node v6.11.2 x64
Configuration:
    SRC:                         /home/dap/mdb_v8
    Output directory:            /var/tmp/catest.36172
    Temp directory:              /var/tmp/catest.36172_tmpfiles
    Keep successful test output: false
    Found 8 test(s) to run

===================================================

Executing test test/standalone/tst.arrays.js ... success.
Executing test test/standalone/tst.jsclosure.js ... success.
Executing test test/standalone/tst.jsfindrefs.js ... success.
Executing test test/standalone/tst.postmortem_basic.js ... success.
Executing test test/standalone/tst.postmortem_details.js ... success.
Executing test test/standalone/tst.postmortem_findjsobjects.js ... success.
Executing test test/standalone/tst.postmortem_jsstack.js ... FAILED.
>>> failure details in /var/tmp/catest.36172/failure.0

Executing test test/standalone/tst.v8whatis.js ... success.

===================================================

Results:
    Tests passed:    7/ 8
    Tests failed:    1/ 8

===================================================
Cleaning up output from successful tests ... done.

Summary:
0.10.48 sunos x86: success
0.10.48 sunos x64: success
0.12.17 sunos x86: success
0.12.17 sunos x64: success
4.8.4   sunos x86: success
4.8.4   sunos x64: success
6.11.2  sunos x86: fail
6.11.2  sunos x64: fail

real    41m29.672s
user    14m48.500s
sys 26m45.323s

More specifically, across all Node versions, there were two test failures. They both appear to be instances of #38:

dap@f0c3d2d4 mdb_v8 $ cat /var/tmp/catest.31582/failure.0/README 
/home/dap/mdb_v8/test/standalone/tst.postmortem_jsstack.js failed: test returned 1
dap@f0c3d2d4 mdb_v8 $ cat /var/tmp/catest.31582/failure.0/*.err

assert.js:81
  throw new assert.AssertionError({
  ^
AssertionError: unexpected frame where stalloogle was expected; core retained as /var/tmp/node.35574
    at ChildProcess.<anonymous> (/home/dap/mdb_v8/test/standalone/tst.postmortem_jsstack.js:140:11)
    at emitTwo (events.js:106:13)
    at ChildProcess.emit (events.js:191:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:215:12)
dap@f0c3d2d4 mdb_v8 $
dap@f0c3d2d4 mdb_v8 $ cat /var/tmp/catest.36172/failure.0/README 
/home/dap/mdb_v8/test/standalone/tst.postmortem_jsstack.js failed: test returned 1
dap@f0c3d2d4 mdb_v8 $ cat /var/tmp/catest.36172/failure.0/*.err

assert.js:81
  throw new assert.AssertionError({
  ^
AssertionError: unexpected frame where stalloogle was expected; core retained as /var/tmp/node.40373
    at ChildProcess.<anonymous> (/home/dap/mdb_v8/test/standalone/tst.postmortem_jsstack.js:140:11)
    at emitTwo (events.js:106:13)
    at ChildProcess.emit (events.js:191:7)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:215:12)

(This also looks like #97, but this PS has the fix for that issue.)

davepacheco commented 5 years ago

As described in #38, this issue does seem intermittent:

dap@f0c3d2d4 mdb_v8 $ node -v
v6.11.2
(reverse-i-search)`': ^C
dap@f0c3d2d4 mdb_v8 $ make -n test
tools/catest -a
dap@f0c3d2d4 mdb_v8 $ ./tools/catest test/standalone/tst.postmortem_jsstack.js 
Configuration:
    SRC:                         /home/dap/mdb_v8
    Output directory:            /var/tmp/catest.73874
    Temp directory:              /var/tmp/catest.73874_tmpfiles
    Keep successful test output: false
    Found 1 test(s) to run

===================================================

Executing test test/standalone/tst.postmortem_jsstack.js ... FAILED.
>>> failure details in /var/tmp/catest.73874/failure.0

===================================================

Results:
    Tests passed:    0/ 1
    Tests failed:    1/ 1

===================================================
Cleaning up output from successful tests ... done.
dap@f0c3d2d4 mdb_v8 $ ./tools/catest test/standalone/tst.postmortem_jsstack.js 
Configuration:
    SRC:                         /home/dap/mdb_v8
    Output directory:            /var/tmp/catest.75980
    Temp directory:              /var/tmp/catest.75980_tmpfiles
    Keep successful test output: false
    Found 1 test(s) to run

===================================================

Executing test test/standalone/tst.postmortem_jsstack.js ... success.

===================================================

Results:
    Tests passed:    1/ 1
    Tests failed:    0/ 1

===================================================
Cleaning up output from successful tests ... done.
dap@f0c3d2d4 mdb_v8 $ 

I think this can count as a clean test run. Obviously, we should still fix #38, but I don't think this work should be blocked on that.

davepacheco commented 5 years ago

Integrated. Thanks @melloc for the review!