facebook / memlab

A framework for finding JavaScript memory leaks and analyzing heap snapshots
https://facebook.github.io/memlab/
MIT License

[Guide] Integration with existing e2e framework #35

Open unlikelyzero opened 1 year ago

unlikelyzero commented 1 year ago

Hello! This project looks amazing and is something our team at NASA Open MCT would love to use within our nascent performance testing suite.

I've seen a few requests for integration https://github.com/facebookincubator/memlab/issues/15 and https://github.com/facebookincubator/memlab/issues/14.

We're using the playwright/test package, which is nearly identical to jest+puppeteer in terms of capabilities. The only API missing relative to puppeteer is https://github.com/microsoft/playwright/issues/14134 , but otherwise it has API parity and should be interchangeable.

I was wondering if you could provide a guide on how to integrate memlab into our existing playwright/test framework. I could help keep it up to date by integrating these two projects as part of our CI pipeline. I'd just like to know about how we should go about getting started using some of your core APIs. Thanks!

JacksonGL commented 1 year ago

@unlikelyzero Sure, the integration consists of two steps:

Get heap snapshots dumped onto disk with a specific directory structure and meta files

For your existing playwright/test framework, I would suggest implementing an API that takes JS heap snapshots from the browser. Playwright appears to support connecting to the browser's DevTools via the Chrome DevTools Protocol, so you can use that to take JS heap snapshots from Chromium. The memlab source shows how it uses puppeteer to collect heap snapshots from Chromium.

The API could look like this (as an example):

await takeJSHeapSnapshot(page, tag);

tag is one of baseline, target, or final. These three snapshots are equivalent to SBP, STP, and SBP' respectively.

Your takeJSHeapSnapshot API should dump files in a specific format on disk. Here is the complete list of files and the directory structure required by the MemLab core API for finding memory leaks.

/path/to/dump/
└── data
    └── cur
        ├── run-meta.json       # meta data of memlab run and browser configuration
        ├── s1.heapsnapshot     # heap snapshot after the url callback (initial page load)
        ├── s2.heapsnapshot     # heap snapshot after the action callback (after target interaction)
        ├── s3.heapsnapshot     # heap snapshot after the back callback (after reverting target interaction)
        └── snap-seq.json       # meta data about each browser interaction step

To get examples of those meta files, run a random MemLab test scenario and view those files under this directory: $(memlab get-default-work-dir)/data/cur.

In your Playwright test code, you can insert those takeJSHeapSnapshot API calls to collect the three snapshots and get the files dumped onto disk.
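As a rough illustration of the step above, here is a minimal sketch of what takeJSHeapSnapshot could look like on top of Playwright's CDP session. It is not memlab's own implementation. The third parameter dumpDir, the TAG_TO_INDEX table, and the choice to force a garbage collection before each snapshot are assumptions made for this example; the file names follow the s1/s2/s3 layout listed above.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed mapping from snapshot tag to the N in sN.heapsnapshot
// (inferred from the directory listing above, not a memlab API).
const TAG_TO_INDEX = {baseline: 1, target: 2, final: 3};

// Hypothetical helper: `dumpDir` is an extra parameter added for this sketch.
async function takeJSHeapSnapshot(page, tag, dumpDir) {
  const idx = TAG_TO_INDEX[tag];
  if (idx === undefined) {
    throw new Error(`unknown snapshot tag: ${tag}`);
  }
  const curDir = path.join(dumpDir, 'data', 'cur');
  fs.mkdirSync(curDir, {recursive: true});
  const file = path.join(curDir, `s${idx}.heapsnapshot`);

  // Open a Chrome DevTools Protocol session for this page (Chromium only).
  const client = await page.context().newCDPSession(page);
  const chunks = [];
  // The snapshot arrives as a series of string chunks.
  client.on('HeapProfiler.addHeapSnapshotChunk', ({chunk}) => {
    chunks.push(chunk);
  });
  // Assumption: collect garbage first so unreachable objects are not dumped.
  await client.send('HeapProfiler.collectGarbage');
  await client.send('HeapProfiler.takeHeapSnapshot', {reportProgress: false});
  await client.detach();

  fs.writeFileSync(file, chunks.join(''));
  return file;
}
```

In a test you would call it three times, for example `await takeJSHeapSnapshot(page, 'baseline', '/path/to/dump/')` after the initial page load, then with 'target' after the interaction and 'final' after reverting it.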

Find memory leaks based on heap snapshot dumps and meta files

Once your takeJSHeapSnapshot implementation has dumped the heap snapshots and meta files onto disk, you can find memory leaks with this memlab core API:

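const {findLeaks, BrowserInteractionResultReader} = require('@memlab/api');

(async function () {
  // point the reader at the directory that contains data/cur
  const reader = BrowserInteractionResultReader.from('/path/to/dump/');
  // analyze the dumped snapshots and collect the detected leaks
  const leaks = await findLeaks(reader);
})();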

ajayjaggi97 commented 1 year ago

@JacksonGL Can you please explain how to generate the run-meta.json and snap-seq.json files as well? @unlikelyzero if you have already implemented this with playwright, can you please help?

Also, can you explain what these files are used for?

unlikelyzero commented 1 year ago

@ajayjaggi97 I have not integrated this yet.

Tallyb commented 1 year ago

I would love to understand the extra 2 files as well...

JacksonGL commented 1 year ago

Recent updates have made the run-meta.json file optional. Now, to enable a different E2E framework to generate output that can be consumed by the MemLab findLeaks API or the memlab find-leaks command, you simply need a valid snap-seq.json file.

snap-seq.json File

The snap-seq.json file encodes the browser interaction data and heap snapshot details that the Memlab leak detector needs to identify memory leaks.

Here's a sample snap-seq.json file. It's an array that contains a series of objects, where each object represents a browser interaction step. To detect memory leaks, we require a minimum of three steps labeled as baseline, target, and final, in that sequence. For a detailed understanding of why Memlab requires three snapshots, please refer to this doc.

[
  {
    "name": "page-load", 
    "snapshot": true,
    "type": "baseline",
    "idx": 1,
    "JSHeapUsedSize": 33872820
  },
  {
    "name": "action-on-page",
    "snapshot": true,
    "type": "target",
    "idx": 2,
    "JSHeapUsedSize": 44172336
  },
  {
    "name": "revert",
    "snapshot": true,
    "type": "final",
    "idx": 3,
    "JSHeapUsedSize": 43304156
  }
]

Now let's take a look at the JSON encoding of a specific step:

  {
    "name": "page-load",
    "snapshot": true,
    "type": "baseline",
    "idx": 1,
    "JSHeapUsedSize": 33872820
  }
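To make the format concrete, here is a minimal sketch that generates a three-step snap-seq.json like the sample above from another E2E framework. The function names buildSnapSeq and writeSnapSeq are hypothetical, as is the heapSizes argument. The sketch assumes that idx matches the N in sN.heapsnapshot (inferred from the directory listing earlier in this thread) and that the caller supplies the JSHeapUsedSize values it measured.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: builds the baseline/target/final steps in order.
// `heapSizes` holds one JSHeapUsedSize value per step, supplied by the caller.
function buildSnapSeq(heapSizes) {
  const steps = [
    {name: 'page-load', type: 'baseline'},
    {name: 'action-on-page', type: 'target'},
    {name: 'revert', type: 'final'},
  ];
  return steps.map((step, i) => ({
    name: step.name,
    snapshot: true,
    type: step.type,
    idx: i + 1, // assumed to match s1/s2/s3.heapsnapshot
    JSHeapUsedSize: heapSizes[i],
  }));
}

// Hypothetical helper: writes snap-seq.json next to the heap snapshots.
function writeSnapSeq(dumpDir, heapSizes) {
  const curDir = path.join(dumpDir, 'data', 'cur');
  fs.mkdirSync(curDir, {recursive: true});
  const file = path.join(curDir, 'snap-seq.json');
  fs.writeFileSync(file, JSON.stringify(buildSnapSeq(heapSizes), null, 2));
  return file;
}
```

If your test flow never changes, a static snap-seq.json checked into the repository would serve the same purpose; generating it is only useful when you want the recorded heap sizes to reflect the actual run.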

Optional run-meta.json File

The run-meta.json file is an optional JSON file created by Memlab's E2E front-end. It logs metadata related to Memlab's operation, including Chromium startup arguments, information about the web-app under test, and any CLI commands. While this file isn't directly utilized during MemLab's memory leak detection process, it may be valuable later for associating detected memory leaks with the configuration of the tested web-app and browser. For example, some memory leaks only show up in specific configurations, such as mobile view. That information could be useful to show if you build a UI that displays all the memory leaks together with their underlying app and browser information.

{
  "app": "default-app-for-scenario",
  "type": "scenario",
  "interaction": "test-google-maps.js",
  "browserInfo": {
    "_browserVersion": "HeadlessChrome/101.0.4950.0",
    "_puppeteerConfig": {
      "headless": true,
      "devtools": true,
      "userDataDir": "/tmp/memlab/data/profile",
      "args": [
        "--no-sandbox",
        "--disable-notifications",
        "--use-fake-ui-for-media-stream",
        "--use-fake-device-for-media-stream",
        "--js-flags=\"--no-move-object-start\"",
        "--enable-precise-memory-info",
        "browser-test",
        "--display=:100"
      ],
      "defaultViewport": {
        "width": 1680,
        "height": 1080,
        "deviceScaleFactor": 1
      }
    },
    "_consoleMessages": [
      "console output line 1",
      "console output line 2"
    ]
  },
  "extraInfo": {
    "command": "run --scenario /home/jacksongl/scripts/test-google-maps.js"
  }
}

All of those fields are optional.

unlikelyzero commented 1 year ago

@JacksonGL we finally were able to start the integration. We have a rough POC based on your advice here:

https://github.com/nasa/openmct/pull/6963

We're seeing the following when executing findLeaks(path):

snapshot meta data invalid or missing

JacksonGL commented 1 year ago

@unlikelyzero

findLeaks takes a ResultReader as input instead of a path. Here is a code example:

const {findLeaks, BrowserInteractionResultReader} = require('@memlab/api');

(async function () {
  const reader = BrowserInteractionResultReader.from('/path/to/dump/');
  const leaks = await findLeaks(reader);
})();

Also make sure that the specified directory contains the files mentioned in this previous reply.
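One way to check the directory before calling findLeaks is a small pre-flight script. The sketch below is not part of memlab and does not reproduce its validation; checkDumpDir is a hypothetical helper that only checks what this thread describes: a snap-seq.json under data/cur, steps of type baseline, target, and final, and (as an assumption about naming) an s{idx}.heapsnapshot file for each step that has snapshot set to true.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-flight check; returns a list of human-readable problems.
function checkDumpDir(dumpDir) {
  const curDir = path.join(dumpDir, 'data', 'cur');
  const seqFile = path.join(curDir, 'snap-seq.json');
  if (!fs.existsSync(seqFile)) {
    return [`missing ${seqFile}`];
  }
  let steps;
  try {
    steps = JSON.parse(fs.readFileSync(seqFile, 'utf8'));
  } catch (e) {
    return [`invalid JSON in ${seqFile}: ${e.message}`];
  }
  if (!Array.isArray(steps)) {
    return [`${seqFile} should contain an array of steps`];
  }
  const problems = [];
  // The leak detector needs these three step types.
  for (const type of ['baseline', 'target', 'final']) {
    if (!steps.some((step) => step.type === type)) {
      problems.push(`no step with type "${type}"`);
    }
  }
  // Assumption: each snapshot step maps to s{idx}.heapsnapshot.
  for (const step of steps) {
    if (!step.snapshot) continue;
    const snap = path.join(curDir, `s${step.idx}.heapsnapshot`);
    if (!fs.existsSync(snap)) {
      problems.push(`missing ${snap}`);
    }
  }
  return problems;
}
```

An empty array from `checkDumpDir('/path/to/dump/')` means the layout matches what is described in this thread; it does not guarantee that the snapshot files themselves are valid.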

If it still shows an error, can you zip the directory and share it with me so I can debug it?

JacksonGL commented 1 year ago

@unlikelyzero The PR needs to change the heap dump directory structure a little bit and add a static snap-seq.json file. I left a comment in your openmct PR.

This should work. Let me know how it goes.