[Guide] Integration with existing e2e framework

unlikelyzero commented 2 years ago

Hello! This project looks amazing and is something our team at NASA Open MCT would love to use within our nascent performance testing suite.

I've seen a few requests for integration https://github.com/facebookincubator/memlab/issues/15 and https://github.com/facebookincubator/memlab/issues/14.

We're leveraging the playwright/test package, which is nearly identical to jest+puppeteer in terms of capabilities. The only API missing relative to puppeteer is the one in https://github.com/microsoft/playwright/issues/14134; otherwise the two have API parity and should be interchangeable.

I was wondering if you could provide a guide on how to integrate memlab into our existing playwright/test framework. I could help keep it up to date by integrating these two projects as part of our CI pipeline. I'd just like to know how to get started with some of your core APIs. Thanks!

JacksonGL commented 2 years ago

@unlikelyzero Sure, the integration consists of two steps:

Get heap snapshots dumped onto disk with a specific directory structure and meta files

For your existing playwright/test framework, I would suggest implementing an API that takes JS heap snapshots from the browser. It seems playwright supports connecting to the browser's DevTools via the Chrome DevTools Protocol, so you can use this to take JS heap snapshots from Chromium. MemLab's own puppeteer-based code for collecting heap snapshots from Chromium is a useful reference.

The API could look like this (as an example):

await takeJSHeapSnapshot(page, tag);

tag is one of baseline, target, or final. These three snapshots are equivalent to SBP, STP, and SBP' respectively.

Your takeJSHeapSnapshot API should dump files onto disk in a specific layout. Here is the complete list of files and the directory structure required by the MemLab core API for finding memory leaks.

/path/to/dump/
└── data
    └── cur
        ├── run-meta.json       # meta data of memlab run and browser configuration
        ├── s1.heapsnapshot     # heap snapshot after the url callback (initial page load)
        ├── s2.heapsnapshot     # heap snapshot after the action callback (after target interaction)
        ├── s3.heapsnapshot     # heap snapshot after the back callback (after reverting target interaction)
        └── snap-seq.json       # meta data about each browser interaction step

To get examples of those meta files, run any MemLab test scenario and view the files under this directory: $(memlab get-default-work-dir)/data/cur.

In your Playwright test code, insert the takeJSHeapSnapshot API calls at the appropriate points to collect the three snapshots and dump the files onto disk.
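A minimal sketch of such a helper, assuming Playwright's Chromium-only `newCDPSession` API and the CDP `HeapProfiler` domain. The third `dumpDir` parameter, the tag-to-index mapping, and the garbage collection call before the snapshot are assumptions added for illustration; they are not part of MemLab or Playwright.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed mapping from snapshot tag to the index used in s${idx}.heapsnapshot.
const TAG_TO_IDX = new Map([['baseline', 1], ['target', 2], ['final', 3]]);

// Hypothetical helper; dumpDir is an extra parameter added for this sketch.
async function takeJSHeapSnapshot(page, tag, dumpDir) {
  const idx = TAG_TO_IDX.get(tag);
  if (idx === undefined) {
    throw new Error(`unknown snapshot tag: ${tag}`);
  }
  const outDir = path.join(dumpDir, 'data', 'cur');
  fs.mkdirSync(outDir, {recursive: true});
  const file = path.join(outDir, `s${idx}.heapsnapshot`);

  // Chromium only: open a raw Chrome DevTools Protocol session for this page.
  const session = await page.context().newCDPSession(page);
  const fd = fs.openSync(file, 'w');
  try {
    // The snapshot arrives as a series of string chunks; append each to the file.
    session.on('HeapProfiler.addHeapSnapshotChunk', ({chunk}) => {
      fs.writeSync(fd, chunk);
    });
    await session.send('HeapProfiler.enable');
    // Assumption: force a GC first so that collectable objects are not captured.
    await session.send('HeapProfiler.collectGarbage');
    await session.send('HeapProfiler.takeHeapSnapshot', {reportProgress: false});
  } finally {
    fs.closeSync(fd);
    await session.detach();
  }
  return file;
}
```

In a test, the helper would be called once per phase, for example `await takeJSHeapSnapshot(page, 'baseline', dumpDir)` right after the initial page load, then with `'target'` after the interaction and `'final'` after reverting it.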

Find memory leaks based on heap snapshot dumps and meta files

Once your takeJSHeapSnapshot implementation has dumped the heap snapshots and meta files onto disk, you can find memory leaks with this memlab core API:

const {findLeaks, BrowserInteractionResultReader} = require('@memlab/api');

(async function () {
  // Point the reader at the directory that contains data/cur
  const reader = BrowserInteractionResultReader.from('/path/to/dump/');
  // findLeaks resolves to the list of detected leak traces
  const leaks = await findLeaks(reader);
})();

ajayjaggi97 commented 1 year ago

@JacksonGL Can you please explain how to generate the run-meta.json and snap-seq.json files as well? @unlikelyzero if you have already implemented this with playwright, can you please help?

Also, can you explain what these files are used for?

unlikelyzero commented 1 year ago

@ajayjaggi97 i have not integrated this yet

Tallyb commented 1 year ago

I would love to understand the extra 2 files as well...

JacksonGL commented 1 year ago

Recent updates have made the run-meta.json file optional. Now, to enable a different E2E framework to generate output that can be consumed by the MemLab findLeaks API or the memlab find-leaks command, you simply need a valid snap-seq.json file.

`snap-seq.json` File

The snap-seq.json file encodes the browser interaction data and heap snapshot details that the Memlab leak detector needs to identify memory leaks.

Here's a sample snap-seq.json file. It's an array that contains a series of objects, where each object represents a browser interaction step. To detect memory leaks, we require a minimum of three steps labeled as baseline, target, and final, in that sequence. For a detailed understanding of why Memlab requires three snapshots, please refer to this doc.

[
  {
    "name": "page-load", 
    "snapshot": true,
    "type": "baseline",
    "idx": 1,
    "JSHeapUsedSize": 33872820
  },
  {
    "name": "action-on-page",
    "snapshot": true,
    "type": "target",
    "idx": 2,
    "JSHeapUsedSize": 44172336
  },
  {
    "name": "revert",
    "snapshot": true,
    "type": "final",
    "idx": 3,
    "JSHeapUsedSize": 43304156
  }
]

Now let's take a look at the JSON encoding of a specific step:

  {
    "name": "page-load",
    "snapshot": true,
    "type": "baseline",
    "idx": 1,
    "JSHeapUsedSize": 33872820
  }

name is a human-readable name for the interaction step. It is mainly for documentation or commenting purposes.
"snapshot": true indicates that a heap snapshot has been captured after this E2E interaction step. If this field is false, Memlab will ignore this E2E interaction step when loading and diffing heap snapshots.
type should be one of the following values: baseline, target, or final. Refer to this link for their meanings.
idx denotes the index of the given interaction. Memlab utilizes this index to identify and load the corresponding heap snapshot using the template s${idx}.heapsnapshot when snapshot is true. For instance, in this specific case, Memlab will attempt to locate s1.heapsnapshot in the same directory as the snap-seq.json file, given that idx is 1.
JSHeapUsedSize is an optional field that logs the total heap size in bytes after the completion of this E2E interaction step. If all interaction steps include the JSHeapUsedSize field, Memlab will generate a pixel chart displaying memory usage variations across different steps before listing the detected memory leaks on the terminal.
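The fields above can be generated from test code rather than written by hand. The following is a minimal sketch; `buildSnapSeq` and `writeSnapSeq` are hypothetical helper names, not part of `@memlab/api`, and the sketch assumes exactly three steps given in baseline, target, final order.

```javascript
const fs = require('fs');
const path = require('path');

const STEP_TYPES = ['baseline', 'target', 'final'];

// Hypothetical helper: builds the snap-seq.json entries.
// `steps` is an array of {name, JSHeapUsedSize?} in baseline, target, final order.
function buildSnapSeq(steps) {
  if (!Array.isArray(steps) || steps.length !== STEP_TYPES.length) {
    throw new Error('expected exactly three steps: baseline, target, final');
  }
  return steps.map((step, i) => {
    // idx starts at 1 so that it matches s1.heapsnapshot, s2.heapsnapshot, ...
    const entry = {name: step.name, snapshot: true, type: STEP_TYPES[i], idx: i + 1};
    // JSHeapUsedSize is optional, so include it only when a number was provided.
    if (typeof step.JSHeapUsedSize === 'number') {
      entry.JSHeapUsedSize = step.JSHeapUsedSize;
    }
    return entry;
  });
}

// Hypothetical helper: writes snap-seq.json next to the heap snapshots.
function writeSnapSeq(dumpDir, steps) {
  const outDir = path.join(dumpDir, 'data', 'cur');
  fs.mkdirSync(outDir, {recursive: true});
  const file = path.join(outDir, 'snap-seq.json');
  fs.writeFileSync(file, JSON.stringify(buildSnapSeq(steps), null, 2));
  return file;
}
```

For example, `writeSnapSeq('/path/to/dump/', [{name: 'page-load'}, {name: 'action-on-page'}, {name: 'revert'}])` would produce a file with the same shape as the sample above, minus the optional JSHeapUsedSize fields. One possible source for that value is the `JSHeapUsedSize` metric reported by the CDP `Performance.getMetrics` command.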

Optional `run-meta.json` File

The run-meta.json file is an optional JSON file created by Memlab's E2E front-end. It logs metadata related to Memlab's operation, including Chromium startup arguments, information about the web app under test, and any CLI commands. While this file isn't directly used during MemLab's memory leak detection process, it may be valuable later for associating detected memory leaks with the configuration of the tested web app and browser. For example, some memory leaks only show up in a specific configuration, such as mobile view. That information can be useful when you build a UI that displays all the memory leaks together with their underlying app and browser information.

{
  "app": "default-app-for-scenario",
  "type": "scenario",
  "interaction": "test-google-maps.js",
  "browserInfo": {
    "_browserVersion": "HeadlessChrome/101.0.4950.0",
    "_puppeteerConfig": {
      "headless": true,
      "devtools": true,
      "userDataDir": "/tmp/memlab/data/profile",
      "args": [
        "--no-sandbox",
        "--disable-notifications",
        "--use-fake-ui-for-media-stream",
        "--use-fake-device-for-media-stream",
        "--js-flags=\"--no-move-object-start\"",
        "--enable-precise-memory-info",
        "browser-test",
        "--display=:100"
      ],
      "defaultViewport": {
        "width": 1680,
        "height": 1080,
        "deviceScaleFactor": 1
      }
    },
    "_consoleMessages": [
      "console output line 1",
      "console output line 2"
    ]
  },
  "extraInfo": {
    "command": "run --scenario /home/jacksongl/scripts/test-google-maps.js"
  }
}

All those fields are optional:

app: Specifies the name of the application, in this case, "default-app-for-scenario".
type: Describes the type of test, here it is "scenario".
interaction: The file name of the E2E test scenario file.
browserInfo: Contains metadata about the browser used for the test:
- _browserVersion: The version of the browser used, for example, HeadlessChrome/101.0.4950.0.
- _puppeteerConfig: The configuration for Puppeteer (MemLab uses Puppeteer as its browser interaction front-end):
  - headless: A boolean indicating whether the browser runs in headless mode.
  - devtools: A boolean that specifies if devtools are opened when interacting with the page.
  - userDataDir: The path to the browser-generated test user profile data directory.
  - args: An array of additional arguments to be passed to the browser instance.
  - defaultViewport: An object specifying the default viewport's width, height, and device scale factor.
- _consoleMessages: An array of messages outputted in the console during the test run.
extraInfo: Contains additional information regarding the test run:
- command: The command executed to initiate the test run, in this case, run --scenario /home/jacksongl/scripts/test-google-maps.js, as passed to the memlab CLI.

unlikelyzero commented 1 year ago

@JacksonGL we were finally able to start the integration. We have a rough POC based on your advice here:

https://github.com/nasa/openmct/pull/6963

We're seeing the following when executing findLeaks(path):

snapshot meta data invalid or missing

JacksonGL commented 1 year ago

@unlikelyzero

findLeaks takes a ResultReader as input instead of a path. Here is a code example:

const {findLeaks, BrowserInteractionResultReader} = require('@memlab/api');

(async function () {
  const reader = BrowserInteractionResultReader.from('/path/to/dump/');
  const leaks = await findLeaks(reader);
})();

Also make sure that the specified directory contains the files mentioned in this previous reply.

If it still shows an error, can you zip the directory and share it with me so I can debug it?
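The directory layout described earlier in this thread can also be checked locally before calling findLeaks. This is a minimal sketch; `checkDumpDir` is a hypothetical helper, not part of `@memlab/api`, and it only checks the requirements stated in this thread (a snap-seq.json file under data/cur, one snapshot step of each type, and a matching s${idx}.heapsnapshot file per snapshot step).

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-flight check: returns a list of problems found in dumpDir.
function checkDumpDir(dumpDir) {
  const curDir = path.join(dumpDir, 'data', 'cur');
  const seqFile = path.join(curDir, 'snap-seq.json');
  if (!fs.existsSync(seqFile)) {
    return [`missing ${seqFile}`];
  }
  let steps;
  try {
    steps = JSON.parse(fs.readFileSync(seqFile, 'utf8'));
  } catch (e) {
    return [`invalid JSON in ${seqFile}: ${e.message}`];
  }
  if (!Array.isArray(steps)) {
    return [`${seqFile} does not contain an array`];
  }
  // Only steps with "snapshot": true are expected to have a snapshot file.
  const snapshotSteps = steps.filter((s) => s && s.snapshot === true);
  const problems = [];
  for (const type of ['baseline', 'target', 'final']) {
    if (!snapshotSteps.some((s) => s.type === type)) {
      problems.push(`no snapshot step of type ${type}`);
    }
  }
  for (const step of snapshotSteps) {
    const snap = path.join(curDir, `s${step.idx}.heapsnapshot`);
    if (!fs.existsSync(snap)) {
      problems.push(`missing ${snap}`);
    }
  }
  return problems;
}
```

An empty array from `checkDumpDir('/path/to/dump/')` means the files named in this thread are present; it does not guarantee that the snapshots themselves are valid.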

JacksonGL commented 1 year ago

@unlikelyzero The PR needs to change the heap dump directory structure a little bit and add a static snap-seq.json file. I left a comment in your openmct PR.

This should work. Let me know how it goes.
