MetaMask / metamask-extension

:globe_with_meridians: :electric_plug: The MetaMask browser extension enables browsing Ethereum blockchain enabled websites
https://metamask.io
Other
12.04k stars 4.92k forks source link

Investigate Mocking Server not terminating properly #15371

Open seaona opened 2 years ago

seaona commented 2 years ago

When we initiate a Mocking Server on our e2e, sometimes it is not terminated properly and stays hanging for a couple of minutes. A workaround has been put in place, but we should investigate the root cause and fix it. Might be related to our npm dependency mockttp

seaona commented 2 years ago

I've been researching a bit on how to proper terminate mockttp servers and found the following:

HTTP/2 support is a bit experimental in places, and I can definitely believe that there’s some cases where pooled connections aren’t shut down as aggressively as intended. (..)

https://lightrun.com/answers/httptoolkit-mockttp-how-to-gracefully-shutdown-the-server-tests-are-failing

Setting { http2: false } seems like it worked for some people.

So on our helpers.js file, in line 47, we could try to include this:

const mockServer = mockttp.getLocal({ https, cors: true, http2: false });

Even though, I am not sure how to repro the server not terminating, to actually check that this solves it and which other implications has to disable HTTP2. @PeterYinusa you mentioned we have a workaround in place to solve this. Could you point me which part of the code was, so I can disable it and test the above suggestion, to see if this fixes completely the problem? TY!!

PeterYinusa commented 2 years ago

Context:

A few months back I noticed that sometimes the tests would hang for a few minutes after all the tests are complete. You can see this below.

Before and After

Before

Test finishes after ~40s Process exits after ~5m

After

Test finishes after ~40s Process exits after ~40s

test times

Test runner

With the old run command, the process does not exit until ~5min mark, even though the tests finish in under a min.

$ node test/e2e/run-e2e-test.js test/e2e/tests/phishing-detection.spec.js --browser chrome
$ .../mocha --no-config --no-timeouts test/e2e/tests/phishing-detection.spec.js

Investigation - Handles

I then updated test/e2e/helpers.js, adding some code process._getActiveHandles() to the end of the test to check for open handles. From a look at the open handle I came to the conclusion its likely mockttp as I saw references to Server, Http2Server, TLSSocket, ClientRequest etc.

Investigation - Potential fix?

After some investigation I stumbled across this issue https://github.com/httptoolkit/mockttp/issues/83, and attempted the approach in the comment https://github.com/httptoolkit/mockttp/issues/83#issuecomment-1128004639 similar to @seaona comment above but that did not resolve the issue.

I also tried upgraded mockttp to a newer version, not the latest at the time as I believe it had breaking changes, but that also didn't work. I then implemented the quick fix below and moved on 😆

Test runner - quick fix

I then updated test/e2e/run-e2e-test.js, forcing mocha to exit using the following command -exit as a quick fix.

With the new run command, the process exits immediately, but the underlying issues is not resolved.

$ node test/e2e/run-e2e-test.js test/e2e/tests/phishing-detection.spec.js --browser chrome
$ .../mocha --no-config --timeout 60000 test/e2e/tests/phishing-detection.spec.js --exit
PeterYinusa commented 2 years ago

There was one other area I looked into, implementing a beforeResponse callback in the thenPassThrough method specified in test/e2e/mock-e2e.js. This seemed to also resolve the issue, but I couldn't figure out exactly why as this method is optional.

    beforeResponse: (_) => {
      return {};
    },
pimterry commented 2 years ago

Hey metamask team - I'm the maintainer of Mockttp, I just found this issue :smile:.

I can't be sure from a quick skim what the cause is, but note that for https://github.com/httptoolkit/mockttp/issues/83 the reported problems are very HTTP/2 specific, and disabling that does fix everything there, with minimal downside (technically the tests are less representative of real traffic, which is normally HTTP/2, but the semantics should all be identical). The only remaining non-HTTP/2 case discussed at the end is a Jest bug (https://github.com/facebook/jest/issues/11665) which was fixed in the latest Jest release.

If you have a small standalone repro for issues like this with either HTTP/2 or HTTP/1 though, I would really love to see those, that'd be super useful! If you can open a Mockttp issue with anything like that then I'm happy to fix it for you. It's useful to know the specific version of Node you're using here, since these behaviours vary significantly between versions.

Unfortunately Node's HTTP/2 module is still a bit rough around the edges with this kind of shutdown handling (in most non-testing cases, people rarely cleanly shutdown active servers - they run forever till the process is actively stopped) so it's whack-a-mole hunting down race conditions here, and more test cases like this are super helpful for that :+1: