TestCafe hangs on loading page and freezes execution when executing concurrently

vlads11 commented 2 years ago

What is your Scenario?

Concurrently run 5 fixtures that execute a single test. Visiting https://uk.zwift.com results in TC hanging indefinitely and preventing the remainder of test execution.

What is the Current behavior?

TestCafe hangs on page load

What is the Expected behavior?

Pages are loaded concurrently as expected

What is your public website URL? (or attach your complete example)

https://github.com/vlads11/TC-PageLoadError

1) npm install
2) testcafe chrome ./Tests/* -c 5 --skip-js-errors

This may need to be triggered a few times but it consistently reproduces

What is your TestCafe test code?

The test code is provided at https://github.com/vlads11/TC-PageLoadError. This single fixture is duplicated 5 times to reproduce the error in question.

import { Selector } from "testcafe";
import { test } from 'testcafe';

let homeURL = 'https://uk.zwift.com';

// slice0
let slice0 = Selector('[class=\'image-with-text-overlay__banner columns one-whole image-crop-none\']');

fixture `Shopify HowZwiftWorkspage`
    .page('about:blank')
    .beforeEach(async t => {
        await t.navigateTo(homeURL);
    });

test('Slice0 Get Started Button goes to Create Account page', async t => {
    await t.click(slice0);
});

Your complete configuration file

default

Your complete test report

No response

Screenshots

Recording of issue can be seen here: https://github.com/vlads11/TC-PageLoadError/blob/main/ScreenRecording.mov

Steps to Reproduce

Navigate to https://uk.zwift.com with a concurrent session count of 3 or higher.
Notice that the page hangs and never completes loading.

TestCafe version

1.18.6

Node.js version

v16.14.2

Command-line arguments

testcafe chrome ./Tests/* -c 5 --skip-js-errors

Browser name(s) and version(s)

Chrome 101

Platform(s) and version(s)

macOS 12.3.1 but also happens in Linux

Other

No response

miherlosev commented 2 years ago

Hi @vlads11

TestCafe hangs on page load

I cannot reproduce the described behavior. I got the "The specified selector doesn't match any element in DOM tree" error.

Note that the tested page takes a long time to load. You can try to use the cache option to speed up test execution.

If you are sure that the issue exists, please record a video illustrating it.

vlads11 commented 2 years ago

@miherlosev - I've updated the test so that it passes; the selector failure was unrelated to the hang. Please pull down the latest from https://github.com/vlads11/TC-PageLoadError and run:

npm install
testcafe chrome ./Tests/* -c 5 --skip-js-errors

This happens on a Mac and Linux but should also reproduce on Windows. May need to trigger it a few times, but within 5 tries it should reproduce the issue at least once. I've also recorded a video and it's attached to the repo.

https://github.com/vlads11/TC-PageLoadError/blob/main/Screen_Rec_2.mov
https://github.com/vlads11/TC-PageLoadError/blob/main/ScreenRecording.mov

Essentially, on the first try, all tests may work and pass with 5 concurrent threads, but if you keep triggering the same job, at some point it will fail because the page will fail to load. I've tried adding the cache option and it does reduce the frequency of the error, but within 5 tries I still get to a state where TestCafe opens a web page and hangs indefinitely.

vlads11 commented 2 years ago

If you need a video reproducing the issue in another format let me know.

vlads11 commented 2 years ago

@miherlosev

Was able to reproduce with the following code as well in a single thread.

import { Selector } from "testcafe";
import { test } from 'testcafe';

let homeURL = 'https://uk.zwift.com';
let primaryWhyZwiftLink = Selector('[class*=\'navbar-link header__link\']').withAttribute('href', '/pages/why-zwift');

fixture `Shopify HowZwiftWorkspage`
    .page('about:blank')
    .beforeEach(async t => {
        await t.navigateTo(homeURL);
    });

for (let i = 0; i < 20; i++) {
    test('Slice0 Get Started Button goes to Create Account page', async t => {
        await t.click(primaryWhyZwiftLink);
    });
}

Version: "testcafe": "1.18.6"

At a certain point, the page fails to load and TestCafe freezes indefinitely. The only way to abort is to manually cancel the run.

Hope this helps, but let me know if there is anything else I can provide to help investigate.

vlads11 commented 2 years ago

@miherlosev - I got to the root of the issue via network tracing. One of the JS requests is stuck in a pending status and prevents TC from interacting with the page, as some resources did not finish loading. I think for now this issue can be closed as it's not a TC issue. Truly thank you and the team for your time.

miherlosev commented 2 years ago

Hi @vlads11

Thank you for the shared example. I've reproduced the issue.

For the team: it's an eval processing issue.

vlads11 commented 2 years ago

Hi TC team - wanted to see if there has been any traction on this issue. I know you can't provide any ETA but curious if this is on your radar to be fixed in the near future or if it's going to get punted for a while. TY, appreciate all of you!

Aleksey28 commented 2 years ago

Hi @vlads11,

We don't have any results yet and we can't give you any time estimates. We are fixing issues according to our queue. Right now, we are not working on this issue. We will update this thread once we have any news.

Akelator commented 2 years ago

If it helps, I can say that, in my case, this problem only happens in Chromium browsers and when running the tests in a different domain to the one the threads are pointing to. If the domain is the same, the test does not freeze after 5 concurrent threads.

miherlosev commented 2 years ago

Hi @vlads11

Thank you for the additional information.

ghost commented 1 year ago

What is the status on this? We might be experiencing a similar issue with version 1.19.0.

Aleksey28 commented 1 year ago

Hi @christofer-ja,

We don't have any results yet. We will update this thread once we have any news.

codambro commented 1 year ago

Is there any workaround for this? Perhaps forcing a timeout and aborting the test? I attempted to time out the hung promise (via Promise.race), but the TestController t still has the hung task in its queue and there is no way to clear it. So any future t actions will still be blocked waiting on the hung task.
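For readers unfamiliar with the pattern, here is a minimal sketch of the Promise.race timeout described above and of why it does not help. It does not use TestCafe at all: `withTimeout` is a hypothetical helper name, and `hungAction` is a stand-in for a TestCafe action whose promise never settles. The point is that the race only stops the caller from waiting; the underlying task is not cancelled.

```javascript
// Reject after `ms` milliseconds unless `promise` settles first.
// This only stops the *waiting*; it does not cancel the underlying work.
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
    });
    // Clear the timer either way so it does not keep the process alive.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Stand-in (assumption) for a hung action: a promise that never settles.
const hungAction = new Promise(() => {});

withTimeout(hungAction, 100).catch(err => {
    // The caller is unblocked here...
    console.log(err.message);
    // ...but hungAction itself is still pending. Anything queued behind it,
    // as reported for the TestController queue, would still be waiting.
});
```

In a test this would presumably be used as `await withTimeout(t.click(someSelector), 30000)`, which matches the observation above: the `await` returns, but later `t` actions remain blocked behind the hung one.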

Aleksey28 commented 1 year ago

Hi @codambro,

There is no workaround, and it is not fixed yet. Perhaps, you see another error - please share your simple sample so we can make sure that the error you observe has the same cause.

codambro commented 1 year ago

My only simple example would be the one provided by @vlads11 already. Except with testcafe v2.0.0

Aleksey28 commented 1 year ago

I see. We don't have a workaround for this case yet; you can only decrease concurrency.

zeusdeux commented 1 year ago

Hey @Aleksey28! Do we know what the root cause might be or where it might lie in testcafe codebase?

Aleksey28 commented 1 year ago

Hi @zeusdeux

The problem is in the eval processing (see the above comment). We need to research it further, which will take some time. However, we'll be able to start our research only after we finish working on our current tasks. We appreciate your patience.

We'll update this thread once we have any news.

zeusdeux commented 1 year ago

Thanks for the update and the work @Aleksey28! 🙏🏽

codambro commented 1 year ago

I was still able to reproduce this with proxyless mode enabled.

AlexKamaev commented 1 year ago

@codambro Thank you. We will double-check the issue in proxyless mode when the issue is fixed.

johnny1K commented 1 year ago

Hi - we are seeing a similar problem on Firefox, and without concurrency, too.

alienintheheights commented 1 year ago

Also running into this. Our tests consistently hang, though the point at which they do varies (we have around 300 tests that run nightly). No concurrency. Running latest, 2.6.1, on node 16.13.2.

When it gets stuck, strace on the /node_modules/.bin/testcafe process shows this over and over:

 strace -f -tt -s 200 -p 6490
strace: Process 6490 attached with 7 threads
[pid  6496] 14:20:07.310527 futex(0x65b99ec, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  6495] 14:20:07.310672 futex(0x64c089c, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid  6494] 14:20:07.310705 futex(0x64c089c, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid  6493] 14:20:07.310734 futex(0x64c089c, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid  6492] 14:20:07.310762 futex(0x64c089c, FUTEX_WAIT_PRIVATE, 4, NULL <unfinished ...>
[pid  6491] 14:20:07.310789 epoll_wait(9,  <unfinished ...>
[pid  6490] 14:20:07.358401 epoll_wait(13,

Those pids correspond to defunct instances of Google Chrome. New ones have been spawned in their place.

www   6542     1  0 13:09 ?        00:00:00 /opt/google/chrome/chrome_crashpad_handler --monitor-self-annotation=ptype=crashpad-handle
www   6547  6535  0 13:09 pts/0    00:00:00 /opt/google/chrome/chrome --type=zygote --no-zygote-sandbox --headless --headless --crashp
www  6549  6535  0 13:09 pts/0    00:00:00 /opt/google/chrome/chrome --type=zygote --headless --headless --crashpad-handler-pid=6542
www   6552  6549  0 13:09 pts/0    00:00:00 /opt/google/chrome/chrome --type=zygote --headless --headless --crashpad-handler-pid=6542
www   6570  6535  2 13:09 pts/0    00:02:09 /opt/google/chrome/chrome --type=utility --utility-sub-type=network.mojom.NetworkService -
www   6573  6552 49 13:09 pts/0    00:37:17 /opt/google/chrome/chrome --type=renderer --headless --crashpad-handler-pid=6542 --first-r
www   6606  6547  2 13:09 pts/0    00:01:30 /opt/google/chrome/chrome --type=gpu-process --headless --ozone-platform=headless --use-an
www   6607  6606  0 13:09 pts/0    00:00:00 /opt/google/chrome/chrome --type=broker

Meanwhile the node_modules/testcafe/lib/cli process is looping the following POST to /messaging

[pid  6497] 14:21:33.576119 epoll_wait(14, [], 1024, 0) = 0
[pid  6497] 14:21:33.576213 epoll_wait(14, [], 1024, 425) = 0
[pid  6497] 14:21:34.001847 epoll_wait(14, [], 1024, 0) = 0
[pid  6497] 14:21:34.001935 epoll_wait(14, [{EPOLLIN, {u32=27, u64=27}}], 1024, 1976) = 1
[pid  6497] 14:21:34.084023 read(27, "POST /messaging HTTP/1.1\r\nHost: 192.168.2.100:41539\r\nConnection: keep-alive\r\nContent-Length: 230\r\ncache-control: no-cache, no-store, must-revalidate\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleW"..., 65536) = 688
[pid  6497] 14:21:34.085589 writev(27, [{iov_base="HTTP/1.1 200 OK\r\ncontent-type: application/json\r\ncache-control: no-cache, no-store, must-revalidate\r\npragma: no-cache\r\nDate: Thu, 01 Jun 2023 19:21:34 GMT\r\nConnection: keep-alive\r\nKeep-Alive: timeout="..., iov_len=1511}, {iov_base="", iov_len=0}], 2) = 1511

Don't know if that helps. We had the same outcome on older versions too.

This is our .testcaferc.json

{

    "browsers": ["chrome:headless"],
    "src": ["test/*-test.js"],

    "speed": 1,
    "selectorTimeout": 30000,
    "assertionTimeout": 30000,
    "pageLoadTimeout": 60000,
    "ajaxRequestTimeout": 300000,
    "testExecutionTimeout": 300000,
    "nativeAutomation": false,

    "quarantineMode": false,
    "debugMode": false,
    "debugOnFail": false,
    "stopOnFirstFail": false,
    "skipJsErrors": true,
    "skipUncaughtErrors": true,
    "disablePageCaching": true,
    "developmentMode": false,

    "reporter": [
        {
            "name": "spec"
        },
        {
            "name": "xunit",
            "output": "test/output/reports/report.xml"
        },
        {
            "name": "html",
            "output": "test/output/reports/report.html" 
        }
    ],

    "screenshots": {
        "path": "test/output/screenshots",
        "takeOnFails": true,
        "pathPattern": "${DATE}_${TIME}/${FIXTURE}/test-${TEST_INDEX}/${FILE_INDEX}.png"
    }

}

johnny1K commented 1 year ago

To provide an update: we tried a number of things and finally made some progress by running tests via the TestCafe runner in groups. We had the same situation as @alienintheheights mentioned - 200+ tests, and it was hanging in random places without any logs. We split the tests into two groups of almost equal size and they still hang from time to time, but not so often - say 1-2 in 20 runs, while before it was 9 hangs out of 10 runs. It's not ideal but at least it doesn't fully block our work. We are yet to experiment with splitting the tests into more groups. Hope this helps.
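The grouping approach described above might be sketched with TestCafe's programmatic runner roughly as follows. This is an illustration under stated assumptions, not the script actually used: `splitIntoGroups` and `runInGroups` are hypothetical names, and creating a fresh TestCafe instance per group and using the `chrome:headless` alias are choices made for the example.

```javascript
// Split `items` into at most `groupCount` groups of near-equal size, keeping order.
function splitIntoGroups(items, groupCount) {
    const groups = Array.from({ length: groupCount }, () => []);
    const size = Math.ceil(items.length / groupCount);
    items.forEach((item, i) => groups[Math.floor(i / size)].push(item));
    return groups.filter(group => group.length > 0);
}

// Run each group of test files in sequence and return the total failed count.
async function runInGroups(testFiles, groupCount) {
    // Required lazily so splitIntoGroups can be used without TestCafe installed.
    const createTestCafe = require('testcafe');
    let failed = 0;

    for (const group of splitIntoGroups(testFiles, groupCount)) {
        // Assumption: a fresh TestCafe instance per group, closed before the next.
        const testcafe = await createTestCafe('localhost');
        try {
            failed += await testcafe
                .createRunner()
                .src(group)
                .browsers('chrome:headless')
                .run({ skipJsErrors: true });
        } finally {
            await testcafe.close();
        }
    }
    return failed;
}

// Hypothetical usage with placeholder file names:
// runInGroups(['test/a-test.js', 'test/b-test.js', 'test/c-test.js'], 2)
//     .then(failed => process.exit(failed ? 1 : 0));
```

As reported in this thread, batching reduces how often the hang occurs but does not eliminate it, so a wrapper like this is a mitigation rather than a fix.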

alienintheheights commented 1 year ago

@johnny1K we are also working on batching these tests. As in your case, it lessens but does not eliminate the issue. FWIW we get it on Chrome and FF, specifically Google Chrome 113.0.5672.126 and Mozilla Firefox 102.7.0esr. It seems to happen on both Windows and Ubuntu too. I tried the testcafe upgrades (from 1.8.0 to current) to no avail.

Klippdocka commented 1 year ago

How is the eval bug fix going?

github-actions[bot] commented 1 year ago

No updates yet. Once we get any results, we will post them in this thread.

cattermo commented 1 year ago

This sounds quite similar to this bug: https://github.com/DevExpress/testcafe/issues/7097. We "solved" it by running the tests in Node 14. We are still running Node v14.18.3, where we see far fewer "hanging" requests.

Hope the TestCafe team can find the real issue this time 🙏

AlexKamaev commented 10 months ago

Hello, all. I've tried to reproduce the issue with the modified example from the first post:

import { Selector } from "testcafe";

let homeURL = 'https://uk.zwift.com';
let primaryWhyZwiftLink = Selector('[data-testid="Why Zwift-nav-link"]');

fixture `Shopify HowZwiftWorkspage`
    .page('about:blank')
    .beforeEach(async t => {
        await t.navigateTo(homeURL);
    });

for (let i = 0; i < 20; i++) {
    test('Slice0 Get Started Button goes to Create Account page', async t => {
        await t.click('#truste-consent-button');
        await t.click(primaryWhyZwiftLink);
    });
}

I did not manage to reproduce the issue in the latest TestCafe version (v3.4.0) under Node 18.18.2, with and without Native Automation enabled. If anyone can share an example that demonstrates the issue, please share it here.

github-actions[bot] commented 9 months ago

This issue was automatically closed because there was no response to our request for more information from the original author. Currently, we don't have enough information to take action. Please reach out to us if you find the necessary information and are able to share it. We are also eager to know if you resolved the issue on your own and can share your findings with everyone.
