cockpit-project / cockpit

Cockpit is a web-based graphical interface for servers.
http://www.cockpit-project.org/
GNU Lesser General Public License v2.1
11.2k stars 1.11k forks source link

shell: Use the "load" event instead of polling to catch ready frames #21061

Closed mvollmer closed 2 weeks ago

mvollmer commented 1 month ago
mvollmer commented 3 weeks ago

The two testFrameReload tests are suspicious: I don't know what requirement or feature they are supposed to test. They still pass, but maybe only by accident.

What they do:

1) modify the DOM directly to remove the "data-ready" attribute of a iframe 2) modify the DOM to change the "src" attrr of the iframe 3) wait for the "data-ready" attribute to come back

Step 2) might simulate the code inside the iframe changing its own location, but that is not something Cockpit supports beyond changing the hash. Both the old and new Shell will just react to the "load" event on the iframe, run their hash tracking stuff, and then put the correct "src" attribute back. The old Shell would run its polling loop, but the new Shell just brings the DOM into sync with its own reality.

So, without knowing what testFrameReload is actually testing, I don't know whether the new Shell broke something.

martinpitt commented 3 weeks ago

FTR, I'm afraid I don't know what that data-ready hackery in the test should achieve. It goes way back to 2015:

It seems to me that we should test this using the "Active pages" killer in the session menu (with Alt) and then loading it back from the menu -- otherwise it's just internal hackery and meaningless IMHO.

mvollmer commented 3 weeks ago

It seems to me that we should test this using the "Active pages" killer in the session menu (with Alt) and then loading it back from the menu -- otherwise it's just internal hackery and meaningless IMHO.

Hmm, there is also the "Reload this frame" action in the browser menu. I wonder if we survive that.

https://github.com/cockpit-project/cockpit/commit/37aedec68f7b50cd13435920334c3c4ba6cef59b is worried about the router, and that a page needs to be unregistered when it is reloaded. (so that the channels open for the previous incarnation of the frame are closed,) I think this happens now via the "unload" event that the router listens to. I'll experiment a bit with "Reload this frame" as well.

mvollmer commented 3 weeks ago

I'll experiment a bit with "Reload this frame" as well.

Alright, reloading the frame would correctly unregister it from the router, but when it was loaded, we didn't do the necessary setup, like attaching the event handler for the idle timeout.

So let's attach our own permanent "load" and "unload" handlers and do the setup on every load. There is now also a teardown that unsets the ready flag, which helps with avoiding flicker in dark mode.

The testFrameReload tests should not touch the data-ready attribute, I'd say, but should instead check that open channels are closed properly. It would be best to reload the frame explicitly instead of messing with the "src" attribute. I would guess bidi can do that, right?

martinpitt commented 3 weeks ago

@mvollmer: Yes, bidi has a reload() API, we already use it in testlib.py's Browser.reload() -- except that this always does self.switch_to_top() first, i.e. it always reloads the entire page. You could just add an option to it to not do that, and then just have it reload the current frame.

mvollmer commented 3 weeks ago

You could just add an option to it to not do that, and then just have it reload the current frame.

I tried, but it seems to reload everything always. Also, I got testlib.Error: unknown error: navigation canceled. Kicking the "src" attribute is not wrong, according to StackOverflow, so...

mvollmer commented 3 weeks ago

Let's get this in as a followup to #21012.

mvollmer commented 2 weeks ago

Step 2) might simulate the code inside the iframe changing its own location, but that is not something Cockpit supports beyond changing the hash.

Just to clarify: step 2 is very likely only supposed to reload the frame without changing the url, not simulate loading a different url inside the frame.

mvollmer commented 2 weeks ago

TestPages.testBasic and TestPages.testHistory are flaking pretty consistently on Firefox now. I'll check this out more.

mvollmer commented 2 weeks ago

TestPages.testBasic and TestPages.testHistory are flaking pretty consistently on Firefox now. I'll check this out more.

Also testMenuSearch. I hope we have only uncovered existing races in the tests.

mvollmer commented 2 weeks ago

TestPages.testHistory likely runs into https://bugzilla.mozilla.org/show_bug.cgi?id=941146.

This is the beginning of the backtrace, our terminal is using a canvas to render.

    _fillCharTrueColor http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:8
    _renderBlockCursor http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:8
    _render http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:8
    value http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:8
    _animationFrame http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:8
    restartBlinkAnimation http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:8
    handleGridChanged http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:8
    renderRows http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:8
    _renderRows http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:12
    _renderDebouncer http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:12
    _innerRefresh http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:12
    _animationFrame http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:12
    refresh http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:12
    refreshRows http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:12
    refresh http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:12
    V http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:12
    fire http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:14
    parse http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:14
    _writeBuffer http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:14
    _innerWrite http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:15
    write http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:15
    setTimeout handler*write http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:15
    write http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:14
    write http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:15
    onChannelMessage http://127.0.0.2:9091/cockpit/@localhost/system/terminal.js:16
    Ie http://127.0.0.2:9091/cockpit/@localhost/base1/cockpit.js:1
mvollmer commented 2 weeks ago

TestPages.testHistory likely runs into https://bugzilla.mozilla.org/show_bug.cgi?id=941146.

-> we need a naughty for this.

mvollmer commented 2 weeks ago

The workaround for TestPages.testBasic is successfull: https://cockpit-logs.us-east-1.linodeobjects.com/pull-21061-c74ae9c4-20241016-071913-fedora-40-firefox-expensive/log.html

There was likely something racy about reloading the page (for the language change) and navigating at about the same time.

martinpitt commented 2 weeks ago

TestPages.testHistory likely runs into https://bugzilla.mozilla.org/show_bug.cgi?id=941146.

-> we need a naughty for this.

Perhaps just skip the test (or that part of the test) on Firefox?

mvollmer commented 2 weeks ago

Same fix as for testBasic also works for testMenuSearch: https://cockpit-logs.us-east-1.linodeobjects.com/pull-21061-0c1cea44-20241016-075719-fedora-40-firefox-expensive/log.html

mvollmer commented 2 weeks ago

Perhaps just skip the test (or that part of the test) on Firefox?

Nah, it's a good test for tricky business (getting browser history right) and we really should run it on Firefox. It does reach the end successfully. There is just the occasional oops. We could allow oopses, but that would be too broad.

We could invent machinery that ala allow_journal_messages, that allows specific oopses, but this here really is a bug in Firefox that might conceivably be fixed some day.

mvollmer commented 2 weeks ago

Amplified testHistory run: https://cockpit-logs.us-east-1.linodeobjects.com/pull-21061-6ffdfcd3-20241016-082854-fedora-40-firefox-expensive/log.html

martinpitt commented 2 weeks ago

The naughty landed -- so a retry of the amplified run should go green now, with the naughty hitting?

mvollmer commented 2 weeks ago
  • [ ] try to debug #18766 once more

Moved to #21125

mvollmer commented 2 weeks ago

The naughty landed -- so a retry of the amplified run should go green now, with the naughty hitting?

In principle yes, but the naughty pattern is too tight and doesn't trigger on a test named testHistory6...

mvollmer commented 2 weeks ago

Hmm, TestPages.testMenuSearch on ubuntu also flakes. Some service error show up later in the test although they have been cleared at the beginning. Let's fix that as well...

mvollmer commented 2 weeks ago

Hmm, TestPages.testMenuSearch on ubuntu also flakes. Some service error show up later in the test although they have been cleared at the beginning. Let's fix that as well...

It's fwupd-refresh.service that runs on a timer. It probly fails because it has no network connection.

Let's just reset the failure state immediately before the screenshot, that should make the time window impossible small to hit....

martinpitt commented 2 weeks ago

Thanks for tracking that down. That also seems fine to disable in test/vm.install.

mvollmer commented 2 weeks ago

That also seems fine to disable in test/vm.install.

Yeah, but then it's a different service that fails later on...

martinpitt commented 2 weeks ago

Let's just reset the failure state immediately

Ah, you meant a global systemctl reset-failed. Yes, that seems right.

mvollmer commented 2 weeks ago

Oh no, the naughty pattern doesn't match, probably because the text it matches against doesn't actually contain the # testHistory header... Without that header, the pattern is very unspecific, hmm.

mvollmer commented 2 weeks ago

Oh no, the naughty pattern doesn't match, probably because the text it matches against doesn't actually contain the # testHistory header... Without that header, the pattern is very unspecific, hmm.

Luckily @martinpitt figured it out. Extra whitespace, fixed.