cockpit-project / cockpit

Cockpit is a web-based graphical interface for servers.
http://www.cockpit-project.org/
GNU Lesser General Public License v2.1
11.23k stars 1.11k forks source link

fsinfo / new-style channels mess up Shell message routing #21219

Open martinpitt opened 8 hours ago

martinpitt commented 8 hours ago

Broken out from https://github.com/cockpit-project/cockpit/pull/21162

In that PR we try to add an fsinfo.ts fsinfo() call into the Terminal page, which completely breaks the existing bash spawn channel (and others). This page also uses a classic cockpit.channel(). Incidentally this is similar to #21059 where trying to port the sosreport page to fsinfo() also wreaks havoc -- that also uses a cockpit.channel().

When changing the URL path on the terminal page, this happens, which explains the disconnect:

send control: {"hash":"/?path=/home%2Fadmin%2Fa%2Fbb","hint":"location","command":"hint"}
send control: {"group":"cockpit1:localhost/system/terminal","command":"kill"}

This is done by pkg/shell/router.jsx's unregister():

console.trace() unregister 
Window https://127.0.0.2:9091/cockpit/@localhost/system/terminal.html#/?path=/home%2Fadmin%2Fa%2Fbb
 killing cockpit1:localhost/system/terminal [shell.js:43184:15](https://127.0.0.2:9091/cockpit/@localhost/shell/shell.js)
    unregister https://127.0.0.2:9091/cockpit/@localhost/shell/shell.js:43184
    message_handler https://127.0.0.2:9091/cockpit/@localhost/shell/shell.js:43256
    (Async: Async)
    Router https://127.0.0.2:9091/cockpit/@localhost/shell/shell.js:43318

This happens in response to an init message that the terminal apparently sends ... to itself!?

XXX message_handler message { target: Window, isTrusted: true, data: '\n{ "command": "init", "version": 1 }', origin: "https://127.0.0.2:9091", lastEventId: "", source: Window, ports: Restricted, srcElement: Window, currentTarget: Window, eventPhase: 2, … }
currentTarget: null
data: '\n{ "command": "init", "version": 1 }'
explicitOriginalTarget: Window https://127.0.0.2:9091/system/terminal#/?path=/home%2Fadmin%2Fa%2Fbb
origin: "https://127.0.0.2:9091"
originalTarget: Window https://127.0.0.2:9091/system/terminal#/?path=/home%2Fadmin%2Fa%2Fbb
source: Window https://127.0.0.2:9091/cockpit/@localhost/system/terminal.html#/?path=/home%2Fadmin%2Fa%2Fbb
srcElement: Window https://127.0.0.2:9091/system/terminal#/?path=/home%2Fadmin%2Fa%2Fbb
target: Window https://127.0.0.2:9091/system/terminal#/?path=/home%2Fadmin%2Fa%2Fbb
type: "message"

Debugging on the bridge side confirms that the kill() is really the first thing, i.e. it's not a weird fallout from fsinfo. And yet, when I replace the fsinfo call with a throw new Error("XXX fake failure") (or with a more complicated delayed sleep rejection) I don't get the kill.

Read ready on <cockpit.transports.StdioTransport object at 0x7f916bbc43d0> <cockpit.bridge.Bridge object at 0x7f916b4cf650> 0
  read 67 bytes
transport control received {'group': 'cockpit1:localhost/system/terminal', 'command': 'kill'}
do_kill(None, cockpit1:localhost/system/terminal).  Considering 8 endpoints.
sent 15 to process 3309
shutdown_endpoint(<cockpit.channels.dbus.DBusChannel object at 0x7f916b2d7ef0>, {}) will close {'1:3!1'}
sending control message {} {'command': 'close', 'channel': '1:3!1'}
writing to transport <cockpit.transports.StdioTransport object at 0x7f916bbc43d0>
router dropped channel 1:3!1
shutdown_endpoint(<cockpit.channels.stream.SubprocessStreamChannel object at 0x7f916b30f6e0>, {}) will close {'1:3!2'}
sending control message {'exit-status': None, 'message': ''} {'command': 'close', 'channel': '1:3!2'}
writing to transport <cockpit.transports.StdioTransport object at 0x7f916bbc43d0>
router dropped channel 1:3!2
Process exited with status 0
Read ready on <cockpit.transports.StdioTransport object at 0x7f916bbc43d0> <cockpit.bridge.Bridge object at 0x7f916b4cf650> 0
  read 200 bytes
channel control received {'payload': 'fsinfo', 'path': '/home/admin/a/b', 'attrs': ['type'], 'watch': False, 'host': 'localhost', 'command': 'open', 'channel': '1:4!1', 'flow-control': True, 'group': 'cockpit1:localhost/system/terminal'}

This is easier (less interactive) to investigate like this:

--- pkg/systemd/terminal.jsx
+++ pkg/systemd/terminal.jsx
@@ -42,6 +42,12 @@ const _ = cockpit.gettext;
             });
             ch.addEventListener("ready", (_, msg) => this.setState({ pid: msg.pid }), { once: true });
             ch.addEventListener("close", () => this.setState({ pid: null }), { once: true });
+
+            console.log("XXX createChannel calling fsinfo");
+            fsinfo("/foo", ['type'])
+                    .then(info => console.log("XXX createChannel fsinfo success", info))
+                    .catch(err => console.log("XXX createChannel fsinfo error", err));
+
             return ch;
         }

My first suspicion was colliding channel IDs (due to the occasional "with the same id as another channel" ws crash), but at first sight that doesn't seem to be the case.

One bug in the "new" transport is that the fsinfo channel immediately fails with "received message before init: ready", or sometimes with "ping". pkg/lib/cockpit/fsinfo.ts doesn't directly send "ping" or "ready", so whatever does that needs to wait for the transport to be ready.

It works reliably when opening the frame directly, without the shell: https://127.0.0.2:9091/cockpit/@localhost/system/terminal.html . This corroborates interference from the shell/routing.

I also replaced the fsinfo.ts fsinfo() call (which uses the "new-style" channels) with the classic:

const fsi = cockpit.channel({ payload: "fsinfo", path: "/home", attrs: ["type"], watch: false });
fsi.addEventListener('message', (_ev, payload) => console.log("XXX createChannel fsinfo message", payload));
fsi.addEventListener('close', (_ev, message) => console.log("XXX createChannel fsinfo close", message));

and that works fine -- so it's really not a bridge problem, but mixing old- and new-style Transport/Channel with our shell Router.

Originally posted by @martinpitt in https://github.com/cockpit-project/cockpit/issues/21162#issuecomment-2456566895

martinpitt commented 7 hours ago

I tried to force ensure_transport() early on, to avoid sending that init message in the middle of the page. With that change (in addition to the fsinfo call) I get a much stronger failure:

--- pkg/systemd/terminal.jsx
+++ pkg/systemd/terminal.jsx
@@ -7,17 +7,16 @@ import { createRoot } from "react-dom/client";
 import { FormSelect, FormSelectOption } from "@patternfly/react-core/dist/esm/components/FormSelect/index.js";
 import { NumberInput } from "@patternfly/react-core/dist/esm/components/NumberInput/index.js";
 import { Toolbar, ToolbarContent, ToolbarGroup, ToolbarItem } from "@patternfly/react-core/dist/esm/components/Toolbar/index.js
";
-import { Alert, AlertActionCloseButton, AlertActionLink } from "@patternfly/react-core/dist/esm/components/Alert/index.js";
+import { ensure_transport } from 'cockpit/_internal/transport';
 import { fsinfo } from "cockpit/fsinfo";

-
 import "./terminal.scss";

 import { Terminal } from "cockpit-components-terminal.jsx";

 const _ = cockpit.gettext;

-(function() {
+ensure_transport(() => {
     cockpit.translate();

     /*
@@ -293,4 +299,4 @@ const _ = cockpit.gettext;

     /* And show the body */
     document.body.removeAttribute("hidden");
-}());
+});

That immediately kills the session with cockpit-ws saying

cockpit-ws[1278]: cannot open a channel 4:2!1 with the same id as another channel

I originally tried this with wrapping the terminal.jsx page into cockpit.transport.wait(), but that doesn't suffice.

martinpitt commented 6 hours ago

So indeed the terminal.jsx now sends two init messages. The first one through I don't know where from (most probably not from peer.py, but I see nothing else), and the second one from pkg/lib/cockpit/_internal/transport.ts #ws.onopen(), where I added an extra NN: 1 field:

XXX shell router: got init 
Object { command: "init", version: 1 }
 for child 
Window https://127.0.0.2:9091/cockpit/@localhost/system/terminal.html#/
 not having source yet [shell.js:43258:21](https://127.0.0.2:9091/cockpit/@localhost/shell/shell.js)
XXX shell router: got init 
Object { command: "init", NN: 1, version: 1 }
 for child 
Window https://127.0.0.2:9091/cockpit/@localhost/system/terminal.html#/
 unregistering source 
Object { name: "cockpit1:localhost/system/terminal", window: Window, channel_seed: "1:1!", default_host: "localhost", page: "system/terminal", inited: true }

So it feels like mixing old-style and new-style Channels really doesn't work. But I thought even the "old-style" channels from cockpit.js used cockpit/_internal/transport.ts..

martinpitt commented 6 hours ago

AAAAARGH! I get it now. It's because of pkg/lib/cockpit/_internal/transport.ts TransportGlobals, which are not really a singleton when "linking" it both into the bundle (as it happens with new-style channels) but also <script>ing cockpit.js from pages -- which we do everywhere in cockpit except kdump, and we also bundle it in cockpit-files. That's why the latter works just fine.

martinpitt commented 6 hours ago

Fixed in #21059