cockpit-project / cockpit

Cockpit is a web-based graphical interface for servers.
http://www.cockpit-project.org/
GNU Lesser General Public License v2.1
11.36k stars 1.13k forks source link

text channels drop frames with split UTF-8 multi-byte characters #19235

Closed martinpitt closed 5 months ago

martinpitt commented 1 year ago

Our tests occasionally fail with unexpected messages

invalid non-UTF8 @data passed as text to web_socket_connection_send()

Today it happened here in TestJournal.testAbrtReport. There is no other information about that, so we need to collect some instances to identify some patterns.

It may also help to actually show the data, so that we get a clue what happens.

:arrow_up: By today (June 2024) the above logs are all gone, but https://github.com/cockpit-project/cockpit-podman/issues/1733 is a recent case with recent logs which is very easy to reproduce -- but it won't be any more in a few days. See below, @martinpitt captured this bug in a unit test.

martinpitt commented 1 year ago

I ran

for i in `seq 30`; do test/verify/check-system-journal TestJournal.testAbrtReport TestJournal.testAbrtDelete TestJournal.testAbrtReportCancel $RUNC || break; done

several times, and while I did find some other flakes, I never hit this issue.

martinpitt commented 1 year ago

lis ║ it happens when a non-"binary=True" channel sends invalid utf-8, which should never happen, but we actually never ever check

mvollmer commented 1 year ago

I just got this on Arch:

/updates/updates.js: couldn't parse http-stream1 header payload: JSON data must be UTF-8 encoded
cockpit_web_response_queue: assertion 'self->complete == FALSE' failed
cockpit_web_response_queue: assertion 'self->complete == FALSE' failed

Maybe related?

https://cockpit-logs.us-east-1.linodeobjects.com/pull-5144-20230824-094601-66eef2ca-arch-other-cockpit-project-cockpit/log.html#141

martinpitt commented 1 year ago

I think it's unrelated, but don't take my word for it.

mvollmer commented 1 year ago

And again:

/system/services.css: couldn't parse http-stream1 header payload: JSON data must be UTF-8 encoded
cockpit_web_response_queue: assertion 'self->complete == FALSE' failed
cockpit_web_response_queue: assertion 'self->complete == FALSE' failed

https://cockpit-logs.us-east-1.linodeobjects.com/pull-19079-20230824-105223-f05215f3-arch-storage-bots-5147/log.html

This is with a new Arch image, so it might indeed be caused by that.

martinpitt commented 1 year ago

Interesting, I have an instance in testHiddenRaid, so far all the others were in the ABRT tests. This happens often enough that I'll send a PR to ignore that message. We have enough instances now, no need to constantly break our CI with this.

martinpitt commented 6 months ago

Interestingly this very issue came back and currently hits us hard here: https://github.com/cockpit-project/cockpit-podman/issues/1733

It happens if some data source sends UTF-8, and the frame gets split in the middle of a multi-byte character. I have a unit test for this now in my utf8-split branch. As pushed, it fails with

(test-server:231201): WebSocket-CRITICAL **: 07:39:15.715: invalid non-UTF8 @data passed as text to web_socket_connection_send()
cleaning up pid 231203
Trace/breakpoint trap (core dumped)

and the unit test just aborts as the web server goes away. When toning this down with

--- src/websocket/websocketconnection.c
+++ src/websocket/websocketconnection.c
@@ -1740,7 +1740,7 @@ web_socket_connection_send (WebSocketConnection *self,
       if (!g_utf8_validate (pref, prefix_len, NULL) ||
           !g_utf8_validate (payload, payload_len, NULL))
         {
-          g_critical ("invalid non-UTF8 @data passed as text to web_socket_connection_send()");
+          g_message ("invalid non-UTF8 @data passed as text to web_socket_connection_send()");
           return;
         }
       break;

then pytest -k http fails with

# test: split UTF8 frames {}
not ok 1 - correct response, expected: "initialfirst half é second halffinal", got: "initialfinal", test: split UTF8 frames, module: , source: at Function.<anonymous> (http://localhost:46513/qunit/base1/test-http.js:9517:85)

which is exactly the problem that we see in https://github.com/cockpit-project/cockpit-podman/issues/1733.

I'm not entirely sure what do do here -- it's tempting to just ditch WEB_SOCKET_DATA_TEXT and send everything as bytes, and don't care about UTF-8 in data streams, bridge, and webserver, until the very end in cockpit.js. But I suppose there were reasons to do it that way, so we need to fix the bridge to do some buffering and partial decoding of such frames.

@allisonkarlitskaya you seemed to have some idea/opinion about this?

martinpitt commented 5 months ago

I sent PR# 20628 to try and address this.