Xpra-org / xpra-html5

HTML5 client for Xpra
Mozilla Public License 2.0

protocol error: invalid packet header format: 100: 0x646973636f6e6e65 #328

Open MoritzWeber0 opened 3 weeks ago

MoritzWeber0 commented 3 weeks ago

We recently updated our Xpra HTML5 client from version 10.1 to 16.2. Unfortunately, stability has decreased massively. During the initial connection, the client reloads several times, then loads, and after ~10 seconds of a stable connection it drops the connection. We didn't see these issues with version 10.1. The Xpra version itself didn't change; it is v6.2.1-r0.

The connection always drops with the following error message: protocol error: invalid packet header format: 100: 0x646973636f6e6e65

It's always exactly the same message.
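
Note: the hex value in the message is printable ASCII, which hints at what the client actually received. A minimal sketch for decoding it in plain JavaScript follows; `decodeHeaderHex` is a hypothetical helper name for illustration, not part of xpra-html5.

```javascript
// Hypothetical helper: convert the hex string from the error message
// into its individual bytes and the corresponding ASCII text.
function decodeHeaderHex(hex) {
  const clean = hex.replace(/^0x/i, "");
  const bytes = [];
  for (let i = 0; i < clean.length; i += 2) {
    bytes.push(parseInt(clean.slice(i, i + 2), 16));
  }
  const text = bytes.map((b) => String.fromCharCode(b)).join("");
  return { bytes, text };
}

const header = decodeHeaderHex("0x646973636f6e6e65");
console.log(header.bytes[0]); // 100, the number shown in the error message
console.log(header.text);     // "disconne"
```

The 8 bytes that were read as a packet header are the ASCII text "disconne", presumably the start of "disconnect", and the 100 in the message is simply the first byte ('d'). This suggests that text or packet payload was parsed as if it were a header. That is an inference from the bytes alone; the logs in this report do not confirm it.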

The Xpra server logs don't contain anything useful, but the browser's console.log (attached) is more interesting.

We run Xpra behind an nginx reverse proxy, but the configuration is rather simple:

# SPDX-FileCopyrightText: Copyright DB InfraGO AG and contributors
# SPDX-License-Identifier: Apache-2.0

pid /tmp/nginx.pid;
daemon off;
events{}
http {
    # These options are needed to run as non-root
    client_body_temp_path /tmp/client_temp;
    proxy_temp_path       /tmp/proxy_temp_path;
    fastcgi_temp_path     /tmp/fastcgi_temp;
    uwsgi_temp_path       /tmp/uwsgi_temp;
    scgi_temp_path        /tmp/scgi_temp;

    server {
        listen 10000;
        server_name _;

        root /usr/share/nginx/html;
        error_page 502 /error.html;
        error_page 404 /error.html;

        location __XPRA_SUBPATH__ {
            rewrite ^__XPRA_SUBPATH__(.*) /$1 break;

            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;

            proxy_pass http://127.0.0.1:10001;
            proxy_buffering off;

            proxy_hide_header Content-Security-Policy;
            add_header Content-Security-Policy "frame-ancestors self __XPRA_CSP_ORIGIN_HOST__";
        }
    }
}

totaam commented 3 weeks ago

Can you reproduce these issues without your reverse proxy?

MoritzWeber0 commented 3 weeks ago

I spent a bit of time investigating it in more detail. The latest HTML5 client seems to be a lot less resilient to slow internet connections than the 10.1 version. It tries to reconnect quickly while the old version tried to keep the connection alive.

It's reproducible with a locally running Xpra (without any reverse proxy in the way) when throttling is enabled in the Chrome DevTools. When I set it to 3G and reload the page, the client tries to reconnect a few times (connection loop), and sometimes I also see the error message above. Once connected, it quickly drops the connection again.

totaam commented 3 weeks ago

What does the JavaScript console show when the reconnect triggers?

MoritzWeber0 commented 3 weeks ago

The full console.log is attached to the ticket, but the relevant messages during the dropped connection are:

Utilities.js:1 audio-state: stopped
Utilities.js:1 cancel_all_files( closing ) will cancel: []
Utilities.js:1 connection-lost
Utilities.js:1 connection_progress( Connecting to server ,  example.com:443/session/dfgjeumbgeknvzvbhdtavqlps/ with ssl ,  40 )

MoritzWeber0 commented 3 weeks ago

I just noticed that the attached console.log was captured with reconnect disabled. Here is another log from my local environment:

The "initial connection loop" has the following trace in the console.log: localhost-1730799414561.log

[...]
 connection-established
 websocket closed:  invalid packet header format: 100: 0x646973636f6e6e65 reason:  null , reconnect:  true , reconnect attempt:  0
 connection-lost
 websocket closed:  'Normal Closure' (1000): 'unknown reason' reason:  invalid packet header format: 100: 0x646973636f6e6e65 , reconnect:  true , reconnect attempt:  1
 audio-state: stopped
 cancel_all_files( closing ) will cancel: []
 connection-lost
 connection_progress( Connecting to server ,  localhost:8888/xpra/ ,  40 )
 offscreen canvas requires https with Chrome
 we have webworker support
 using decode worker
 decode worker will check: (11) ['jpeg', 'png', 'png/P', 'png/L', 'rgb', 'rgb32', 'rgb24', 'scroll', 'webp', 'void', 'avif']
 audio-state: stopped
 cancel_all_files( closing ) will cancel: []
 connection-lost
 connection_progress( Connecting to server ,  localhost:8888/xpra/ ,  40 )
 offscreen canvas requires https with Chrome
 we have webworker support
 using decode worker
 decode worker will check: (11) ['jpeg', 'png', 'png/P', 'png/L', 'rgb', 'rgb32', 'rgb24', 'scroll', 'webp', 'void', 'avif']
 websocket closed:  invalid packet header format: 100: 0x646973636f6e6e65 reason:  null , reconnect:  true , reconnect attempt:  2
 connection-lost
 we can use websocket in webworker
 connection_progress( Opening WebSocket connection ,  ws://localhost:8888/xpra/ ,  50 )
[...]

MoritzWeber0 commented 3 weeks ago

If you can't reproduce it locally, I can offer a Docker image that runs Xpra with an instance of Eclipse Capella (that's what we use in production). It's relatively large though (3.7GB).

docker run -p 10000:10000 -e CONNECTION_METHOD=xpra ghcr.io/dsd-dbs/capella-dockerimages/capella/remote:7.0.0-without-dropins-v2.7.0

After it's pulled and started, connect to http://localhost:10000/. Then open the browser's dev tools, go to the Network tab, set network throttling to 3G, and refresh the page. After a short wait, it will go into a reconnect loop.

MoritzWeber0 commented 3 weeks ago

Workaround is to set client.OPEN_TIMEOUT = 1000000. Seems to work fine then.

EDIT: This only helps with the initial reconnection loop, not with the later connection losses. For those, I still get the same protocol error in the console log.
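
The effect of the open timeout on a slow link can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not the xpra-html5 implementation: `simulateConnect`, `handshakeMs`, `maxAttempts`, the 5000 default, and the 8 second handshake are all hypothetical. The only things taken from this thread are the `client.OPEN_TIMEOUT` name and the workaround value. The model assumes that an attempt is abandoned and retried when the connection is not open within `OPEN_TIMEOUT` (treated here as milliseconds).

```javascript
// Simplified model (assumption): each attempt needs `handshakeMs` to open
// and is abandoned and retried if that exceeds `openTimeoutMs`.
function simulateConnect(openTimeoutMs, handshakeMs, maxAttempts) {
  const events = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (handshakeMs <= openTimeoutMs) {
      events.push(`attempt ${attempt}: connected after ${handshakeMs} ms`);
      return { connected: true, attempts: attempt + 1, events };
    }
    events.push(`attempt ${attempt}: timed out after ${openTimeoutMs} ms, reconnecting`);
  }
  return { connected: false, attempts: maxAttempts, events };
}

// Stand-in for the real client object; only the OPEN_TIMEOUT name comes from the thread.
const client = { OPEN_TIMEOUT: 5000 }; // hypothetical default, for illustration only

// A throttled link where opening the connection takes 8 s (made-up number).
const slow = simulateConnect(client.OPEN_TIMEOUT, 8000, 3);
console.log(slow.connected, slow.attempts); // false 3

client.OPEN_TIMEOUT = 1000000; // the workaround value from this comment
const patched = simulateConnect(client.OPEN_TIMEOUT, 8000, 3);
console.log(patched.connected, patched.attempts); // true 1
```

In this model, every attempt fails the same way as long as the handshake is slower than the timeout, which is consistent with the initial reconnect loop described above. It says nothing about the later protocol error, which the workaround does not fix.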

MoritzWeber0 commented 3 weeks ago

To be a bit more specific, the error occurs with v16, but not with v15.1. It must be related to changes between those releases: https://github.com/Xpra-org/xpra-html5/compare/v15.1...v16