Closed jhgoebbert closed 9 months ago
The real case scenario is that our login nodes are crowded with people and xpra has slow access to IO or compute. Instead of waiting for the file requested by the browser it seems to me that xpra gives up and sends an index.html with size 0byte.
I will try to prove that with the logs.
These are the Xpra logs when the browser receives a 0byte index.html:
^[[32m2023-06-08 14:37:04,175 flush_then_close: wait_for_packet_sent() close_and_release()^[[0m
^[[32m2023-06-08 14:37:04,175 Protocol.close(None) closed=True, connection=None^[[0m
^[[32m2023-06-08 14:37:04,175 flush_then_close: done, callback=None^[[0m
^[[32m2023-06-08 14:37:04,194 untilConcludes(<bound method Connection.is_active of tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900>, <bound method Connection.can_retry of tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900>, <built-in method recv of socket object at 0x7f07e7fe0400>, (65536,), {}) timed out, retry=socket.timeout^[[0m
^[[32m2023-06-08 14:37:04,247 io_in_cb(<socket.socket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 12122)>, <flags G_IO_IN of type GLib.IOCondition>)^[[0m
^[[32m2023-06-08 14:37:04,247 read_parse_thread_loop starting^[[0m
^[[32m2023-06-08 14:37:04,248 _new_connection(<socket.socket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 12122)>, 'tcp', 0)^[[0m
^[[32m2023-06-08 14:37:04,248 invalid header: b'GET /index.html HTTP/1.1\r\nHost: localhost:12 .. ed-Since: Wed, 07 Jun 2023 18:55:42 GMT\r\n\r\n', input_packetcount=0, html=True, ssl=True^[[0m
^[[32m2023-06-08 14:37:04,248 invalid_header(Protocol(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900), 817 bytes: 'invalid packet header byte', b'GET /index.html HTTP/1.1\r\nHost: localhost:12 .. ed-Since: Wed, 07 Jun 2023 18:55:42 GMT\r\n\r\n')^[[0m
^[[32m2023-06-08 14:37:04,249 SocketConnection(<socket.socket fd=20, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 12122), raddr=('127.0.0.1', 50914)>, ('127.0.0.1', 12122), ('127.0.0.1', 50914), ('127.0.0.1', 50914), 'tcp', None, {})^[[0m
^[[32m2023-06-08 14:37:04,249 Connection(('127.0.0.1', 50914), 'tcp', None, {})^[[0m
^[[32m2023-06-08 14:37:04,249 tcp options: cork=True, nodelay=None^[[0m
^[[32m2023-06-08 14:37:04,249 accept_connection(<socket.socket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 12122)>, tcp, 0.1)=tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914^[[0m
^[[32m2023-06-08 14:37:04,250 process default packet gibberish^[[0m
^[[32m2023-06-08 14:37:04,250 Received uninterpretable nonsense from Protocol(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900): invalid packet header byte: http^[[0m
^[[32m2023-06-08 14:37:04,250 data: b'GET /index.html HTTP/1.1\r\nHost: localhost:12 .. ed-Since: Wed, 07 Jun 2023 18:55:42 GMT\r\n\r\n'^[[0m
^[[32m2023-06-08 14:37:04,250 disconnect_client(Protocol(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900), invalid packet header byte: http, ())^[[0m
^[[32m2023-06-08 14:37:04,250 disconnect_protocol(Protocol(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900), ('invalid packet header byte: http',))^[[0m
^[[32m2023-06-08 14:37:04,250 do_get_socket_info(<socket.socket fd=19, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 12122), raddr=('127.0.0.1', 50900)>) fd=19, local=('127.0.0.1', 12122)^[[0m
^[[32m2023-06-08 14:37:04,251 getsockopt(SOL_TCP, TCP_INFO, 232)={'state': 1, 'ca_state': 0, 'retransmits': 0, 'probes': 0, 'backoff': 0, 'options': 7, 'snd_wscale': 9, 'rcv_wscale': 9, 'rto': 212000, 'ato': 40000, 'snd_mss': 22016, 'rcv_mss': 817, 'unacked': 0, 'sacked': 0, 'lost': 0, 'retrans': 0, 'fackets': 0, 'last_data_sent': 5135, 'last_ack_sent': 0, 'last_data_recv': 5, 'last_ack_recv': 5, 'pmtu': 65535, 'rcv_ssthresh': 45324, 'rtt': 62, 'rttvar': 31, 'snd_ssthresh': 2147483647, 'snd_cwnd': 10, 'advmss': 65483, 'reordering': 3, 'rcv_rtt': 5130000, 'rcv_space': 43690, 'total_retrans': 0, 'pacing_rate': 18446744073709551615, 'max_pacing_rate': 18446744073709551615, 'bytes_acked': 0, 'bytes_received': 817, 'segs_out': 1, 'segs_in': 3, 'notsent_bytes': 0, 'min_rtt': 62, 'data_segs_in': 1, 'data_segs_out': 0, 'delivery_rate': 0, 'busy_time': 0, 'rwnd_limited': 0, 'sndbuf_limited': 0, 'delivered': 1, 'delivered_ce': 0, 'bytes_sent': 0, 'bytes_retrans': 0, 'dsack_dups': 0, 'reord_seen': 0, 'rcv_ooopack': 0, 'snd_wnd': 44032}^[[0m
^[[32m2023-06-08 14:37:04,251 handle_new_connection(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914, ('0.0.0.0', 12122), {}) sockname=('127.0.0.1', 12122), target=('127.0.0.1', 50914)^[[0m
^[[32m2023-06-08 14:37:04,252 get_tcp_info(<socket.socket fd=19, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 12122), raddr=('127.0.0.1', 50900)>)={'state': 1, 'ca_state': 0, 'retransmits': 0, 'probes': 0, 'backoff': 0, 'options': 7, 'snd_wscale': 9, 'rcv_wscale': 9, 'rto': 212000, 'ato': 40000, 'snd_mss': 22016, 'rcv_mss': 817, 'unacked': 0, 'sacked': 0, 'lost': 0, 'retrans': 0, 'fackets': 0, 'last_data_sent': 5135, 'last_ack_sent': 0, 'last_data_recv': 5, 'last_ack_recv': 5, 'pmtu': 65535, 'rcv_ssthresh': 45324, 'rtt': 62, 'rttvar': 31, 'snd_ssthresh': 2147483647, 'snd_cwnd': 10, 'advmss': 65483, 'reordering': 3, 'rcv_rtt': 5130000, 'rcv_space': 43690, 'total_retrans': 0, 'pacing_rate': 18446744073709551615, 'max_pacing_rate': 18446744073709551615, 'bytes_acked': 0, 'bytes_received': 817, 'segs_out': 1, 'segs_in': 3, 'notsent_bytes': 0, 'min_rtt': 62, 'data_segs_in': 1, 'data_segs_out': 0, 'delivery_rate': 0, 'busy_time': 0, 'rwnd_limited': 0, 'sndbuf_limited': 0, 'delivered': 1, 'delivered_ce': 0, 'bytes_sent': 0, 'bytes_retrans': 0, 'dsack_dups': 0, 'reord_seen': 0, 'rcv_ooopack': 0, 'snd_wnd': 44032, 'sndbuf_size': 2626560, 'sndbuf_unacked': 0, 'sndbuf_bytes': 0}^[[0m
^[[32m2023-06-08 14:37:04,252 peek_connection(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914, 1000, 8192)^[[0m
2023-06-08 14:37:04,252 Disconnecting client 127.0.0.1:50900:^[[0m
^[[32m2023-06-08 14:37:04,252 set_socket_timeout(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914, 1.0) applied to <socket.socket fd=20, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 12122), raddr=('127.0.0.1', 50914)>^[[0m
2023-06-08 14:37:04,252 invalid packet header byte: http^[[0m
^[[32m2023-06-08 14:37:04,252 <built-in method recv of socket object at 0x7f07e7ff09a0>(8192, MSG_PEEK)^[[0m
^[[32m2023-06-08 14:37:04,253 flush_then_close(['disconnect', 'invalid packet header byte: http'], None) closed=False^[[0m
^[[32m2023-06-08 14:37:04,253 flush_then_close: wait_for_write_lock()^[[0m
^[[32m2023-06-08 14:37:04,253 flush_then_close: acquired the write lock^[[0m
^[[32m2023-06-08 14:37:04,253 flush_then_close: queue is now empty, sending the last packet and closing^[[0m
^[[32m2023-06-08 14:37:04,253 packet type send alias not found for 'disconnect'^[[0m
^[[32m2023-06-08 14:37:04,253 sending 45 bytes without header^[[0m
^[[32m2023-06-08 14:37:04,253 raw_write for disconnect, starting write thread^[[0m
^[[32m2023-06-08 14:37:04,253 io_thread_loop(write, <bound method Protocol._write of Protocol(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900)>) loop starting^[[0m
^[[32m2023-06-08 14:37:04,253 cleanup_protocol(Protocol(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900))^[[0m
^[[32m2023-06-08 14:37:04,254 changed tcp socket to nodelay=True^[[0m
^[[32m2023-06-08 14:37:04,254 flush_then_close: packet_queued() closed=False^[[0m
^[[32m2023-06-08 14:37:04,254 flush_then_close: wait_for_packet_sent() queue.empty()=True, closed=False^[[0m
^[[32m2023-06-08 14:37:04,254 flush_then_close: wait_for_packet_sent() close_and_release()^[[0m
^[[32m2023-06-08 14:37:04,254 Protocol.close(None) closed=False, connection=tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900^[[0m
^[[32m2023-06-08 14:37:04,254 Protocol.close(None) calling <bound method SocketConnection.close of tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900>^[[0m
^[[32m2023-06-08 14:37:04,254 process default packet connection-lost^[[0m
^[[32m2023-06-08 14:37:04,254 tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900.close() socket=<socket.socket fd=19, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 12122), raddr=('127.0.0.1', 50900)>^[[0m
^[[32m2023-06-08 14:37:04,254 process_connection_lost(Protocol(None), ['connection-lost'])^[[0m
^[[32m2023-06-08 14:37:04,255 tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50900.close() done^[[0m
^[[32m2023-06-08 14:37:04,255 cleanup_protocol(Protocol(None))^[[0m
^[[32m2023-06-08 14:37:04,255 terminate_queue_threads()^[[0m
^[[32m2023-06-08 14:37:04,255 Protocol.close(None) done^[[0m
^[[32m2023-06-08 14:37:04,255 flush_then_close: done, callback=None^[[0m
^[[32m2023-06-08 14:37:04,255 io_thread_loop(write, <bound method Protocol._write of Protocol(None)>) loop ended, closed=True^[[0m
^[[32m2023-06-08 14:37:04,347 can_retry: <class 'OSError'>, args=(9, 'Bad file descriptor'), errno=9, code=9, abort=9^[[0m
^[[32m2023-06-08 14:37:04,347 None closed in read loop
Traceback (most recent call last):
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/protocol.py", line 690, in _io_thread_loop
while not self._closed and callback():
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/protocol.py", line 779, in _read
buf = self.con_read()
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/protocol.py", line 797, in con_read
return self._conn.read(self.read_buffer_size)
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/bytestreams.py", line 345, in read
return self._read(self._socket.recv, n)
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/bytestreams.py", line 162, in _read
r = self.untilConcludes(*args)
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/bytestreams.py", line 147, in untilConcludes
return untilConcludes(self.is_active, self.can_retry, *args)
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/bytestreams.py", line 82, in untilConcludes
retry = can_retry_cb(e)
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/bytestreams.py", line 144, in can_retry
return can_retry(e)
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/bytestreams.py", line 72, in can_retry
raise ConnectionClosedException(e) from None
xpra.net.common.ConnectionClosedException: [Errno 9] Bad file descriptor^[[0m
^[[32m2023-06-08 14:37:05,251 connection lost: invalid packet header byte: http^[[0m
^[[32m2023-06-08 14:37:05,251 Protocol.close(invalid packet header byte: http) closed=True, connection=None^[[0m
^[[32m2023-06-08 14:37:05,254 peek_connection(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914, 1000) failed
Traceback (most recent call last):
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/socket_util.py", line 196, in peek_connection
peek_data = conn.peek(size)
File "/p/usersoftware/swmanage/goebbert1/stage2023/JupyterLab/easybuild/juwels/software/xpra/4.4.5-GCCcore-11.3.0/lib/python3.10/site-packages/xpra-4.4.5-py3.10-linux-x86_64.egg/xpra/net/bytestreams.py", line 342, in peek
return self._socket.recv(n, socket.MSG_PEEK)
TimeoutError: timed out^[[0m
^[[32m2023-06-08 14:37:05,505 peek: elapsed=1252, timeout=1000^[[0m
^[[32m2023-06-08 14:37:05,505 socket tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914 peek: got 0 bytes^[[0m
^[[32m2023-06-08 14:37:05,505 socket peek=b''^[[0m
^[[32m2023-06-08 14:37:05,506 socket peek hex=^[[0m
^[[32m2023-06-08 14:37:05,506 socket peek line1=b''^[[0m
^[[32m2023-06-08 14:37:05,506 guess_packet_type(..)=None^[[0m
^[[32m2023-06-08 14:37:05,506 log_new_connection(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914, ('0.0.0.0', 12122)) type=<class 'xpra.net.bytestreams.SocketConnection'>, sock=<socket.socket fd=20, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 12122), raddr=('127.0.0.1', 50914)>, sockname=('127.0.0.1', 12122), address=('127.0.0.1', 50914), peername=('127.0.0.1', 50914)^[[0m
2023-06-08 14:37:05,506 New tcp connection received^[[0m
2023-06-08 14:37:05,506 from '127.0.0.1:50914'^[[0m
2023-06-08 14:37:05,506 on '0.0.0.0:12122'^[[0m
^[[32m2023-06-08 14:37:05,506 make_protocol('tcp', tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914, {}, <function ServerCore.make_protocol.<locals>.xpra_protocol_class at 0x7f07e7f8f7f0>, None)^[[0m
^[[32m2023-06-08 14:37:05,507 tcp: encryption=, keyfile=^[[0m
^[[32m2023-06-08 14:37:05,507 socktype=tcp, encryption=, keyfile=^[[0m
^[[32m2023-06-08 14:37:05,507 io_thread_loop(read, <bound method Protocol._read of Protocol(tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914)>) loop starting^[[0m
^[[32m2023-06-08 14:37:05,608 untilConcludes(<bound method Connection.is_active of tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914>, <bound method Connection.can_retry of tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914>, <built-in method recv of socket object at 0x7f07e7ff09a0>, (65536,), {}) timed out, retry=socket.timeout^[[0m
^[[32m2023-06-08 14:37:05,708 untilConcludes(<bound method Connection.is_active of tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914>, <bound method Connection.can_retry of tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914>, <built-in method recv of socket object at 0x7f07e7ff09a0>, (65536,), {}) timed out, retry=socket.timeout^[[0m
^[[32m2023-06-08 14:37:05,809 untilConcludes(<bound method Connection.is_active of tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914>, <bound method Connection.can_retry of tcp socket: 127.0.0.1:12122 <- 127.0.0.1:50914>, <built-in method recv of socket object at 0x7f07e7ff09a0>, (65536,), {}) timed out, retry=socket.timeout^[[0m
and ssh-tunneling the port to my local laptop.
That's the problem, ssh adds buffering.
Don't do that, or use bind-ws
/ bind-wss
instead of bind-tcp
and bind-ssl
.
Another solution is to not use xpra as a web server and host the html5 client on a real web server.
I understand your point 🤔
... and so I tried with bind-wss
(through an nginx proxy). It was much less likely that xpra provided 0-sized js/html files to the browser, but it still happened:
If you have an nginx proxy, then let nginx serve the html files.
Sure, that would be a good work-around - but unfortunately not a good one for our setup :thinking:
We are serving different web-applications running on our HPC systems though a JupyterHub/Jupyter-Server-Proxy combination (hosted on our cloud) for authentication/authorization/HPCforwarding. Xpra runs on the HPC system (compute or login) but the nginx on the cloud.
I have not seen these issues yet with other webapps (like JupyterLab - based on tornado). So I was wondering what is done different there.
This can definitely be fixed, but I just don't have the time for it. Sorry.
Thank you for your answer. I understand the problem now much better and I can image that a fix can be complex/time-consuming.
And of course: Thank you also for the great Xpra! It replaced VNC at our place almost completely.
@jhgoebbert Thanks. As a corporate user, have you considered: https://github.com/Xpra-org/xpra/blob/master/docs/SPONSORS.md
Good you point that out. I will pass this on.
I can reproduce the problem without shortening the server timeout using:
from time import sleep
import socket
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.connect(("127.0.0.1", 10000))
sleep(6)
s.sendall(b"GET /echo HTTP/1.1\n\n")
data = s.recv(1024)
print(f"Received {data!r}")
Which prints:
Received b'disconnect: invalid packet header byte: http\n'
On the server side:
accept_connection(<socket.socket fd=6, family=2, type=1, proto=0, laddr=('0.0.0.0', 10000)>, tcp, 0.1)=tcp socket: 127.0.0.1:10000 <- 127.0.0.1:49526
(...)
socket peek line1=b''
(...)
New tcp connection received
(...)
untilConcludes(<bound method Connection.is_active of tcp socket: 127.0.0.1:10000 <- 127.0.0.1:49526>, <bound method Connection.can_retry of tcp socket: 127.0.0.1:10000 <- 127.0.0.1:49526>, <built-in method recv of socket object at 0x7f18a85d86e0>, (65536,), {}) timed out, retry=socket.timeout
(...)
read_parse_thread_loop starting
invalid header: b'GET /echo HTTP/1.1\n\n', input_packetcount=0, websocket_upgrade=True, ssl=True
invalid_header(Protocol(tcp socket: 127.0.0.1:10000 <- 127.0.0.1:49526), 20 bytes: 'invalid packet header byte', b'GET /echo HTTP/1.1\n\n')
process default packet gibberish
I'm not sure if this is the correct approach yet, for sure the code needs tidying up, but the commit above does solve the immediate problem and looks back-portable to v5.
TODO:
SocketConnection
objects "peekable" since there is no overhead until enable_peek
is calledinvalid_header
can't send a message back once we've called steal_connection
..@jhgoebbert please try the latest beta builds, 6.0-r34912 and later have this fix.
Describe the bug Sometimes Xpra does not deliver the requested HTML files (index.html) or delivers corrupted ones.
To Reproduce
xpra start --html=on --bind-tcp=0.0.0.0:12122 --start=xterm --sharing -d all
and ssh-tunneling the port to my local laptop.Then I can see ...
localhost sent an invalid response. ERR_INVALID_HTTP_RESPONSE
(image 2 below) and after a certain time it works again (see image 3)System Information (please complete the following information):
Additional context