genodelabs / genode

Genode OS Framework
https://genode.org/

lxip: provide dummy `send_sig` #5161

Closed trimpim closed 1 month ago

trimpim commented 1 month ago

This function gets called by some libssh applications using vfs_lxip.

For the dummy implementation I looked at the old port.
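A minimal sketch of what such a dummy could look like, assuming the emulation layer has no signal delivery and may simply report success. The signature follows the Linux kernel's `send_sig(sig, task, priv)`; the call counter, the log message, and the `demo_send_sig` helper are illustrative assumptions and are not taken from the merged commit.

```c
#include <assert.h>
#include <stdio.h>

/* Opaque stand-in for the kernel's task descriptor; only a pointer is needed. */
struct task_struct;

/* Illustrative counter so a test or log can tell whether the stub was reached. */
static unsigned send_sig_calls;

/*
 * Dummy replacement for the kernel's send_sig().
 * Assumption: no signal can be delivered in this environment, so the
 * request is logged, counted, and reported as successful.
 */
int send_sig(int sig, struct task_struct *p, int priv)
{
	(void)p;
	(void)priv;
	send_sig_calls++;
	fprintf(stderr, "send_sig: ignoring signal %d\n", sig);
	return 0;
}

/* Usage: SIGPIPE (13 on Linux) is swallowed instead of stopping the caller. */
static void demo_send_sig(void)
{
	assert(send_sig(13, NULL, 0) == 0);
	assert(send_sig_calls == 1);
}
```

With a stub of this kind the caller continues with its normal error return instead of ending in "Will sleep forever...".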

chelmuth commented 1 month ago

I merged the commit but am still hesitant to close this issue. For example, send_sig() is used in sk_stream_error() to signal EPIPE errors. @ssumpf do you already have an opinion on that? Could you look into this?

ssumpf commented 1 month ago

> I merged the commit but am still hesitant to close this issue. For example, send_sig() is used in sk_stream_error() to signal EPIPE errors. @ssumpf do you already have an opinion on that? Could you look into this?

@trimpim: Could you provide a backtrace in the case send_sig is called?

trimpim commented 1 month ago

@ssumpf sorry for the delay. Unfortunately I hadn't recorded the backtrace before. So I had to run the tests again.

BOARD=linux ; KERNEL=linux ; ARCH=x86_64

remote_access -> ssh_server]   0x1000000 .. 0x10ffffff: linker area
remote_access -> ssh_server]   0x40000000 .. 0x4fffffff: stack area
remote_access -> ssh_server]   0x50000000 .. 0x521b2fff: ld.lib.so
remote_access -> ssh_server]   0x10e17000 .. 0x10ffffff: libc.lib.so
remote_access -> ssh_server]   0x10d73000 .. 0x10e16fff: vfs.lib.so
remote_access -> ssh_server]   0x103f000 .. 0x10d7fff: libssh.lib.so
remote_access -> ssh_server]   0x10d8000 .. 0x165efff: libcrypto.lib.so
remote_access -> ssh_server]   0x165f000 .. 0x1675fff: zlib.lib.so
remote_access -> ssh_server]   0x10d62000 .. 0x10d72fff: vfs_lxip.lib.so
remote_access -> ssh_server]   0x1676000 .. 0x18a4fff: lxip.lib.so
...
remote_access -> ssh_server] _genode_errno:96 unsupported errno 104
remote_access -> ssh_server] Error: Function send_sig not implemented yet!
remote_access -> ssh_server] backtrace "ep"
remote_access -> ssh_server] Will sleep forever...

The message _genode_errno:96 unsupported errno 104 is printed multiple times. If I haven't overlooked anything, it starts appearing after ~15 SSH logouts and normally comes directly before the logout. I'm not sure whether it is relevant, but it always appears directly before Error: Function send_sig not implemented yet!

chelmuth commented 1 month ago

> remote_access -> ssh_server] Error: Function send_sig not implemented yet!
> remote_access -> ssh_server] backtrace "ep"
> remote_access -> ssh_server] Will sleep forever...

There's actually no backtrace here. Please enable -fno-omit-frame-pointer and rebuild lib/vfs_lxip.

> The message _genode_errno:96 unsupported errno 104 is printed multiple times. If I haven't overlooked anything, it starts appearing after ~15 SSH logouts and normally comes directly before the logout. I'm not sure whether it is relevant, but it always appears directly before Error: Function send_sig not implemented yet!

Errno 104 is ECONNRESET and indeed missing from _genode_errno. You may try to add it to the following files.
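The missing translation can be pictured as one more case in a switch. The following is only a sketch under assumptions: the Linux values 32 (EPIPE) and 104 (ECONNRESET) are real, but the `sketch_errno` enum, its members, and `translate_errno` are hypothetical names and do not reproduce the actual `_genode_errno` code.

```c
#include <assert.h>
#include <stdio.h>

/* Linux errno values as used inside the ported stack. */
enum { LINUX_EPIPE = 32, LINUX_ECONNRESET = 104 };

/* Hypothetical stack-independent error codes, for illustration only. */
enum sketch_errno {
	SKETCH_ENONE = 0,
	SKETCH_EPIPE,
	SKETCH_ECONNRESET,
	SKETCH_EUNKNOWN
};

/* Translate a (possibly negative) Linux errno into the sketch enum. */
static enum sketch_errno translate_errno(int linux_errno)
{
	if (linux_errno < 0)
		linux_errno = -linux_errno; /* kernel code returns negative values */

	switch (linux_errno) {
	case 0:                return SKETCH_ENONE;
	case LINUX_EPIPE:      return SKETCH_EPIPE;
	case LINUX_ECONNRESET: return SKETCH_ECONNRESET; /* the case reported as missing */
	default:
		fprintf(stderr, "unsupported errno %d\n", linux_errno);
		return SKETCH_EUNKNOWN;
	}
}

/* Usage: errno 104 is translated instead of producing a warning. */
static void demo_translate_errno(void)
{
	assert(translate_errno(-LINUX_ECONNRESET) == SKETCH_ECONNRESET);
	assert(translate_errno(LINUX_EPIPE) == SKETCH_EPIPE);
	assert(translate_errno(9999) == SKETCH_EUNKNOWN);
}
```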

ssumpf commented 1 month ago

@trimpim: In addition to @chelmuth's comments, you can feed the resulting trace into the new backtrace tool found in the tool directory. For this to work:

cd <build-dir>/debug
<genode>/tool/backtrace ssh_server

paste:

remote_access -> ssh_server]   0x50000000 .. 0x521b2fff: ld.lib.so
remote_access -> ssh_server]   0x10e17000 .. 0x10ffffff: libc.lib.so
remote_access -> ssh_server]   0x10d73000 .. 0x10e16fff: vfs.lib.so
remote_access -> ssh_server]   0x103f000 .. 0x10d7fff: libssh.lib.so
remote_access -> ssh_server]   0x10d8000 .. 0x165efff: libcrypto.lib.so
remote_access -> ssh_server]   0x165f000 .. 0x1675fff: zlib.lib.so
remote_access -> ssh_server]   0x10d62000 .. 0x10d72fff: vfs_lxip.lib.so
remote_access -> ssh_server]   0x1676000 .. 0x18a4fff: lxip.lib.so

and then the actual backtrace into the terminal.

trimpim commented 1 month ago

> There's actually no backtrace here. Please enable -fno-omit-frame-pointer and rebuild lib/vfs_lxip.

@chelmuth how do I do this again? If I remember correctly, I had to add an option to etc/tools.conf, but I haven't used this in a long time and have been using a new computer since then.

ssumpf commented 1 month ago

> > There's actually no backtrace here. Please enable -fno-omit-frame-pointer and rebuild lib/vfs_lxip.
>
> @chelmuth how do I do this again? If I remember correctly, I had to add an option to etc/tools.conf, but I haven't used this in a long time and have been using a new computer since then.

@trimpim: You can add CC_OPT += -fno-omit-frame-pointer to your etc/tools.conf, but you have to make sure all the libs above are in your build command in the run script, not from the depot, otherwise the option will be ignored.

trimpim commented 1 month ago

@ssumpf thanks for the info. This makes me realize that my run script, which produces the error, only uses depot archives, which are started/stopped using the depot_deploy mechanism.

If I can change some of the depot tooling to build depots with -fno-omit-frame-pointer, then I'm fine with recompiling the whole depot content for the test.

ssumpf commented 1 month ago

@trimpim: You can add it to the CC_OPT in base/mk/global.mk. This will enable the backtrace. You still need to build the ssh_server with the same options in your build directory to use the backtrace tool, though.

trimpim commented 1 month ago

@ssumpf here is the processed output:

void Genode::log<Genode::Backtrace>(Genode::Backtrace&&)

    * 0x170e6f3: lxip.lib.so:0xa16f3 W
    * /data/genode/repos/base/include/base/log.h:170

lx_emul_trace_and_stop

    * 0x170e7f0: lxip.lib.so:0xa17f0 T
    * /data/genode/repos/base/include/base/log.h:86

send_sig

    * 0x16ba47b: lxip.lib.so:0x4d47b T
    * ??:?

sk_stream_error

    * 0x177a71a: lxip.lib.so:0x10d71a T
    * /data/genode/contrib/linux-d8c12b28a8ba8bddc3b0d12c2e3cb369fdfd5c75/src/linux/net/core/stream.c:191

tcp_sendmsg_locked

    * 0x17c6437: lxip.lib.so:0x159437 T
    * /data/genode/contrib/linux-d8c12b28a8ba8bddc3b0d12c2e3cb369fdfd5c75/src/linux/include/net/tcp.h:1891

tcp_sendmsg

    * 0x17c6881: lxip.lib.so:0x159881 T
    * /data/genode/contrib/linux-d8c12b28a8ba8bddc3b0d12c2e3cb369fdfd5c75/src/linux/net/ipv4/tcp.c:1485

lx_socket_sendmsg

    * 0x1719265: lxip.lib.so:0xac265 T
    * /data/genode/repos/dde_linux/src/lib/lxip/lx_socket.c:412

Lx_sendmsg::execute()

    * 0x18042fc: lxip.lib.so:0x1972fc W
    * /data/genode/repos/dde_linux/src/lib/lxip/socket.cc:342

Lx_kit::Task::run()

    * 0x1718094: lxip.lib.so:0xab094 T
    * /data/genode/repos/base/include/base/log.h:193

ssumpf commented 1 month ago

@trimpim: Thanks for the backtrace. send_sig is called in sk_stream_error if the error is EPIPE and the MSG_NOSIGNAL flag is not set (the error probably happens in sk_stream_wait_connect). This looks somewhat optional; therefore, I would suggest keeping your commit and the dummy implementation of the function.
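The condition described here can be modelled in user space as follows. This is a simplified sketch paraphrasing the logic of sk_stream_error, not the kernel source itself: the socket is reduced to a `pending_sock_error` parameter standing in for a pending socket error, and all `sketch_`/`SKETCH_` names are illustrative. The numeric constants are the Linux values of EPIPE, SIGPIPE, and MSG_NOSIGNAL.

```c
#include <assert.h>

#define SKETCH_EPIPE        32      /* Linux value of EPIPE */
#define SKETCH_SIGPIPE      13      /* Linux value of SIGPIPE */
#define SKETCH_MSG_NOSIGNAL 0x4000  /* Linux value of MSG_NOSIGNAL */

/* Counts how often the signal path was taken. */
static int signals_sent;

/* Stand-in for send_sig(SIGPIPE, current, 0). */
static void sketch_send_sig(int sig)
{
	(void)sig;
	signals_sent++;
}

/*
 * Model of the error path: a pending socket error (0 if none) replaces
 * -EPIPE; if -EPIPE remains and the caller did not pass MSG_NOSIGNAL,
 * SIGPIPE is raised. The error code is returned in either case.
 */
static int sketch_stream_error(int pending_sock_error, int flags, int err)
{
	if (err == -SKETCH_EPIPE && pending_sock_error)
		err = pending_sock_error;

	if (err == -SKETCH_EPIPE && !(flags & SKETCH_MSG_NOSIGNAL))
		sketch_send_sig(SKETCH_SIGPIPE);

	return err;
}

/* Usage: only a plain -EPIPE without MSG_NOSIGNAL reaches the signal path. */
static void demo_stream_error(void)
{
	signals_sent = 0;
	assert(sketch_stream_error(0, 0, -SKETCH_EPIPE) == -SKETCH_EPIPE);
	assert(signals_sent == 1);
	assert(sketch_stream_error(0, SKETCH_MSG_NOSIGNAL, -SKETCH_EPIPE) == -SKETCH_EPIPE);
	assert(signals_sent == 1);
	assert(sketch_stream_error(-104, 0, -SKETCH_EPIPE) == -104);
	assert(signals_sent == 1);
}
```

In this model the caller receives the error code regardless of whether the signal is raised, which is why a no-op send_sig leaves the return path intact.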

As for ECONNRESET, I will add it to the IP stack, even though neither EPIPE nor ECONNRESET is handled/propagated by the VFS plugin at the moment.

Does your SSH server scenario work with the dummy implementation or are there any other issues?

trimpim commented 1 month ago

@ssumpf Thanks for the information and for offering to add the error codes.

After adding the patch, one test in another component fails that worked with the old vfs_lxip and works with vfs_lwip. The test sends a really large packet (128 KiB) to the azure management client. After receiving this message, the azure management client can no longer send messages to the internet. I will try to capture the traffic and create another issue for that, if you are fine with that.

ssumpf commented 1 month ago

@trimpim: I will open an issue soon that will address lxip tests that currently still fail during nightly testing. You can put your issue there if you want.

trimpim commented 1 month ago

@ssumpf I'll do that.

ssumpf commented 1 month ago

@trimpim: https://github.com/genodelabs/genode/issues/5165

trimpim commented 1 month ago

@ssumpf thanks, cherry-picked to our working branch.

nfeske commented 1 month ago

Merged to master.