ansible / pylibssh

Python bindings specific to Ansible use case for libssh https://www.libssh.org/
https://ansible-pylibssh.rtfd.io
GNU Lesser General Public License v2.1

Intermittent SIGSEGV when running multiple ssh_channel.exec_command() #645

Open kucharskim opened 2 months ago

kucharskim commented 2 months ago
SUMMARY

On OpenBSD -current as of 2024-09-04 I get the following backtrace from a core dump:

Core was generated by `python3.11'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000008cf814426a8 in __pyx_type_11pylibsshext_7channel_Channel ()
   from /usr/local/lib/python3.11/site-packages/pylibsshext/channel.cpython-311.so
(gdb) bt
#0  0x000008cf814426a8 in __pyx_type_11pylibsshext_7channel_Channel ()
   from /usr/local/lib/python3.11/site-packages/pylibsshext/channel.cpython-311.so
#1  0x000008cf979ac6c4 in ssh_packet_socket_controlflow_callback () from /usr/local/lib/libssh.so.4.2
#2  0x000008cf979b842b in ssh_socket_pollcallback () from /usr/local/lib/libssh.so.4.2
#3  0x000008cf979b4365 in ssh_poll_ctx_dopoll () from /usr/local/lib/libssh.so.4.2
#4  0x000008cf979b5bb6 in ssh_handle_packets () from /usr/local/lib/libssh.so.4.2
#5  0x000008cf979b595b in ssh_handle_packets_termination () from /usr/local/lib/libssh.so.4.2
#6  0x000008cf97993195 in channel_open () from /usr/local/lib/libssh.so.4.2
#7  0x000008cf8143a3ac in __pyx_pf_11pylibsshext_7channel_7Channel_24exec_command (__pyx_v_self=0x8cf82114ac0, 
    __pyx_v_command=<error reading variable: Cannot access memory at address 0x0>) at pylibsshext/channel.c:7221
#8  __pyx_pw_11pylibsshext_7channel_7Channel_25exec_command (
    __pyx_v_self=<pylibsshext.channel.Channel at remote 0x8cf82114ac0>, __pyx_args=<optimized out>, 
    __pyx_nargs=<optimized out>, __pyx_kwds=<optimized out>) at pylibsshext/channel.c:7138
#9  0x000008cedc9a264d in PyObject_Vectorcall () from /usr/local/lib/libpython3.11.so.0.0
#10 0x000008cedcabb1cd in _PyEval_EvalFrameDefault () from /usr/local/lib/libpython3.11.so.0.0
#11 0x000008cedcaaab1e in PyEval_EvalCode () from /usr/local/lib/libpython3.11.so.0.0
#12 0x000008cedcb1b6e9 in run_mod () from /usr/local/lib/libpython3.11.so.0.0
#13 0x000008cedcb1b1d5 in _PyRun_SimpleFileObject () from /usr/local/lib/libpython3.11.so.0.0
#14 0x000008cedcb1a3b3 in _PyRun_AnyFileObject () from /usr/local/lib/libpython3.11.so.0.0
#15 0x000008cedcb44172 in Py_RunMain () from /usr/local/lib/libpython3.11.so.0.0
#16 0x000008cedcb45095 in pymain_main () from /usr/local/lib/libpython3.11.so.0.0
#17 0x000008cedcb454bc in Py_BytesMain () from /usr/local/lib/libpython3.11.so.0.0
#18 0x000008ccc9b7c94b in _start ()
ISSUE TYPE
PYLIBSSH and LIBSSH VERSION
$ python3 version.py
__full_version__='<pylibsshext v1.2.2 with libssh v0.10.6>'
__libssh_version__='0.10.6'
__version__='1.2.2'
__version_info__=(1, 2, 2)
OS / ENVIRONMENT
OpenBSD 7.6-beta (GENERIC.MP) #310: Wed Sep  4 11:59:45 MDT 2024
    deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
# pkg_info -qI python
python-3.11.10p0

# pkg_info -qI libssh
libssh-0.10.6

# pkg_info -qI py3-ansible-libssh
py3-ansible-libssh-1.2.2
STEPS TO REPRODUCE
$ cat test1.py
#!/usr/bin/env python3

from pylibsshext.errors import LibsshSessionException
from pylibsshext.session import Session

import logging

HOST = "examplemachine1"
USER = "root"
TIMEOUT = 30
PORT = 22

ssh = Session()

# ssh.set_log_level(logging.DEBUG)

try:
    ssh.connect(
        host=HOST,
        user=USER,
        timeout=TIMEOUT,
        port=PORT,
        # proxycommand="ssh -q -W %h:%p ks2",
    )
except LibsshSessionException as ex:
    print(f"Failed to connect to {HOST}:{PORT} over SSH: {ex!s}")

print(f"ssh.is_connected={ssh.is_connected}")

def run_cmd(ssh_channel, cmd):
    print(f"Running {cmd}...", flush=True)
    print(f"Executing exec_command()...", flush=True)
    cmd_resp = ssh_channel.exec_command(cmd)
    print(f"Executing exec_command()... done", flush=True)
    print(f"stdout type: {type(cmd_resp.stdout)}")
    print(f"stdout:\n{cmd_resp.stdout.decode()}\n")
    print(f"stderr type: {type(cmd_resp.stderr)}")
    print(f"stderr:\n{cmd_resp.stderr.decode()}\n")
    print(f"return code: {cmd_resp.returncode}")

ssh_channel = ssh.new_channel()

run_cmd(ssh_channel, "uptime")
run_cmd(ssh_channel, "ls")
run_cmd(ssh_channel, "hostname")

print(f"Executing ssh_channel.close()...", flush=True)
ssh_channel.close()

print("Closing connection...", flush=True)
ssh.close()
EXPECTED RESULTS

Execution of the script above should work every time, but it dumps core intermittently.

python3 test1.py
ACTUAL RESULTS

It dumps core every now and then, always on the second command. The first command always works.

$ python3 test1.py  
ssh.is_connected=1
Running uptime...
Executing exec_command()...
Executing exec_command()... done
stdout type: <class 'bytes'>
stdout:
 7:17PM  up 1 day, 15:59, 0 users, load averages: 0.01, 0.01, 0.00

stderr type: <class 'bytes'>
stderr:

return code: 0
Running ls...
Executing exec_command()...
Segmentation fault (core dumped) 
kucharskim commented 2 months ago

When I enable logging via ssh.set_log_level(logging.DEBUG) (as commented out above in test1.py), I see this:

...
[2024/09/21 07:09:35.914803, 3] ssh_socket_unbuffered_write:  Enabling POLLOUT for socket
[2024/09/21 07:09:35.914841, 3] packet_send2:  packet: wrote [type=50, len=1136, padding_size=5, comp=1130, payload=1130]
[2024/09/21 07:09:35.965620, 3] ssh_packet_socket_callback:  packet: read type 52 [len=8,padding=6,comp=1,payload=1]
[2024/09/21 07:09:35.965631, 3] ssh_packet_process:  Dispatching handler for packet type 52
[2024/09/21 07:09:35.965635, 3] ssh_packet_userauth_success:  Authentication successful
[2024/09/21 07:09:35.965639, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=221, in_blocks=107]
ssh.is_connected=1
[2024/09/21 07:09:35.966108, 2] channel_open:  Creating a channel 43 with 64000 window and 32768 max packet
[2024/09/21 07:09:35.966114, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=224, in_blocks=110]
[2024/09/21 07:09:35.966133, 3] ssh_socket_unbuffered_write:  Enabling POLLOUT for socket
[2024/09/21 07:09:35.966142, 3] packet_send2:  packet: wrote [type=90, len=32, padding_size=7, comp=24, payload=24]
[2024/09/21 07:09:35.966146, 3] channel_open:  Sent a SSH_MSG_CHANNEL_OPEN type session for channel 43
[2024/09/21 07:09:36.000845, 3] ssh_packet_socket_callback:  packet: read type 80 [len=480,padding=4,comp=475,payload=475]
[2024/09/21 07:09:36.000860, 3] ssh_packet_process:  Dispatching handler for packet type 80
[2024/09/21 07:09:36.000864, 2] ssh_packet_global_request:  Received SSH_MSG_GLOBAL_REQUEST packet
[2024/09/21 07:09:36.000871, 2] ssh_packet_global_request:  UNKNOWN SSH_MSG_GLOBAL_REQUEST hostkeys-00@openssh.com, want_reply = 0
[2024/09/21 07:09:36.000874, 3] ssh_packet_global_request:  The requester doesn't want to know the request failed!
[2024/09/21 07:09:36.000879, 1] ssh_packet_global_request:  Invalid SSH_MSG_GLOBAL_REQUEST packet
[2024/09/21 07:09:36.000882, 3] ssh_packet_socket_callback:  Processing 280 bytes left in socket buffer
[2024/09/21 07:09:36.000889, 3] ssh_packet_socket_callback:  packet: read type 4 [len=120,padding=7,comp=112,payload=112]
[2024/09/21 07:09:36.000892, 3] ssh_packet_process:  Dispatching handler for packet type 4
[2024/09/21 07:09:36.000895, 2] ssh_packet_ignore_callback:  Received SSH_MSG_DEBUG packet
[2024/09/21 07:09:36.000898, 3] ssh_packet_socket_callback:  Processing 140 bytes left in socket buffer
[2024/09/21 07:09:36.000902, 3] ssh_packet_socket_callback:  packet: read type 4 [len=120,padding=7,comp=112,payload=112]
[2024/09/21 07:09:36.000905, 3] ssh_packet_process:  Dispatching handler for packet type 4
[2024/09/21 07:09:36.000908, 2] ssh_packet_ignore_callback:  Received SSH_MSG_DEBUG packet
[2024/09/21 07:09:36.000911, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=224, in_blocks=194]
[2024/09/21 07:09:36.000914, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=224, in_blocks=194]
[2024/09/21 07:09:36.000917, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=224, in_blocks=194]
[2024/09/21 07:09:36.228678, 3] ssh_packet_socket_callback:  packet: read type 91 [len=24,padding=6,comp=17,payload=17]
[2024/09/21 07:09:36.228691, 3] ssh_packet_process:  Dispatching handler for packet type 91
[2024/09/21 07:09:36.228695, 3] ssh_packet_channel_open_conf:  Received SSH2_MSG_CHANNEL_OPEN_CONFIRMATION
[2024/09/21 07:09:36.228699, 2] ssh_packet_channel_open_conf:  Received a CHANNEL_OPEN_CONFIRMATION for channel 43:0
[2024/09/21 07:09:36.228702, 2] ssh_packet_channel_open_conf:  Remote window : 0, maxpacket : 32768
[2024/09/21 07:09:36.228706, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=224, in_blocks=196]
Running uptime...
Executing exec_command()...
[2024/09/21 07:09:36.228773, 2] channel_open:  Creating a channel 44 with 64000 window and 32768 max packet
[2024/09/21 07:09:36.228777, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=227, in_blocks=199]
[2024/09/21 07:09:36.228793, 3] ssh_socket_unbuffered_write:  Enabling POLLOUT for socket
[2024/09/21 07:09:36.228797, 3] packet_send2:  packet: wrote [type=90, len=32, padding_size=7, comp=24, payload=24]
[2024/09/21 07:09:36.228800, 3] channel_open:  Sent a SSH_MSG_CHANNEL_OPEN type session for channel 44
[2024/09/21 07:09:36.263447, 3] ssh_packet_socket_callback:  packet: read type 91 [len=24,padding=6,comp=17,payload=17]
[2024/09/21 07:09:36.263460, 3] ssh_packet_process:  Dispatching handler for packet type 91
[2024/09/21 07:09:36.263464, 3] ssh_packet_channel_open_conf:  Received SSH2_MSG_CHANNEL_OPEN_CONFIRMATION
[2024/09/21 07:09:36.263467, 2] ssh_packet_channel_open_conf:  Received a CHANNEL_OPEN_CONFIRMATION for channel 44:1
[2024/09/21 07:09:36.263470, 2] ssh_packet_channel_open_conf:  Remote window : 0, maxpacket : 32768
[2024/09/21 07:09:36.263476, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=227, in_blocks=198]
[2024/09/21 07:09:36.263494, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=230, in_blocks=201]
[2024/09/21 07:09:36.263514, 3] ssh_socket_unbuffered_write:  Enabling POLLOUT for socket
[2024/09/21 07:09:36.263518, 3] packet_send2:  packet: wrote [type=98, len=32, padding_size=7, comp=24, payload=24]
[2024/09/21 07:09:36.263522, 3] channel_request:  Sent a SSH_MSG_CHANNEL_REQUEST exec
[2024/09/21 07:09:36.298556, 3] ssh_packet_socket_callback:  packet: read type 93 [len=16,padding=6,comp=9,payload=9]
[2024/09/21 07:09:36.298568, 3] ssh_packet_process:  Dispatching handler for packet type 93
[2024/09/21 07:09:36.298573, 2] channel_rcv_change_window:  Adding 2097152 bytes to channel (44:1) (from 0 bytes)
[2024/09/21 07:09:36.298576, 3] ssh_packet_socket_callback:  Processing 36 bytes left in socket buffer
[2024/09/21 07:09:36.298581, 3] ssh_packet_socket_callback:  packet: read type 99 [len=16,padding=10,comp=5,payload=5]
[2024/09/21 07:09:36.298583, 3] ssh_packet_process:  Dispatching handler for packet type 99
[2024/09/21 07:09:36.298587, 3] ssh_packet_channel_success:  Received SSH_CHANNEL_SUCCESS on channel (44:1)
[2024/09/21 07:09:36.298590, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=230, in_blocks=199]
[2024/09/21 07:09:36.298593, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=230, in_blocks=199]
[2024/09/21 07:09:36.298597, 2] channel_request:  Channel request exec success
[2024/09/21 07:09:36.298629, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=230, in_blocks=199]
[2024/09/21 07:09:36.298647, 3] ssh_socket_unbuffered_write:  Enabling POLLOUT for socket
[2024/09/21 07:09:36.298651, 3] packet_send2:  packet: wrote [type=96, len=16, padding_size=10, comp=5, payload=5]
[2024/09/21 07:09:36.298654, 3] ssh_channel_send_eof:  Sent a EOF on client channel (44:1)
[2024/09/21 07:09:36.303498, 3] ssh_packet_socket_callback:  packet: read type 94 [len=88,padding=11,comp=76,payload=76]
[2024/09/21 07:09:36.303512, 3] ssh_packet_process:  Dispatching handler for packet type 94
[2024/09/21 07:09:36.303517, 3] channel_rcv_data:  Channel receiving 67 bytes data in 0 (local win=64000 remote win=2097152)
[2024/09/21 07:09:36.303521, 3] channel_default_bufferize:  placing 67 bytes into channel buffer (stdout)
[2024/09/21 07:09:36.303527, 3] channel_rcv_data:  Channel windows are now (local win=63933 remote win=2097152)
[2024/09/21 07:09:36.303538, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=231, in_blocks=209]
[2024/09/21 07:09:36.303548, 3] ssh_socket_unbuffered_write:  Enabling POLLOUT for socket
[2024/09/21 07:09:36.303552, 3] packet_send2:  packet: wrote [type=93, len=16, padding_size=6, comp=9, payload=9]
[2024/09/21 07:09:36.303556, 2] grow_window:  growing window (channel 44:1) to 1280000 bytes
[2024/09/21 07:09:36.303559, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=231, in_blocks=208]
[2024/09/21 07:09:36.303694, 3] ssh_packet_socket_callback:  packet: read type 98 [len=32,padding=6,comp=25,payload=25]
[2024/09/21 07:09:36.303702, 3] ssh_packet_process:  Dispatching handler for packet type 98
[2024/09/21 07:09:36.303708, 3] channel_rcv_request:  received exit-status 0
[2024/09/21 07:09:36.303711, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=231, in_blocks=211]
[2024/09/21 07:09:36.303717, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=231, in_blocks=211]
[2024/09/21 07:09:36.303724, 3] ssh_socket_unbuffered_write:  Enabling POLLOUT for socket
[2024/09/21 07:09:36.303727, 3] packet_send2:  packet: wrote [type=97, len=16, padding_size=10, comp=5, payload=5]
[2024/09/21 07:09:36.303731, 3] ssh_channel_close:  Sent a close on client channel (44:1)
Executing exec_command()... done
stdout type: <class 'bytes'>
stdout:
 7:09AM  up 2 days,  3:50, 1 user, load averages: 0.06, 0.03, 0.00

stderr type: <class 'bytes'>
stderr:

return code: 0
Running ls...
Executing exec_command()...
[2024/09/21 07:09:36.303797, 2] channel_open:  Creating a channel 45 with 64000 window and 32768 max packet
[2024/09/21 07:09:36.303802, 3] ssh_packet_need_rekey:  rekey: [data_rekey_needed=0, out_blocks=234, in_blocks=214]
[2024/09/21 07:09:36.303808, 3] packet_send2:  packet: wrote [type=90, len=32, padding_size=7, comp=24, payload=24]
[2024/09/21 07:09:36.303811, 3] channel_open:  Sent a SSH_MSG_CHANNEL_OPEN type session for channel 45
[2024/09/21 07:09:36.303819, 3] ssh_socket_unbuffered_write:  Enabling POLLOUT for socket
Segmentation fault (core dumped) 
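
To make the debug log above easier to read, the numeric packet types can be translated into SSH message names. The following is a minimal sketch and not part of pylibssh or libssh: `summarize_packets`, `PACKET_RE` and `SSH_MSG_NAMES` are hypothetical names, the regular expression assumes the libssh 0.10.6 log format shown above, and the table only covers the message numbers (as assigned in RFC 4250) that appear in this log.

```python
import re

# Message numbers as assigned in RFC 4250; only those seen in the log above.
SSH_MSG_NAMES = {
    4: "SSH_MSG_DEBUG",
    50: "SSH_MSG_USERAUTH_REQUEST",
    52: "SSH_MSG_USERAUTH_SUCCESS",
    80: "SSH_MSG_GLOBAL_REQUEST",
    90: "SSH_MSG_CHANNEL_OPEN",
    91: "SSH_MSG_CHANNEL_OPEN_CONFIRMATION",
    93: "SSH_MSG_CHANNEL_WINDOW_ADJUST",
    94: "SSH_MSG_CHANNEL_DATA",
    96: "SSH_MSG_CHANNEL_EOF",
    97: "SSH_MSG_CHANNEL_CLOSE",
    98: "SSH_MSG_CHANNEL_REQUEST",
    99: "SSH_MSG_CHANNEL_SUCCESS",
}

# Matches both "packet: read type 52 [...]" and "packet: wrote [type=50, ...]".
PACKET_RE = re.compile(r"packet: (?:(read) type |(wrote) \[type=)(\d+)")


def summarize_packets(log_text):
    """Return a list of (direction, number, name) for each packet log line."""
    events = []
    for line in log_text.splitlines():
        match = PACKET_RE.search(line)
        if match is None:
            continue  # not a packet read/write line
        direction = match.group(1) or match.group(2)
        number = int(match.group(3))
        events.append((direction, number, SSH_MSG_NAMES.get(number, "unknown")))
    return events
```

Applied to the log above, the last two packet lines before the crash decode as a written SSH_MSG_CHANNEL_CLOSE (type 97) followed by a written SSH_MSG_CHANNEL_OPEN (type 90).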
kucharskim commented 2 months ago

Version of remote SSH daemon (if that makes any difference):

$ nc -w5 -4 -v examplemachine1 22 
Connection to examplemachine1 (xxx.xxx.xxx.xxx) 22 port [tcp/ssh] succeeded!
SSH-2.0-OpenSSH_9.7
webknjaz commented 2 months ago

@Jakuje ideas?

Jakuje commented 1 month ago

A trace-level log (see #597) would help to investigate the issue; from the current debug log it is not clear what is going on. I can only guess from the backtrace that, after the first channel was closed, a callback or something else is probing a structure that might already have been freed. Is there a way to install debuginfo on OpenBSD so that gdb shows more about the variables involved in the crash, or does it look like the crash is inside CPython?

kucharskim commented 1 week ago

I haven't prepared an OpenBSD package with #597 yet, but I plan to. For now I have the vanilla ansible-libssh 1.2.2 with libssh 0.10.6, with debug symbols added for libssh and python3.11 (from official OpenBSD packages):

$ egdb -quiet -batch -x gdb-commads.txt -e /usr/local/bin/python3.11 -c python3.11.core
[New process 600414]
Core was generated by `python3.11'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000012fabb1f6a8 in __pyx_type_11pylibsshext_7channel_Channel () from /usr/local/lib/python3.11/site-packages/pylibsshext/channel.cpython-311.so
+show version
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-unknown-openbsd7.6".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
+thread apply all bt

Thread 1 (process 600414):
+bt
#0  0x0000012fabb1f6a8 in __pyx_type_11pylibsshext_7channel_Channel () from /usr/local/lib/python3.11/site-packages/pylibsshext/channel.cpython-311.so
#1  0x0000012fa7ba16c4 in ssh_packet_socket_controlflow_callback (code=<optimized out>, userdata=0x12f87a15000) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/packet.c:1443
#2  0x0000012fa7bad42b in ssh_socket_pollcallback (p=0x12f94994de0, fd=3, revents=<optimized out>, v_s=0x12f87a1f370) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/socket.c:386
#3  0x0000012fa7ba9365 in ssh_poll_ctx_dopoll (ctx=0x12f94969030, timeout=<optimized out>) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/poll.c:743
#4  0x0000012fa7baabb6 in ssh_handle_packets (session=0x12f87a15000, timeout=<optimized out>) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/session.c:686
#5  0x0000012fa7baa95b in ssh_handle_packets_termination (session=0x12f87a15000, timeout=<optimized out>, fct=0x12fa7b8b560 <ssh_channel_open_termination>, user=0x12f9498e2a0) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/session.c:755
#6  0x0000012fa7b88195 in channel_open (channel=0x12f9498e2a0, type=0x12fa7b6ac84 "session", window=64000, maxpacket=32768, payload=0x0) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/channels.c:364
#7  0x0000012fabb173ac in __pyx_pf_11pylibsshext_7channel_7Channel_24exec_command (__pyx_v_self=0x12fae44d080, __pyx_v_command=<error reading variable: Cannot access memory at address 0x0>) at /tmp/.tmp-ansible-pylibssh-pep517-6bgc_omf/src/src/pylibsshext/channel.c:7221
#8  __pyx_pw_11pylibsshext_7channel_7Channel_25exec_command (__pyx_v_self=0x12fae44d080, __pyx_args=<optimized out>, __pyx_nargs=<optimized out>, __pyx_kwds=<optimized out>) at /tmp/.tmp-ansible-pylibssh-pep517-6bgc_omf/src/src/pylibsshext/channel.c:7138
#9  0x000001307207f64d in _PyObject_VectorcallTstate (tstate=0x130723db9e0 <_PyRuntime+166184>, callable=0x12fae4236b0, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ./Include/internal/pycore_call.h:92
#10 PyObject_Vectorcall (callable=0x12fae4236b0, args=0x12f9498e690, nargsf=2097152, kwnames=0x12fabb1f6a8 <__pyx_type_11pylibsshext_7channel_Channel>) at Objects/call.c:299
#11 0x00000130721981cd in _PyEval_EvalFrameDefault (tstate=0x130723db9e0 <_PyRuntime+166184>, frame=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:7301
#12 0x0000013072187b1e in _PyEval_EvalFrame (tstate=0x130723db9e0 <_PyRuntime+166184>, frame=0x1305e76a020, throwflag=<error reading variable: Cannot access memory at address 0x0>) at ./Include/internal/pycore_ceval.h:73
#13 _PyEval_Vector (tstate=0x130723db9e0 <_PyRuntime+166184>, func=0x13047719f80, locals=<optimized out>, args=<error reading variable: Cannot access memory at address 0x0>, argcount=<error reading variable: Cannot access memory at address 0x0>, kwnames=<error reading variable: Cannot access memory at address 0x0>) at Python/ceval.c:6434
#14 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:1148
#15 0x00000130721f86e9 in run_eval_code_obj (tstate=0x130723db9e0 <_PyRuntime+166184>, co=0x1302c1d1600, globals=0x130477403c0, locals=0x130477403c0) at Python/pythonrun.c:1741
#16 run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x130477403c0, locals=0x130477403c0, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1762
#17 0x00000130721f81d5 in pyrun_file (fp=0x12fdaf20dd0 <usual>, filename=0x13066c65990, start=<error reading variable: Cannot access memory at address 0x101>, globals=0x130477403c0, locals=0x130477403c0, closeit=1, flags=0x7b188c854720) at Python/pythonrun.c:1657
#18 _PyRun_SimpleFileObject (fp=0x12fdaf20dd0 <usual>, filename=0x13066c65990, closeit=1, flags=0x7b188c854720) at Python/pythonrun.c:440
#19 0x00000130721f73b3 in _PyRun_AnyFileObject (fp=0x12fdaf20dd0 <usual>, filename=0x13066c65990, closeit=1, flags=0x7b188c854720) at Python/pythonrun.c:79
#20 0x0000013072221172 in pymain_run_file_obj (program_name=0x130477405b0, filename=0x13066c65990, skip_source_first_line=0) at Modules/main.c:360
#21 pymain_run_file (config=<optimized out>) at Modules/main.c:379
#22 pymain_run_python (exitcode=<optimized out>) at Modules/main.c:605
#23 Py_RunMain () at Modules/main.c:684
#24 0x0000013072222095 in pymain_main (args=0x7b188c854a38) at Modules/main.c:714
#25 0x00000130722224bc in Py_BytesMain (argc=<optimized out>, argv=0x12f9498e690) at Modules/main.c:738
#26 0x0000012d736e594b in ?? ()
#27 0x0000012d736e5820 in ?? ()
#28 0x0000000000000000 in ?? ()
+info threads
  Id   Target Id         Frame 
* 1    process 600414    0x0000012fabb1f6a8 in __pyx_type_11pylibsshext_7channel_Channel () from /usr/local/lib/python3.11/site-packages/pylibsshext/channel.cpython-311.so
+info locals
No symbol table info available.
kucharskim commented 1 week ago

I built a package based on https://github.com/Jakuje/pylibssh/commit/6d1f46762053f27c1861e3319a0de459b67fc958

Ran test3.py as follows:

$ diff -u test1.py test3.py
--- test1.py    Wed Nov 13 11:14:30 2024
+++ test3.py    Wed Nov 13 12:23:26 2024
@@ -1,5 +1,6 @@
 #!/usr/bin/env python3

+from pylibsshext.logging import ANSIBLE_PYLIBSSH_TRACE
 from pylibsshext.errors import LibsshSessionException
 from pylibsshext.session import Session

@@ -12,7 +13,7 @@

 ssh = Session()

-# ssh.set_log_level(logging.DEBUG)
+ssh.set_log_level(ANSIBLE_PYLIBSSH_TRACE)

 try:
     ssh.connect(

output is as follows:

$ python3 test3.py
b'ssh_connect_host_nonblocking: Failed to connect: No route to host'
b'socket_callback_connected: Socket connection callback: 1 (0)'
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_known_hosts_read_entries: Failed to open the known_hosts file '/etc/ssh/ssh_known_hosts': No such file or directory"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_packet_userauth_failure: Access denied for 'none'. Authentication that can continue: publickey,password,keyboard-interactive"
b'ssh_agent_get_ident_count: Answer type: 12, expected answer: 12'
ssh.is_connected=1
b'ssh_packet_global_request: Invalid SSH_MSG_GLOBAL_REQUEST packet'
Running uptime...
Executing exec_command()...
Executing exec_command()... done
stdout type: <class 'bytes'>
stdout:
12:25PM  up 11 days,  9:47, 0 users, load averages: 0.00, 0.00, 0.00

stderr type: <class 'bytes'>
stderr:

return code: 0
Running ls...
Executing exec_command()...
Segmentation fault (core dumped) 

trace output is:

...
+thread apply all bt

Thread 1 (process 168679):
+bt
#0  0x00000200f3e3be20 in ?? ()
#1  0x0000020091bf276a in channel_rcv_eof (session=<optimized out>, type=<optimized out>, packet=<optimized out>, user=<optimized out>) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/channels.c:653
#2  0x0000020091c0c2b6 in ssh_packet_process (session=0x20122bc5000, type=96 '`') at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/packet.c:1546
#3  0x0000020091c0bc2b in ssh_packet_socket_callback (data=<optimized out>, receivedlen=72, user=0x20122bc5000) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/packet.c:1373
#4  0x0000020091c182ac in ssh_socket_pollcallback (p=<optimized out>, fd=3, revents=<optimized out>, v_s=0x2014874c370) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/socket.c:336
#5  0x0000020091c14365 in ssh_poll_ctx_dopoll (ctx=0x20148758b40, timeout=<optimized out>) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/poll.c:743
#6  0x0000020091c15bb6 in ssh_handle_packets (session=0x20122bc5000, timeout=<optimized out>) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/session.c:686
#7  0x0000020091c1595b in ssh_handle_packets_termination (session=0x20122bc5000, timeout=<optimized out>, fct=0x20091bf6560 <ssh_channel_open_termination>, user=0x201487271c0) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/session.c:755
#8  0x0000020091bf3195 in channel_open (channel=0x201487271c0, type=0x20091bd5c84 "session", window=64000, maxpacket=32768, payload=0x0) at /usr/obj/ports/libssh-0.10.6/libssh-0.10.6/src/channels.c:364
#9  0x00000200b7bee3ac in __pyx_pf_11pylibsshext_7channel_7Channel_24exec_command (__pyx_v_self=0x2015ff2be80, __pyx_v_command=<error reading variable: Cannot access memory at address 0x0>) at /tmp/.tmp-ansible-pylibssh-pep517-x_56pbhn/src/src/pylibsshext/channel.c:7221
#10 __pyx_pw_11pylibsshext_7channel_7Channel_25exec_command (__pyx_v_self=0x2015ff2be80, __pyx_args=<optimized out>, __pyx_nargs=<optimized out>, __pyx_kwds=<optimized out>) at /tmp/.tmp-ansible-pylibssh-pep517-x_56pbhn/src/src/pylibsshext/channel.c:7138
#11 0x00000200c9bcd64d in _PyObject_VectorcallTstate (tstate=0x200c9f299e0 <_PyRuntime+166184>, callable=0x200c56679f0, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ./Include/internal/pycore_call.h:92
#12 PyObject_Vectorcall (callable=0x200c56679f0, args=0x20148746770, nargsf=0, kwnames=0x200f3e95ae0) at Objects/call.c:299
#13 0x00000200c9ce61cd in _PyEval_EvalFrameDefault (tstate=0x200c9f299e0 <_PyRuntime+166184>, frame=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:7301
#14 0x00000200c9cd5b1e in _PyEval_EvalFrame (tstate=0x200c9f299e0 <_PyRuntime+166184>, frame=0x200c4bdc020, throwflag=<error reading variable: Cannot access memory at address 0x0>) at ./Include/internal/pycore_ceval.h:73
#15 _PyEval_Vector (tstate=0x200c9f299e0 <_PyRuntime+166184>, func=0x2015ff05f80, locals=<optimized out>, args=<error reading variable: Cannot access memory at address 0x0>, argcount=<error reading variable: Cannot access memory at address 0x0>, kwnames=<error reading variable: Cannot access memory at address 0x0>) at Python/ceval.c:6434
#16 PyEval_EvalCode (co=<optimized out>, globals=<optimized out>, locals=<optimized out>) at Python/ceval.c:1148
#17 0x00000200c9d466e9 in run_eval_code_obj (tstate=0x200c9f299e0 <_PyRuntime+166184>, co=0x200c1eaa300, globals=0x2015ff2c380, locals=0x2015ff2c380) at Python/pythonrun.c:1741
#18 run_mod (mod=<optimized out>, filename=<optimized out>, globals=0x2015ff2c380, locals=0x2015ff2c380, flags=<optimized out>, arena=<optimized out>) at Python/pythonrun.c:1762
#19 0x00000200c9d461d5 in pyrun_file (fp=0x20078464dd0 <usual>, filename=0x200f3e29a00, start=<error reading variable: Cannot access memory at address 0x101>, globals=0x2015ff2c380, locals=0x2015ff2c380, closeit=1, flags=0x7b3df2df8810) at Python/pythonrun.c:1657
#20 _PyRun_SimpleFileObject (fp=0x20078464dd0 <usual>, filename=0x200f3e29a00, closeit=1, flags=0x7b3df2df8810) at Python/pythonrun.c:440
#21 0x00000200c9d453b3 in _PyRun_AnyFileObject (fp=0x20078464dd0 <usual>, filename=0x200f3e29a00, closeit=1, flags=0x7b3df2df8810) at Python/pythonrun.c:79
#22 0x00000200c9d6f172 in pymain_run_file_obj (program_name=0x200f3dd5830, filename=0x200f3e29a00, skip_source_first_line=0) at Modules/main.c:360
#23 pymain_run_file (config=<optimized out>) at Modules/main.c:379
#24 pymain_run_python (exitcode=<optimized out>) at Modules/main.c:605
#25 Py_RunMain () at Modules/main.c:684
#26 0x00000200c9d70095 in pymain_main (args=0x7b3df2df8b28) at Modules/main.c:714
#27 0x00000200c9d704bc in Py_BytesMain (argc=<optimized out>, argv=0x20148746770) at Modules/main.c:738
#28 0x000001fe70cfb94b in ?? ()
#29 0x000001fe70cfb820 in ?? ()
#30 0x0000000000000000 in ?? ()
+info threads
  Id   Target Id         Frame 
* 1    process 168679    0x00000200f3e3be20 in ?? ()
+info locals
No symbol table info available.
Jakuje commented 1 week ago

One more way to get debug logs from libssh is to use LogLevel DEBUG3 in ~/.ssh/config as libssh should parse the openssh configuration files and follow instructions in them.

kucharskim commented 1 week ago

I am not sure whether changing the logging level via Python's ssh.set_log_level(logging.DEBUG) or ssh's LogLevel DEBUG3 has any effect with https://github.com/Jakuje/pylibssh/commit/6d1f46762053f27c1861e3319a0de459b67fc958

$ grep -i log ~/.ssh/config
LogLevel DEBUG3
$ diff -u test1.py test4.py  
--- test1.py    Wed Nov 13 11:14:30 2024
+++ test4.py    Wed Nov 13 12:34:58 2024
@@ -12,7 +12,7 @@

 ssh = Session()

-# ssh.set_log_level(logging.DEBUG)
+ssh.set_log_level(logging.DEBUG)

 try:
     ssh.connect(

output is still the same:

$ python3 test4.py
b'ssh_connect_host_nonblocking: Failed to connect: No route to host'
b'socket_callback_connected: Socket connection callback: 1 (0)'
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_known_hosts_read_entries: Failed to open the known_hosts file '/etc/ssh/ssh_known_hosts': No such file or directory"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_key_cmp: key types don't match!"
b"ssh_packet_userauth_failure: Access denied for 'none'. Authentication that can continue: publickey,password,keyboard-interactive"
b'ssh_agent_get_ident_count: Answer type: 12, expected answer: 12'
ssh.is_connected=1
b'ssh_packet_global_request: Invalid SSH_MSG_GLOBAL_REQUEST packet'
Running uptime...
Executing exec_command()...
Executing exec_command()... done
stdout type: <class 'bytes'>
stdout:
12:40PM  up 11 days, 10:02, 0 users, load averages: 0.05, 0.05, 0.00

stderr type: <class 'bytes'>
stderr:

return code: 0
Running ls...
Executing exec_command()...
Segmentation fault (core dumped) 
kucharskim commented 1 week ago

I reverted to ansible-libssh 1.2.2 and now logging works as expected, but the output is too long to paste here, so I am attaching it as a file: test4-output-ansible-libssh-1.2.2-v001.txt

kucharskim commented 1 week ago

It's not visible in the attached file, but the run did end with Segmentation fault (core dumped)

Jakuje commented 1 week ago

I think I see the issue (sorry, I did not notice it at first). The SSH channels do not allow running multiple commands in them as they are closed after the command execution. You need to allocate a new one for the next command or handle the IO in the shell yourself. I think the following should do:

def run_cmd(ssh, cmd):
    ssh_channel = ssh.new_channel()

    print(f"Running {cmd}...", flush=True)
    print(f"Executing exec_command()...", flush=True)
    cmd_resp = ssh_channel.exec_command(cmd)
    print(f"Executing exec_command()... done", flush=True)
    print(f"stdout type: {type(cmd_resp.stdout)}")
    print(f"stdout:\n{cmd_resp.stdout.decode()}\n")
    print(f"stderr type: {type(cmd_resp.stderr)}")
    print(f"stderr:\n{cmd_resp.stderr.decode()}\n")
    print(f"return code: {cmd_resp.returncode}")

    print(f"Executing ssh_channel.close()...", flush=True)
    ssh_channel.close()

run_cmd(ssh, "uptime")
run_cmd(ssh, "ls")
run_cmd(ssh, "hostname")

(untested)

Certainly pylibssh/libssh should not crash on this attempt, and if this is not clear from the documentation, we should be more explicit about it.

Regarding handling the IO: it would mean opening a shell with request_shell() and then handling the input (commands to execute) and the output (their stdout and stderr) yourself. This does have the disadvantage that you cannot easily get the exit code of each separate command.
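
One common workaround for the missing exit code in the shell approach is to append a unique marker that echoes `$?` after each command and to cut the output at that marker. The sketch below covers only the string handling and assumes a POSIX shell on the remote side; `wrap_with_marker` and `split_output` are hypothetical helpers, not pylibssh API, and sending the line to the channel and reading the response are left out.

```python
import re
import uuid


def wrap_with_marker(cmd, marker=None):
    """Return (line_to_send, marker); the line makes the shell echo marker + exit code."""
    marker = marker or f"__RC_{uuid.uuid4().hex}__"
    # "$?" expands to the exit status of cmd in a POSIX shell.
    return f"{cmd}; echo {marker}$?\n", marker


def split_output(text, marker):
    """Split text read from the shell into (command_output, exit_code).

    exit_code is None while the marker followed by digits has not arrived yet,
    which tells the caller to keep reading.
    """
    match = re.search(re.escape(marker) + r"(\d+)", text)
    if match is None:
        return text, None
    return text[: match.start()], int(match.group(1))
```

Requiring digits after the marker means that a terminal echo of the sent line (which contains the literal `$?`) is not mistaken for the result.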

Jakuje commented 1 week ago

For reference, this is described in RFC 4254:

Once the session has been set up, a program is started at the remote end. The program can be a shell, an application program, or a subsystem with a host-independent name. Only one of these requests can succeed per channel.

https://datatracker.ietf.org/doc/html/rfc4254#section-6.5

Jakuje commented 1 week ago

Hmm. There is a test, test_exec_command() in tests/unit/channel_test.py, that tests exactly this and it works. But testing the same thing with libssh directly does indeed make the second ssh_channel_request_exec() fail. I will try to dig into that tomorrow.

kucharskim commented 1 week ago

The SSH channels do not allow running multiple commands in them as they are closed after the command execution.

I also suspected that and have another version of my Python script, but that also dumps core. However, I wanted to open a new GitHub issue for that. I am not sure which way is better for you: keep it here or open a new one. I'll open a new one; it can then be marked as a duplicate and you can bring the discussion back here.

kucharskim commented 1 week ago

I've also opened https://github.com/ansible/pylibssh/issues/657 for another core dump, when multiple channels are opened.

Jakuje commented 1 week ago

My bad, exec_command() wraps the channel creation and the call to request_exec() itself, so disregard my previous comments. I am not sure, then, why the function is implemented on top of an existing channel that is not used at all.

So the following test does basically everything you do:

https://github.com/ansible/pylibssh/blob/cc2ceff272d390f1b95cbe97dbd3770ec8db67c6/tests/unit/channel_test.py#L42-L48

This test has been present for 3 years, since #280 when this code was introduced, and I think it was not crashing in Linux builds, so I am wondering if this is something specific to OpenBSD. Are you able to test your code on a different platform to pinpoint whether it is OpenBSD-specific?

Jakuje commented 1 week ago

OK, reading further, the test is flaky and sometimes segfaults, as described in #57, and there are also reports from Ubuntu and macOS. Let me see if I can reproduce it.

kucharskim commented 1 week ago

Outside of this issue, I am wondering should exec_command() function be deprecated on Channel class and be implemented / moved into the Session class, if each exec_command() opens new channel for each execution anyway. Or, should exec_command() stay under Channel class but use instance of the channel on which it is executed (conceptually reverting approach from #280), to stop opening new channel under the hood, and then allow to execute exec_command() only once and subsequent executions should throw exception with explanation, that this is not allowed.

Jakuje commented 1 week ago

Outside of this issue, I am wondering should exec_command() function be deprecated on Channel class and be implemented / moved into the Session class, if each exec_command() opens new channel for each execution anyway.

That would make most sense for me. The current location of this function is confusing.

Or, should exec_command() stay under Channel class but use instance of the channel on which it is executed (conceptually reverting approach from #280), to stop opening new channel under the hood, and then allow to execute exec_command() only once and subsequent executions should throw exception with explanation, that this is not allowed.

I think this would break existing applications, which we will not want to do.
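To make the two options above concrete, here is a minimal sketch of each. Both exec_on_session() and OneShotChannel are hypothetical names, not part of pylibssh; the only calls assumed to exist are Session.new_channel(), Channel.exec_command() and Channel.close(), which appear earlier in this thread.

```python
def exec_on_session(session, command):
    """Option 1 (hypothetical helper): expose exec at the session level."""
    channel = session.new_channel()
    try:
        return channel.exec_command(command)
    finally:
        # Close the channel even if exec_command() raises.
        channel.close()


class OneShotChannel:
    """Option 2 (hypothetical wrapper): allow exec_command() only once."""

    def __init__(self, channel):
        self._channel = channel
        self._used = False

    def exec_command(self, command):
        if self._used:
            raise RuntimeError(
                "exec_command() already ran on this channel; open a new channel"
            )
        self._used = True
        return self._channel.exec_command(command)
```

With option 2, a script that calls exec_command() three times on the same channel object would start raising on the second call, which is the kind of breakage for existing applications mentioned above.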

Jakuje commented 1 week ago

I think the issue will live in the following code:

https://github.com/ansible/pylibssh/blob/cc2ceff272d390f1b95cbe97dbd3770ec8db67c6/src/pylibsshext/channel.pyx#L169-L174

The callback structure cb is a local variable in the function, so the channel structure that is left around in the session keeps pointing at stack memory that is no longer valid once the function returns. (That the channel is left around might be a bit unexpected after libssh.ssh_channel_free(), but if the close confirmation is not delivered in time for the free, it is kept around.) So consecutive calls (if there is nothing in between) will likely work, but if there are some other function calls on the stack, the memory will likely get changed and the callbacks on the old channel will crash.
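A rough Python analogy of this lifetime problem, offered only as an illustration: a weakref stands in for the raw C pointer, since neither keeps its target alive. Callbacks, ToyChannel and exec_command_like() are made-up names, and the immediate cleanup of the local object relies on CPython reference counting.

```python
import weakref


class Callbacks:
    """Stand-in for the C callback struct (cb in channel.pyx)."""

    def on_data(self):
        return "callback ran"


class ToyChannel:
    """Stand-in for a libssh channel that outlives the function call."""

    def __init__(self):
        self.callbacks_ref = None

    def set_callbacks(self, callbacks):
        # A raw C pointer does not keep its target alive; a weakref models that.
        self.callbacks_ref = weakref.ref(callbacks)

    def dispatch(self):
        callbacks = self.callbacks_ref()
        if callbacks is None:
            # In C, this is where the stale pointer would be dereferenced.
            raise RuntimeError("callbacks no longer exist")
        return callbacks.on_data()


def exec_command_like(channel):
    cb = Callbacks()  # local object, like cb in exec_command()
    channel.set_callbacks(cb)
    return channel.dispatch()  # fine while the function is still running


lingering = ToyChannel()
print(exec_command_like(lingering))
try:
    lingering.dispatch()  # a late packet arrives after the function returned
except RuntimeError as exc:
    print(f"late packet: {exc}")
```

In Python the failure is a clean exception; in the C extension the same situation means jumping through whatever bytes now occupy that stack slot, which matches the intermittent nature of the crash.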

kucharskim commented 1 week ago

I've packaged https://github.com/Jakuje/pylibssh/commit/8c72faa2ece844e0b101ebd1283f1c64a75ad1fa to test your changes on OpenBSD and I can no longer reproduce the core dump with test1.py from this GitHub issue or with test2.py from https://github.com/ansible/pylibssh/issues/657

Jakuje commented 1 week ago

Thank you for testing! Good to hear that it worked for you! I will leave it up to @webknjaz to review and help with the Python side, as I am more of a C programmer.

Jakuje commented 1 week ago

Having slept on it, I think libssh is also a bit to blame. The delayed freeing of the channels sounds like a good idea, but it results in these issues (we also have some random failures in CI, though not as common as here). I think libssh should be changed to not invoke callbacks on channels the user explicitly freed. If we keep them around for a reason, we should not assume the user kept the callbacks around. I will submit an MR to fix that later today.

Jakuje commented 1 week ago

Could you try the top commit from https://gitlab.com/libssh/libssh-mirror/-/merge_requests/549 with the original pylibssh to see if it still crashes?

kucharskim commented 6 days ago

I needed to modify the patch a bit to make it apply to libssh 0.10.6

Index: src/channels.c
--- src/channels.c.orig
+++ src/channels.c
@@ -1212,6 +1212,11 @@ void ssh_channel_free(ssh_channel channel)
     }
     channel->flags |= SSH_CHANNEL_FLAG_FREED_LOCAL;

+    if (channel->callbacks != NULL) {
+        ssh_list_free(channel->callbacks);
+        channel->callbacks = NULL;
+    }
+
     /* The idea behind the flags is the following : it is well possible
      * that a client closes a channel that still exists on the server side.
      * We definitively close the channel when we receive a close message *and*
@@ -1240,11 +1245,6 @@ void ssh_channel_do_free(ssh_channel channel)

     SSH_BUFFER_FREE(channel->stdout_buffer);
     SSH_BUFFER_FREE(channel->stderr_buffer);
-
-    if (channel->callbacks != NULL) {
-        ssh_list_free(channel->callbacks);
-        channel->callbacks = NULL;
-    }

     channel->session = NULL;
     SAFE_FREE(channel);

and I cannot reproduce the core dump with either test1.py from this issue or test2.py from https://github.com/ansible/pylibssh/issues/657

Jakuje commented 6 days ago

Thank you for testing that!

I updated the patch to keep freeing in both places, since on rare occasions the callbacks could be assigned even after the channel is freed (which is an even more awkward condition, and we should probably prevent it too, but it sounds like our tests caught some occurrences of this with valgrind).
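The effect of the patch can be pictured with a toy model. This is only a sketch of the idea, not libssh code: ToySession and ToyChannel are invented names, and the patched flag switches between dropping callbacks only at final destruction and also dropping them at free time.

```python
class ToySession:
    """Tracks channels until they are really destroyed."""

    def __init__(self):
        self.channels = []

    def handle_packet(self):
        """Invoke every callback still reachable; return how many ran."""
        invoked = 0
        for channel in list(self.channels):
            for callback in list(channel.callbacks):
                callback()
                invoked += 1
        return invoked


class ToyChannel:
    def __init__(self, session, patched):
        self.session = session
        self.patched = patched
        self.callbacks = []
        self.close_confirmed = False
        session.channels.append(self)

    def free(self):
        """Model of ssh_channel_free(): destruction waits for close confirmation."""
        if self.patched:
            # Patched behaviour: forget the user's callbacks right away.
            self.callbacks = []
        if self.close_confirmed:
            self.do_free()

    def do_free(self):
        """Model of ssh_channel_do_free(): the channel really goes away."""
        self.callbacks = []
        self.session.channels.remove(self)


def late_callbacks(patched):
    session = ToySession()
    channel = ToyChannel(session, patched)
    channel.callbacks.append(lambda: None)
    channel.free()  # the close confirmation has not arrived yet
    return session.handle_packet()


print(late_callbacks(patched=False))  # 1: a callback fires on a freed channel
print(late_callbacks(patched=True))   # 0: nothing left to call
```

Clearing the list in both free() and do_free() mirrors the updated patch, which keeps the cleanup in both places.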

kucharskim commented 6 days ago

I've tested merge request !549 (c40a1a16) and I cannot reproduce the core dump with either test1.py from this issue or test2.py from https://github.com/ansible/pylibssh/issues/657. All good!