Closed forrejam closed 7 months ago
ah interesting. will look more closely when im back from vacation, but I imagine ive just got something not awaited w/ a timeout in the initial connection since everything else should(in theory!) be timed out esp in the async bits. does this happen w/ sync too? can ya share example script too just for kicks, imagine nothing much to see, but cant hurt.
hm, not sure that I ams seeing this behavior. using a mostly copypasta ssh server example from asyncssh like this:
import asyncio
import sys
from typing import Optional
from asyncssh import SSHServer, SSHServerConnection, create_server, SSHServerSession
from asyncssh.public_key import SSHKey
BASE_SERVER_KEY = b"""-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAAAlwAAAAdzc2gtcn
NhAAAAAwEAAQAAAIEAwahUv5Tf3vWQzmz2de791K+vy2WQP9q5eOCAIlD2dFb9lTCg3CNl
kJRLMwelj4eJVdfT6YfQjRfbOkuMtGmwz+ed9ulHBVQ8Ee7JuSfxRcazWx2Wet5wzA0vkv
dohzw20jHhmLpbAi/x20Zxv5R+jK3o/+x6ciIW6sYCoQXJw88AAAIQhQL0T4UC9E8AAAAH
c3NoLXJzYQAAAIEAwahUv5Tf3vWQzmz2de791K+vy2WQP9q5eOCAIlD2dFb9lTCg3CNlkJ
RLMwelj4eJVdfT6YfQjRfbOkuMtGmwz+ed9ulHBVQ8Ee7JuSfxRcazWx2Wet5wzA0vkvdo
hzw20jHhmLpbAi/x20Zxv5R+jK3o/+x6ciIW6sYCoQXJw88AAAADAQABAAAAgQCE8ss7uz
j2GCARlzycOjaIjRRizpb5P2+VTIqrBGot9IqioX/NoX9YgnYd0mIW5zWheUpCSLskIfyf
SL6QHP8EinQ2e5VPO8sJ3So/2/a9H58ATAZX5D/Rmzjjbh9S57NqlM4y+tSaMVZbUFq53D
uhw+OTG1skt3aK4icJVoYWAQAAAEAkDdyAeM/njkl8IrcutV0Qz9uttRJ7piVGswZmwSiG
aBOVYrSvjNen23FuXXHErMTQbDSMzvI2njBB1P10rteWAAAAQQDjdvmIZGQgLaDqFIQH89
M0USeQavVKmVOjnHMOSzuKzhXoMEGaRtV6s1z6R+0FSblMju69I8KP9x8hmw8JxBn1AAAA
QQDZ86OGU3St5fz9INTz+x6wcsVVDXTywDjlU8UDrpZN9Y8WBiTSG1aNRm7IZXxOcmEJ7L
CwttJsdhYnN0En/zgzAAAAFGNhcmxAaW1wb3N0b3JlLmxvY2FsAQIDBAUG
-----END OPENSSH PRIVATE KEY-----
"""
class BaseSSHServerSession(SSHServerSession): # type: ignore
def shell_requested(self) -> bool:
"""
Handle shell requested; always return True
Args:
N/A
Returns:
bool: always True!
Raises:
N/A
"""
print("Shell requested")
return True
class MisbehavingServer(SSHServer):
def __init__(self, session):
self.session = session
def connection_made(self, conn: SSHServerConnection) -> None:
print("SSH connection received from %s." % conn.get_extra_info("peername")[0])
def connection_lost(self, exc: Optional[Exception]) -> None:
if exc:
print("SSH connection error: " + str(exc), file=sys.stderr)
else:
print("SSH connection closed.")
def begin_auth(self, username: str) -> bool:
if username != "scrapli":
return False
return True
def password_auth_supported(self) -> bool:
return True
def validate_password(self, username: str, password: str) -> bool:
return password == "scrapli"
def validate_public_key(self, username: str, key: SSHKey) -> bool:
return True
def session_requested(self) -> SSHServerSession: # type: ignore
return self.session()
async def run_server() -> None:
def server_factory() -> SSHServer:
server = MisbehavingServer(session=BaseSSHServerSession)
return server
await create_server(server_factory, "localhost", 8022, server_host_keys=[BASE_SERVER_KEY])
def sync_run_server(loop: Optional[asyncio.base_events.BaseEventLoop] = None) -> None:
if loop is None:
loop = asyncio.get_event_loop()
loop.run_until_complete(run_server())
loop.run_forever()
if __name__ == "__main__":
sync_run_server()
and either sync or async test setup like:
from scrapli import AsyncScrapli
import asyncio
async def main():
dev = {
"host": "localhost",
"port": 8022,
"auth_username": "scrapli",
"auth_password": "scrapli",
"auth_strict_key": False,
"ssh_config_file": False,
"platform": "cisco_iosxe",
"transport": "asyncssh",
"timeout_ops": 2,
}
conn = AsyncScrapli(**dev)
await conn.open()
print("OPENED")
if __name__ == "__main__":
asyncio.get_event_loop().run_until_complete(main())
is consistently timing out as I would expect it to with the exception scrapli.exceptions.ScrapliTimeout: timed out getting prompt
.
can you share more steps to repro or try w/ this dummy server setup? you can also run the server and ssh to it -- it will handle password auth then just hang forever which I think accurately reproduces what you described, but you can check for yourself to let us know too.
Thanks @carlmontanari, I am having difficulty reproducing this now, the switch stack in question has been rebooted and now does not cause an issue. I was hoping the issue would return and I could do a packet captures but it has been stable for about a week. I will reopen if the issue returns and I can investigate further.
sounds good @forrejam , will close. holler if it comes back!
Not sure how to reopen this issue but I have finally been able to reproduce this and get some debug info. It is very intermittent.
from scrapli.driver.core import AsyncIOSXEDriver
import os
import asyncio
import logging
import asyncssh
asyncssh.set_debug_level(2)
logging.basicConfig()
loggers = [logging.getLogger(name) for name in logging.root.manager.loggerDict]
for logger in loggers:
logger.setLevel(logging.DEBUG)
async_device = {
"host": "switch.company.org",
"auth_username": os.getenv("USER", None),
"auth_password": os.getenv("PASSW", None),
"auth_strict_key": False,
"transport": "asyncssh",
"ssh_config_file": "ssh_config.conf",
}
async def async_main():
conn = AsyncIOSXEDriver(**async_device)
await conn.open()
print(await conn.get_prompt())
if __name__ == "__main__":
asyncio.run(async_main())
This will happen once every 20 runs or so, its not consistent. The asyncssh channel closes and Scrapli will loop forever (I have redacted some sensitive info).
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:scrapli.driver:attempting to resolve 'ssh_config_file' file
DEBUG:scrapli.driver:using 'ssh_config.conf' as resolved 'ssh_config_file' file'
DEBUG:scrapli.driver:load core transport requested
DEBUG:scrapli.driver:core transport 'asyncssh' loaded successfully
DEBUG:scrapli.driver:generating combined network comms prompt pattern
DEBUG:scrapli.driver:setting 'comms_prompt_pattern' value to '(^[\w.\-@/:]{1,63}>$)|(^[\w.\-@/:]{1,63}#$)|(^[\w.\-@/:]{1,63}\([\w.\-@/:+]{0,32}\)#$)|(^([\w.\-@/+>:]+\(tcl\)[>#]|\+>)$)'
INFO:scrapli.driver:opening connection to 'switch.company.org' on port '22'
DEBUG:scrapli.transport:opening transport connection to 'switch.company.org' on port '22'
DEBUG:asyncssh:Reading config from "ssh_config.conf"
INFO:asyncssh:Opening SSH connection to switch.company.org, port 22
INFO:asyncssh:[conn=7] Connected to SSH server at switch.company.org, port 22
INFO:asyncssh:[conn=7] Local address: xxx.xxx.xxx.xxx, port 58058
INFO:asyncssh:[conn=7] Peer address: xxx.xxx.xxx.xxx, port 22
DEBUG:asyncssh:[conn=7] Sending version SSH-2.0-AsyncSSH_2.14.2
DEBUG:asyncssh:[conn=7] Received version SSH-2.0-Cisco-1.25
DEBUG:asyncssh:[conn=7] Requesting key exchange
DEBUG:asyncssh:[conn=7] Key exchange algs: curve25519-sha256,curve25519-sha256@libssh.org,curve448-sha512,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,ecdh-sha2-1.3.132.0.10,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha256,diffie-hellman-group15-sha512,diffie-hellman-group16-sha512,diffie-hellman-group17-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256@ssh.com,diffie-hellman-group14-sha1,rsa2048-sha256,diffie-hellman-group1-sha1,ext-info-c,kex-strict-c-v00@openssh.com
DEBUG:asyncssh:[conn=7] Host key algs: rsa-sha2-256-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,sk-ssh-ed25519-cert-v01@openssh.com,sk-ecdsa-sha2-nistp256-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,ssh-ed448-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-1.3.132.0.10-cert-v01@openssh.com,rsa-sha2-256,rsa-sha2-512,ssh-rsa-sha224@ssh.com,ssh-rsa-sha256@ssh.com,ssh-rsa-sha384@ssh.com,ssh-rsa-sha512@ssh.com,ssh-rsa,sk-ssh-ed25519@openssh.com,sk-ecdsa-sha2-nistp256@openssh.com,ssh-ed25519,ssh-ed448,ecdsa-sha2-nistp521,ecdsa-sha2-nistp384,ecdsa-sha2-nistp256,ecdsa-sha2-1.3.132.0.10
DEBUG:asyncssh:[conn=7] Encryption algs: 3des-cbc,aes128-cbc,aes192-cbc,aes256-cbc,aes128-ctr,aes192-ctr,aes256-ctr
DEBUG:asyncssh:[conn=7] MAC algs: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1,hmac-sha256-2@ssh.com,hmac-sha224@ssh.com,hmac-sha256@ssh.com,hmac-sha384@ssh.com,hmac-sha512@ssh.com
DEBUG:asyncssh:[conn=7] Compression algs: zlib@openssh.com,none
DEBUG:asyncssh:[conn=7] Received key exchange request
DEBUG:asyncssh:[conn=7] Key exchange algs: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1
DEBUG:asyncssh:[conn=7] Host key algs: rsa-sha2-512,rsa-sha2-256,ssh-rsa
DEBUG:asyncssh:[conn=7] Client to server:
DEBUG:asyncssh:[conn=7] Encryption algs: aes128-gcm,aes256-gcm,aes128-ctr,aes192-ctr,aes256-ctr
DEBUG:asyncssh:[conn=7] MAC algs: hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
DEBUG:asyncssh:[conn=7] Compression algs: none
DEBUG:asyncssh:[conn=7] Server to client:
DEBUG:asyncssh:[conn=7] Encryption algs: aes128-gcm,aes256-gcm,aes128-ctr,aes192-ctr,aes256-ctr
DEBUG:asyncssh:[conn=7] MAC algs: hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
DEBUG:asyncssh:[conn=7] Compression algs: none
DEBUG:asyncssh:[conn=7] Beginning key exchange
DEBUG:asyncssh:[conn=7] Key exchange alg: ecdh-sha2-nistp521
DEBUG:asyncssh:[conn=7] Client to server:
DEBUG:asyncssh:[conn=7] Encryption alg: aes128-ctr
DEBUG:asyncssh:[conn=7] MAC alg: hmac-sha2-256-etm@openssh.com
DEBUG:asyncssh:[conn=7] Compression alg: none
DEBUG:asyncssh:[conn=7] Server to client:
DEBUG:asyncssh:[conn=7] Encryption alg: aes128-ctr
DEBUG:asyncssh:[conn=7] MAC alg: hmac-sha2-256-etm@openssh.com
DEBUG:asyncssh:[conn=7] Compression alg: none
DEBUG:asyncssh:[conn=7] Requesting service ssh-userauth
DEBUG:asyncssh:[conn=7] Completed key exchange
DEBUG:asyncssh:[conn=7] Request for service ssh-userauth accepted
INFO:asyncssh:[conn=7] Beginning auth for user user1
DEBUG:asyncssh:[conn=7] Remaining auth methods: publickey,keyboard-interactive,password
DEBUG:asyncssh:[conn=7] Preferred auth methods: publickey,keyboard-interactive,password
DEBUG:asyncssh:[conn=7] Trying public key auth with ssh-rsa key
DEBUG:asyncssh:[conn=7] Remaining auth methods: publickey,keyboard-interactive,password
DEBUG:asyncssh:[conn=7] Preferred auth methods: publickey,keyboard-interactive,password
DEBUG:asyncssh:[conn=7] Trying keyboard-interactive auth
INFO:asyncssh:[conn=7] Auth for user user1 succeeded
DEBUG:asyncssh:[conn=7, chan=0] Set write buffer limits: low-water=16384, high-water=65536
INFO:asyncssh:[conn=7, chan=0] Requesting new SSH session
DEBUG:asyncssh:[conn=7, chan=0] Initial recv window 2097152, packet size 32768
DEBUG:asyncssh:[conn=7, chan=0] Initial send window 8192, packet size 4096
DEBUG:asyncssh:[conn=7, chan=0] Terminal type: xterm
INFO:asyncssh:[conn=7, chan=0] Interactive shell requested
DEBUG:scrapli.transport:transport connection to 'switch.company.org' on port '22' opened successfully
DEBUG:asyncssh:[conn=7, chan=0] Reading from channel started
DEBUG:scrapli.channel:write: '\n'
DEBUG:asyncssh:[conn=7, chan=0] Sending 1 data byte
INFO:asyncssh:[conn=7, chan=0] Received exit status 0
DEBUG:asyncssh:[conn=7, chan=0] Received EOF
INFO:asyncssh:[conn=7, chan=0] Received channel close
INFO:asyncssh:[conn=7, chan=0] Channel closed
INFO:asyncssh:[conn=7] Connection lost
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
# ...
# loops forever....
# ...
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
DEBUG:scrapli.channel:read: b''
KeyboardInterrupt
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-232' coro=<AsyncChannel.get_prompt() done, defined at /home/user1/dev/venv/lib/python3.10/site-packages/scrapli/channel/async_channel.py:423> exception=KeyboardInterrupt()>
Traceback (most recent call last):
File "/home/user1/dev/play.py", line 63, in <module>
asyncio.run(async_main())
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
self.run_forever()
File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
self._run_once()
File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
handle._run()
File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/home/user1/dev/venv/lib/python3.10/site-packages/scrapli/channel/async_channel.py", line 448, in get_prompt
buf += await self.read()
File "/home/user1/dev/venv/lib/python3.10/site-packages/scrapli/channel/async_channel.py", line 73, in read
self.logger.debug(f"read: {buf!r}")
File "/usr/lib/python3.10/logging/__init__.py", line 1835, in debug
self.log(DEBUG, msg, *args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1879, in log
self.logger.log(level, msg, *args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1547, in log
self._log(level, msg, args, **kwargs)
File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
self.handle(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
self.callHandlers(record)
File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
hdlr.handle(record)
KeyboardInterrupt
Please note I have not been able to reproduce the issue using the default (Synchronous?) driver.
device = {
"host": "device.dns.name",
"auth_username": os.getenv("USER", None),
"auth_password": os.getenv("PASSW", None),
"auth_strict_key": False,
"platform": "cisco_iosxe",
"ssh_config_file": "ssh_config.conf",
}
def main():
conn = Scrapli(**device)
conn.open()
print(conn.get_prompt())
scrapli==2024.1.30 asyncssh==2.14.2
Python 3.10.12
I updated to python 3.12.2 and the issue remained.
I believe what is happening is the underlying transport is closed but still able to be read and it returns instantly. It goes into an infinite loop inside AsyncChannel.read()
overwhelming the async event loop and not allowing the timeout to trigger. Putting a small async sleep in AsyncChannel.read()
allows the event loop to breath and the timeout triggers. This has effectively solved my issue by allowing the timeout_wrapper
decorator to do its job.
I'm not too knowledgeable about the inner workings asyncio.wait_for
. I have a commit with the change https://github.com/forrejam/scrapli/commit/3dc5db7fa60f587e350081826d2215ee3a98f289 . It seems quite dirty but let me know if you have any ideas. I have tested multiple times now and it seems to break out of the infinite read loop correctly now after the default 30 seconds timeout.
Also I'm not sure if the underling transport read is supposed to await until it actually receives data or if and empty b''
is a valid read.
ah interesting! thanks a bunch for all the effort digging into this @forrejam -- much appreciated!
if its not too difficult, could you try to add this line print(f"at eof? -> { self.stdout.at_eof()}")
here?
from the log output (very helpful, thank you!) it seems like asyncssh understands we got the EOF, but because that never raised an exception to us we are not aware of that. so, just quickly looking at what asyncssh public methods are available on the stdout object we read from, it looks like this eof check may be helpful and a quick/easy way ensure we didn't hit an eof which, if it works, we can then raise an exception there and we dont need to have any sleeps or anything.
what do you think?
Thanks @carlmontanari, yes very interesting... I will give this a go on Monday.
at eof? -> True
DEBUG:scrapli.channel:read: b''
at eof? -> True
DEBUG:scrapli.channel:read: b''
at eof? -> True
DEBUG:scrapli.channel:read: b''
at eof? -> True
DEBUG:scrapli.channel:read: b''
at eof? -> True
DEBUG:scrapli.channel:read: b''
at eof? -> True
DEBUG:scrapli.channel:read: b''
Yes indeed stdout is at EOF!
Describe the bug When connecting to a misbehaving cisco switch, Scrapli Async IO driver will hang indefinitely. When SSH using putty, the switch will quickly open the connection and close it a few moments later (ie no prompt or data returned). While I understand the real issue here is the device misbehaving (IOS-XE bug), it would be nice if Scrapli handled this situation. At the moment, I have to check if script has been running too long and kill it manually. I have experienced similar behavior across a few devices over the past years.
To Reproduce Steps to reproduce the behavior:
Expected behavior Expect the timeout or an error on a device that does not respond as expected.
Stack Trace Interrupting with ctrl-c yields:
Screenshots If applicable, add screenshots to help explain your problem, but do note that formatted text is much preferred over screenshots!
OS (please complete the following information):
Additional context Add any other context about the problem here.