SFTP Teardown failure - Githubissues

holmanb commented 2 months ago

When collecting logs it looks like the instance teardown path recently (within the last month) started throwing an unhandled exception in the failure path. I see the following traceback in cloud-init integration tests on failure.

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.12/logging/__init__.py", line 1163, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/pycloudlib/instance.py", line 204, in __del__
    self._sftp_client.close()
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/paramiko/sftp_client.py", line 194, in close
    self._log(INFO, "sftp session closed.")
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/paramiko/sftp_client.py", line 182, in _log
    super()._log(
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/paramiko/sftp.py", line 169, in _log
    self.logger.log(level, msg, *args)
Message: '[chan %s] sftp session closed.'
Arguments: ('8',)
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.12/logging/__init__.py", line 1163, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/paramiko/sftp_client.py", line 194, in close
    self._log(INFO, "sftp session closed.")
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/paramiko/sftp_client.py", line 182, in _log
    super()._log(
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/paramiko/sftp.py", line 169, in _log
    self.logger.log(level, msg, *args)
Message: '[chan %s] sftp session closed.'
Arguments: ('11',)
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.12/logging/__init__.py", line 1163, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/pycloudlib/instance.py", line 204, in __del__
    self._sftp_client.close()
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/paramiko/sftp_client.py", line 194, in close
    self._log(INFO, "sftp session closed.")
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/paramiko/sftp_client.py", line 182, in _log
    super()._log(
  File "/home/runner/work/cloud-init/cloud-init/.tox/integration-tests-ci/lib/python3.12/site-packages/paramiko/sftp.py", line 169, in _log
    self.logger.log(level, msg, *args)
Message: '[chan %s] sftp session closed.'
Arguments: ('9',)

a-dubs commented 2 months ago

So I did some digging, and this seems to be a known issue with pytest prematurely closing the stream for logging: https://github.com/pytest-dev/pytest/issues/5502

Was able to find some other GH projects that had a similar issue and were able to implement a easy fix: https://github.com/scylladb/scylla-cluster-tests/pull/6001

holmanb commented 2 months ago

So I did some digging, and this seems to be a known issue with pytest prematurely closing the stream for logging: pytest-dev/pytest#5502

Was able to find some other GH projects that had a similar issue and were able to implement a easy fix: scylladb/scylla-cluster-tests#6001

Thanks for digging @a-dubs. It's good to know that this option exists, but I'm a little hesitant to just silence logging exceptions, even if it is in the final cleanup. This could hide future bugs.

I think what is happening is the logger is getting torn down by pytest, and then the final integration instance gets garbage collected which triggers __del__(), which then barfs when it tries to log on sftp close.

For now I think that we can just manually call __del__() after each test in integration tests to prevent this shutdown mess, but long term we should probably move this out of __del__(), but that requires API changes.

aciba90 commented 1 month ago

@holmanb, @a-dubs: was this fixed with https://github.com/canonical/cloud-init/pull/5698?

holmanb commented 1 month ago

@holmanb, @a-dubs: was this fixed with canonical/cloud-init#5698?

Locally for me, yes, but I still see it failing in CI so I guess not.

canonical / pycloudlib

SFTP Teardown failure #415