containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

[v4.9-rhel] Ensure that containers do not get stuck in stopping #23088

Closed: TomSweeneyRedHat closed this 2 days ago

TomSweeneyRedHat commented 6 days ago

The scenario for inducing this is as follows:

  1. Start a container with a long stop timeout and a PID1 that ignores SIGTERM
  2. Use podman stop to stop that container
  3. Simultaneously, in another terminal, run `kill -9 $(pidof podman)` (the container is now in ContainerStateStopping)
  4. Now kill that container's Conmon with SIGKILL.
  5. No commands are able to move the container from Stopping to Stopped now.

The cause is a bug in our exit-file handling: if Conmon is dead and left no exit file, the container's state is never updated. Add handling for this case that tries to clean up, including stopping the container if it still seems to be running.
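As a rough illustration of the case described above, the decision could be sketched as follows. This is a simplified model, not Podman's actual code: the `State`, `Observation`, and `Action` types and the `nextAction` function are hypothetical names invented for this sketch, and the real fix may order or split these checks differently.

```go
package main

import "fmt"

// State is a simplified stand-in for a container state (hypothetical type).
type State int

const (
	StateRunning State = iota
	StateStopping
	StateStopped
)

// Observation is what a state sync can see (hypothetical type).
type Observation struct {
	ConmonAlive    bool // is the container's Conmon process still running?
	ExitFileFound  bool // did Conmon write an exit file?
	ContainerAlive bool // does the container's PID1 still seem to be running?
}

// Action names the step a sync would take (hypothetical type).
type Action string

const (
	ActionNone           Action = "none"
	ActionReadExitFile   Action = "read-exit-file"
	ActionCleanup        Action = "cleanup"
	ActionKillAndCleanup Action = "kill-then-cleanup"
)

// nextAction decides what to do for a container that is not yet stopped.
// The last two branches model the previously unhandled case: Conmon is dead
// and there is no exit file.
func nextAction(s State, o Observation) (Action, State) {
	if s == StateStopped {
		return ActionNone, s
	}
	if o.ExitFileFound {
		// Normal path: Conmon recorded the exit, so use it.
		return ActionReadExitFile, StateStopped
	}
	if o.ConmonAlive {
		// Conmon is still supervising the container; nothing to do yet.
		return ActionNone, s
	}
	if o.ContainerAlive {
		// Conmon is gone but the container still runs: stop it, then clean up.
		return ActionKillAndCleanup, StateStopped
	}
	// Conmon and the container are both gone: just clean up.
	return ActionCleanup, StateStopped
}

func main() {
	// The reproduction scenario: stuck in Stopping, Conmon killed, no exit file.
	action, next := nextAction(StateStopping, Observation{ContainerAlive: true})
	fmt.Println(action, next == StateStopped)
}
```

The point of the sketch is only that the "Conmon dead, no exit file" branch must end in a state change; before the fix, that combination fell through with no transition, which is why the container stayed in Stopping.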

Fixes #19629

Addresses: https://issues.redhat.com/browse/ACCELFIX-250

Does this PR introduce a user-facing change?

None

TomSweeneyRedHat commented 6 days ago

Added the hold to make sure we get a Jira Card assigned and attached to this.

TomSweeneyRedHat commented 4 days ago

@edsantiago and/or @cevich I keep running into the below error on the "Validate rawhide Build" test. Other than pressing "try again", is there anything else to do?

Repositories loaded.
Failed to resolve the transaction:
Problem: package python3-libs-3.12.0-2.fc40.i686 requires libtirpc.so.3, but none of the providers can be installed
  - package python3-libs-3.12.0-2.fc40.i686 requires libtirpc.so.3(TIRPC_0.3.0), but none of the providers can be installed
  - package libtirpc-1.3.4-1.rc3.fc41.i686 requires libgssapi_krb5.so.2, but none of the providers can be installed

edsantiago commented 4 days ago

Seems to be this https://github.com/containers/podman/blob/1b049941640a7df3d0fc272f3acb32727b6179b8/contrib/cirrus/setup_environment.sh#L337-L338

Can you try removing that dnf line? And maybe removing it from the other two places that also dnf it? At least that might get you past Validate and onto other tests. (Which of course might fail)

cevich commented 3 days ago

Another suggestion: Remove the rawhide validation (or possibly ALL rawhide everything) from CI. I don't think old-rawhide is useful on a RHEL release branch.

TomSweeneyRedHat commented 3 days ago

Turning off Rawhide seems to have done the trick. Ready for review and happy green tests buttons. Ditto #23089

openshift-ci[bot] commented 2 days ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, TomSweeneyRedHat

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/containers/podman/blob/v4.9-rhel/OWNERS)~~ [Luap99,TomSweeneyRedHat]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

TomSweeneyRedHat commented 2 days ago

Addresses: https://issues.redhat.com/browse/RHEL-45531