containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Changing system time during pod creation causes init containers to run in incorrect order #23022

Open andremarianiello opened 2 weeks ago

andremarianiello commented 2 weeks ago

Issue Description

When init containers are created in a pod with podman kube play, they are created in the order specified in the YAML spec. When they are run, they are run in order of creation time (oldest first, newest last). In rare circumstances, however, these two orders differ. If the system time changes during pod creation (for example, when a system service pod starts up concurrently with the system time daemon), the clock can move while the init containers are being created. If the system time is set back into the past, the later init containers can end up with creation times earlier than those of the earlier init containers, causing them to run first.

This is a serious problem if the init containers must run in the specified order to behave correctly.

Here is where containers get their creation time set - https://github.com/containers/podman/blob/afe55cded062ab0c56f57e99002686862f3327c9/libpod/runtime_ctr.go#L212

Here is where containers are retrieved from podman state in creation time order - https://github.com/containers/podman/blob/67bbbb9e94a00a8b5d1d358dfcc8bbd1bd0c9b55/libpod/pod.go#L502-L518

Here is where containers are started in retrieval order - https://github.com/containers/podman/blob/afe55cded062ab0c56f57e99002686862f3327c9/libpod/pod_api.go#L20-L27

Steps to reproduce the issue

Send system time backwards while concurrently creating a pod with multiple init containers.

Describe the results you received

Here is an example:

  1. podman kube play a pod with 4 init containers (A, B, C, D) at 3:00pm
  2. init container A is created at 3:00:01pm
  3. init container B is created at 3:00:02pm
  4. ntpd adjusts system time to 2:00pm
  5. init container C is created at 2:00:01pm
  6. init container D is created at 2:00:02pm
  7. libpod fetches the init containers, ordered by creation time
  8. libpod runs init container C (created at 2:00:01pm)
  9. libpod runs init container D (created at 2:00:02pm)
  10. libpod runs init container A (created at 3:00:01pm)
  11. libpod runs init container B (created at 3:00:02pm)
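
The sequence above can be reproduced in isolation with a minimal Go sketch. It is not libpod code: the `initCtr` type, the `runOrder` and `exampleCtrs` helpers, and the calendar date are all hypothetical and exist only to show that sorting by wall-clock creation time gives C, D, A, B once the clock has stepped backwards.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// initCtr is a hypothetical stand-in for a libpod container; only the
// fields needed for this illustration are modeled.
type initCtr struct {
	Name    string
	Created time.Time
}

// runOrder returns the container names sorted by creation time, oldest
// first, which is the ordering described in this issue.
func runOrder(ctrs []initCtr) string {
	sorted := make([]initCtr, len(ctrs))
	copy(sorted, ctrs)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Created.Before(sorted[j].Created)
	})
	names := make([]string, 0, len(sorted))
	for _, c := range sorted {
		names = append(names, c.Name)
	}
	return strings.Join(names, ",")
}

// exampleCtrs builds the four init containers from the example, listed in
// spec order. The date is arbitrary; only the time of day matters.
func exampleCtrs() []initCtr {
	at := func(h, m, s int) time.Time {
		return time.Date(2024, time.June, 1, h, m, s, 0, time.UTC)
	}
	return []initCtr{
		{Name: "A", Created: at(15, 0, 1)},
		{Name: "B", Created: at(15, 0, 2)},
		// ntpd steps the clock back one hour here.
		{Name: "C", Created: at(14, 0, 1)},
		{Name: "D", Created: at(14, 0, 2)},
	}
}

func main() {
	fmt.Println(runOrder(exampleCtrs())) // prints "C,D,A,B"
}
```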

Describe the results you expected

Init containers should run in the specified order: A, then B, then C, then D.

podman info output

[root@nms70 ~]# podman info                                                           
host:                                                                                 
  arch: amd64                                                                         
  buildahVersion: 1.33.7                                                              
  cgroupControllers:                                                                  
  - cpuset                                                                            
  - cpu                                                                               
  - cpuacct                                                                           
  - blkio                                                                             
  - memory                                                                            
  - devices                                                                           
  - freezer                                                                           
  - net_cls                                                                           
  - perf_event                                                                        
  - net_prio                                                                          
  - hugetlb                                                                           
  - pids                                                                              
  - rdma                                                                              
  cgroupManager: systemd                                                              
  cgroupVersion: v1                                                                   
  conmon:                                                                             
    package: conmon-2.1.10-1.module+el8.10.0+21077+98b84d8a.x86_64                    
    path: /usr/bin/conmon                                                             
    version: 'conmon version 2.1.10, commit: 80c4f656297773fb630a4d966add3242abab39a4'
  cpuUtilization:                                                                     
    idlePercent: 87.09                                                                
    systemPercent: 5.32                                                               
    userPercent: 7.59                                                                 
  cpus: 2                                                                             
  databaseBackend: sqlite                                                             
  distribution:                                                                       
    distribution: rhel                                                                
    version: "8.10"                                                                   
  eventLogger: file                                                                   
  freeLocks: 2000                                                                     
  hostname: nms70                                                                     
  idMappings:                                                                         
    gidmap: null                                                                      
    uidmap: null                                                                      
  kernel: 4.18.0-553.el8_10.x86_64                                                    
  linkmode: dynamic                                                                   
  logDriver: k8s-file                                                                 
  memFree: 1397669888                                                                 
  memTotal: 8071610368                                                                
  networkBackend: cni                                                                 
  networkBackendInfo:                                                                                                                                                    
    backend: cni                                                                                                                                                         
    dns:                                                                                                                                                                 
      package: podman-plugins-4.9.4-1.module+el8.10.0+21632+761e0d34.x86_64                                                                                              
      path: /usr/libexec/cni/dnsname                                                                                                                                     
      version: |-                                                                                                                                                        
        CNI dnsname plugin                                                                                                                                               
        version: 1.4.0-dev                                                                                                                                               
        commit: unknown                                                                                                                                                  
        CNI protocol versions supported: 0.1.0, 0.2.0, 0.3.0, 0.3.1, 0.4.0, 1.0.0                                                                                        
    package: containernetworking-plugins-1.4.0-2.module+el8.10.0+21366+f9cb49f8.x86_64                                                                                   
    path: /usr/libexec/cni                                                                                                                                               
  ociRuntime:                                                                                                                                                            
    name: runc                                                                                                                                                           
    package: runc-1.1.12-1.module+el8.10.0+21251+62b7388c.x86_64                                                                                                         
    path: /usr/bin/runc                                                                                                                                                  
    version: |-                                                                                                                                                          
      runc version 1.1.12                                                                                                                                                
      spec: 1.0.2-dev                                                                                                                                                    
      go: go1.21.3                                                                                                                                                       
      libseccomp: 2.5.2                                                                                                                                                  
  os: linux                                                                                                                                                              
  pasta:                                                                                                                                                                 
    executable: ""                                                                                                                                                       
    package: ""                                                                                                                                                          
    version: ""                                                                                                                                                          
  remoteSocket:                                                                                                                                                          
    exists: true                                                                                                                                                         
    path: /run/podman/podman.sock                                                                                                                                        
  security:                                                                                                                                                              
    apparmorEnabled: false                                                                                                                                               
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false                                                                                                                                                      
    seccompEnabled: true                                                                                                                                                 
    seccompProfilePath: /usr/share/containers/seccomp.json                                                                                                               
    selinuxEnabled: false                                                                                                                                                
  serviceIsRemote: false                                                                                                                                                 
  slirp4netns:                                                                                                                                                           
    executable: /usr/bin/slirp4netns                                                                                                                                     
    package: slirp4netns-1.2.3-1.module+el8.10.0+21306+6be40ce7.x86_64                                                                                                   
    version: |-                                                                                                                                                          
      slirp4netns version 1.2.3                                                                                                                                          
      commit: c22fde291bb35b354e6ca44d13be181c76a0a432                                                                                                                   
      libslirp: 4.4.0                                                                                                                                                    
      SLIRP_CONFIG_VERSION_MAX: 3                                                                                                                                        
      libseccomp: 2.5.2                                                                                                                                                  
  swapFree: 4294963200                                                                                                                                                   
  swapTotal: 4294963200                                                                                                                                                  
  uptime: 4h 7m 47.00s (Approximately 0.17 days)                                                                                                                         
  variant: ""                                                                                                                                                            
plugins:                                                                                                                                                                 
  log:                                                                 
  - k8s-file                                                           
  - none                                                               
  - passthrough                                                        
  - journald                                                           
  network:                                                             
  - bridge                                                             
  - macvlan                                                            
  - ipvlan                                                             
  volume:                                                              
  - local                                                              
registries:                                                            
  search:                                                              
  - registry.access.redhat.com                                         
  - registry.redhat.io                                                 
  - docker.io                                                          
store:                                                                 
  configFile: /etc/containers/storage.conf                             
  containerStore:                                                      
    number: 25                                                         
    paused: 0                                                          
    running: 25                                                        
    stopped: 0                                                         
  graphDriverName: overlay                                             
  graphOptions:                                                        
    overlay.mountopt: nodev,metacopy=on                                
  graphRoot: /var/lib/containers/storage                               
  graphRootAllocated: 161949396992                                     
  graphRootUsed: 20610424832                                           
  graphStatus:                                                         
    Backing Filesystem: xfs                                            
    Native Overlay Diff: "false"                                       
    Supports d_type: "true"                                            
    Supports shifting: "true"                                          
    Supports volatile: "true"                                          
    Using metacopy: "false"                                            
  imageCopyTmpDir: /var/tmp                                            
  imageStore:                                                          
    number: 25                                                         
  runRoot: /run/containers/storage                                     
  transientStore: false                                                
  volumePath: /var/lib/containers/storage/volumes                      
version:                                                               
  APIVersion: 4.9.4-rhel                                               
  Built: 1711986940                                                    
  BuiltTime: Mon Apr  1 15:55:40 2024                                  
  GitCommit: ""                                                        
  GoVersion: go1.21.7 (Red Hat 1.21.7-1.module+el8.10.0+21318+5ea197f8)
  Os: linux                                                            
  OsArch: linux/amd64                                                  
  Version: 4.9.4-rhel

Podman in a container

No

Privileged Or Rootless

Privileged

Upstream Latest Release

No

Additional environment details

This happened in a vCenter virtualization environment where VMs boot with a clock that is slightly too far in the future, and ntpd corrects the time during pod creation. It does not happen on every boot, only occasionally. When it does, a critical service running in a podman pod fails to start correctly because critical functions performed by the init containers run out of order.

Additional information

No response

giuseppe commented 2 weeks ago

@baude, should we store each init container's position in the spec definition and use that to sort the init containers? Otherwise, I don't see any immediate way to do it with what we currently have.

Luap99 commented 2 weeks ago

Yes, I think the only way is to store the order explicitly. Maybe we can hack it into the annotations, or we can add a new field to the container config.

That said, I think we use the creation time for many more things. First off, it is displayed in podman ps/inspect. It is also used to calculate average CPU usage for things like stats, so quite a few things can go wrong if the time is changed.
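
The explicit-ordering idea discussed in the comments above could look roughly like the following Go sketch. It is an illustration under stated assumptions, not the podman implementation: the `SpecIndex` field, the `orderedCtr` type, and the `runOrderBySpec` helper are hypothetical names, and whether the position would live in an annotation or in a new container config field is left open, as it is in the thread.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// orderedCtr is a hypothetical container record. SpecIndex is an assumed
// field holding the container's position in the YAML spec; no such field
// is claimed to exist in libpod.
type orderedCtr struct {
	Name      string
	Created   time.Time
	SpecIndex int
}

// runOrderBySpec sorts by the stored spec position and ignores the
// wall-clock creation time, so a clock step cannot reorder the containers.
func runOrderBySpec(ctrs []orderedCtr) string {
	sorted := make([]orderedCtr, len(ctrs))
	copy(sorted, ctrs)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].SpecIndex < sorted[j].SpecIndex
	})
	names := make([]string, 0, len(sorted))
	for _, c := range sorted {
		names = append(names, c.Name)
	}
	return strings.Join(names, ",")
}

// skewedCtrs returns the containers in the order a creation-time sort
// would yield after the clock stepped back (C, D, A, B), each still
// carrying its original position in the spec.
func skewedCtrs() []orderedCtr {
	at := func(h, m, s int) time.Time {
		return time.Date(2024, time.June, 1, h, m, s, 0, time.UTC)
	}
	return []orderedCtr{
		{Name: "C", Created: at(14, 0, 1), SpecIndex: 2},
		{Name: "D", Created: at(14, 0, 2), SpecIndex: 3},
		{Name: "A", Created: at(15, 0, 1), SpecIndex: 0},
		{Name: "B", Created: at(15, 0, 2), SpecIndex: 1},
	}
}

func main() {
	fmt.Println(runOrderBySpec(skewedCtrs())) // prints "A,B,C,D"
}
```

Sorting on a stored position only addresses start order. The other uses of creation time mentioned above (display and stats) would still see the skewed timestamps.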