apache / rocketmq-operator

Apache RocketMQ Operator
https://rocketmq.apache.org/
Apache License 2.0

unable to start container process: exec: "/manager": stat /manager: no such file or directory: unknown #110

Closed suuugeee closed 2 years ago

suuugeee commented 2 years ago

Hi,

After I copied the code from master, built it, and installed it, the "rocketmq-operator" pod is in status "RunContainerError", with the error: exec: "/manager": stat /manager: no such file or directory: unknown.

I don't know what is causing this. I'm running on CentOS 8 Stream.

suuugeee commented 2 years ago

[Two WeChat screenshots: 微信截图_20220706173648, 微信截图_20220706173634]

gobbq commented 2 years ago

@caigy The official operator image may need to be updated.

gobbq commented 2 years ago

@z2289181978 As a temporary solution, you can execute make docker-build to generate a local image, instead of using the official image

suuugeee commented 2 years ago

@z2289181978 As a temporary solution, you can execute make docker-build to generate a local image, instead of using the official image

Thanks, I'll try what you suggested.

suuugeee commented 2 years ago

@gobbq After I tried "make docker-build", it still reports: "Error: failed to start container "manager": Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/manager": stat /manager: no such file or directory: unknown"

suuugeee commented 2 years ago

@gobbq image "apacherocketmq/rocketmq-operator:0.3.0-snapshot" is the image I generated by "make docker-build", but it doesn't work as expected.

suuugeee commented 2 years ago

@caigy

caigy commented 2 years ago

@suuugeee I've tried make docker-build IMG=apacherocketmq/rocketmq-operator:0.3.0-snapshot and then deployed it, but couldn't reproduce the issue. Could you provide more detailed information about your operation?

  1. I found that /manager was not found in your container, please enter your container to check whether it exists. Also, confirm that following messages were printed when the operator image was built:
    Step 19/21 : COPY --from=builder /workspace/manager .
    ---> 854f7b6b7af5
    Step 20/21 : USER 65532:65532
    ---> Running in eddefca4e329
    Removing intermediate container eddefca4e329
    ---> 57456f64ede1
    Step 21/21 : ENTRYPOINT ["/manager"]
    ---> Running in f75b5644e0a8
    Removing intermediate container f75b5644e0a8
    ---> 1e53679155f9
    Successfully built 1e53679155f9
    Successfully tagged apacherocketmq/rocketmq-operator:0.3.0-snapshot
  2. Please provide the result of docker info. The environment I deployed:

I ran the following commands and got a successful result:

image

suuugeee commented 2 years ago

@caigy image

image

Docker: 20.10.17, OS: Alibaba Cloud Linux 3, kernel version: 5.10.84-10.4.al8.x86_64, CPU architecture: x86_64

caigy commented 2 years ago

@suuugeee Could you enter the operator container to check whether /manager exists?
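
One way to run this check without an interactive shell is sketched below. It assumes the image ships /bin/sh, as an Alpine-based image normally does; the function name check_manager is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: list /manager inside an image without starting the
# operator itself. Overriding the entrypoint avoids the failing exec.
check_manager() {
  image="$1"
  docker run --rm --entrypoint /bin/sh "$image" -c 'ls -l /manager'
}

# Example (image tag taken from this thread):
# check_manager apacherocketmq/rocketmq-operator:0.3.0-snapshot
```

A non-zero exit status from ls would indicate that the binary is missing from the image that the local Docker daemon resolves for that tag, which is not necessarily the image the kubelet pulled.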

ccctask commented 2 years ago

Same issue here.

Normal   Scheduled       14s   default-scheduler  Successfully assigned default/rocketmq-operator-6f65c77c49-d488s to k8s-ycloud-worker192.168.101.182-dev
  Normal   AllocIPSucceed  14s   terway-daemon      Alloc IP 192.168.30.170/24
  Normal   Pulling         14s   kubelet            Pulling image "apacherocketmq/rocketmq-operator:0.3.0-snapshot"
  Normal   Pulled          1s    kubelet            Successfully pulled image "apacherocketmq/rocketmq-operator:0.3.0-snapshot" in 12.816837469s
  Normal   Created         1s    kubelet            Created container manager
  Warning  Failed          1s    kubelet            Error: failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "/manager": stat /manager: no such file or directory: unknown
bash-4.4$ ls
bin    dev    etc    home   lib    media  mnt    opt    proc   root   run    sbin   srv    sys    tmp    usr    var

This is on Alibaba Cloud Linux.

caigy commented 2 years ago

@ccctask Thanks for your report. Please also post your docker info output. It seems that the issue has something to do with the OS or Docker version. BTW, was your operator image built in the same environment, or built on a machine with a different OS or Docker version and then transferred to the environment where it is deployed?

ccctask commented 2 years ago
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.32-1.al8.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.32, commit: facef751f675b2441a0cf72606fe08a9110f8838'
  cpus: 2
  distribution:
    distribution: '"alinux"'
    version: "3"
  eventLogger: file
  hostname: iZt4ndjnm00f7wphn4wd6gZ
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.10.112-11.al8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1680097280
  memTotal: 3906551808
  ociRuntime:
    name: runc
    package: runc-1.0.3-1.al8.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.3
      spec: 1.0.2-dev
      go: go1.16.12
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-1.1.al8.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 0
  swapTotal: 0
  uptime: 1h 2m 46.96s (Approximately 0.04 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 2
    paused: 0
    running: 0
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 21
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.4.2
  Built: 1647932805
  BuiltTime: Tue Mar 22 15:06:45 2022
  GitCommit: ""
  GoVersion: go1.16.12
  OsArch: linux/amd64
  Version: 3.4.2

@caigy It was built on the same OS with the docker runtime, and the cluster is set up with the containerd runtime. I have tried using docker as the runtime, but that didn't solve the problem.

caigy commented 2 years ago

@ccctask Thanks for your reply. I'll try to find an environment with Alibaba Linux and reproduce it. At the same time, you can try removing L44 and L49 in dockerfile and build it again. I guess the problem may be caused by that user.

https://github.com/apache/rocketmq-operator/blob/f904c6604a58880d83444b25c4a331092737770a/Dockerfile#L44-L49

suuugeee commented 2 years ago

@ccctask Thanks for your reply. I'll try to find an environment with Alibaba Linux and reproduce it. At the same time, you can try removing L44 and L49 in dockerfile and build it again. I guess the problem may be caused by that user.

https://github.com/apache/rocketmq-operator/blob/f904c6604a58880d83444b25c4a331092737770a/Dockerfile#L44-L49

Deleting it doesn't seem to work either.

caigy commented 2 years ago

@ccctask @suuugeee Please try replacing USER 65532:65532 with USER root:root, rebuild the image, and then check whether /manager can be found. IMO /manager should exist in the operator container (otherwise the docker build would fail); the file is probably just not visible because of a permission problem.

In my environment, /manager belongs to user root:

# docker run -it --entrypoint /bin/sh apacherocketmq/rocketmq-operator:0.3.0-snapshot
/ $ ls -al
total 49456
drwxr-xr-x    1 root     root          4096 Jul 14 01:44 .
drwxr-xr-x    1 root     root          4096 Jul 14 01:44 ..
-rwxr-xr-x    1 root     root             0 Jul 14 01:44 .dockerenv
drwxr-xr-x    1 root     root          4096 Jul 12 05:01 bin
drwxr-xr-x    5 root     root           360 Jul 14 01:44 dev
drwxr-xr-x    1 root     root          4096 Jul 14 01:44 etc
drwxr-xr-x    1 root     root          4096 Jul 12 05:02 home
drwxr-xr-x    1 root     root          4096 May 11  2019 lib
-rwxr-xr-x    1 root     root      50576568 Jul 12 05:00 manager
drwxr-xr-x    5 root     root          4096 May  9  2019 media
drwxr-xr-x    2 root     root          4096 May  9  2019 mnt
drwxr-xr-x    2 root     root          4096 May  9  2019 opt
dr-xr-xr-x  193 root     root             0 Jul 14 01:44 proc
drwx------    1 root     root          4096 Jul 12 05:40 root
drwxr-xr-x    2 root     root          4096 May  9  2019 run
drwxr-xr-x    2 root     root          4096 May  9  2019 sbin
drwxr-xr-x    2 root     root          4096 May  9  2019 srv
dr-xr-xr-x   13 root     root             0 Jul 14 01:44 sys
drwxrwxrwt    2 root     root          4096 May  9  2019 tmp
drwxr-xr-x    1 root     root          4096 May 11  2019 usr
drwxr-xr-x    1 root     root          4096 May  9  2019 var
caigy commented 2 years ago

There is no user with ID 65532 in the operator container; this may be the cause. @gobbq

/ $ cat /etc/passwd
root:x:0:0:root:/root:/bin/ash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
news:x:9:13:news:/usr/lib/news:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin
operator:x:11:0:operator:/root:/bin/sh
man:x:13:15:man:/usr/man:/sbin/nologin
postmaster:x:14:12:postmaster:/var/spool/mail:/sbin/nologin
cron:x:16:16:cron:/var/spool/cron:/sbin/nologin
ftp:x:21:21::/var/lib/ftp:/sbin/nologin
sshd:x:22:22:sshd:/dev/null:/sbin/nologin
at:x:25:25:at:/var/spool/cron/atjobs:/sbin/nologin
squid:x:31:31:Squid:/var/cache/squid:/sbin/nologin
xfs:x:33:33:X Font Server:/etc/X11/fs:/sbin/nologin
games:x:35:35:games:/usr/games:/sbin/nologin
postgres:x:70:70::/var/lib/postgresql:/bin/sh
cyrus:x:85:12::/usr/cyrus:/sbin/nologin
vpopmail:x:89:89::/var/vpopmail:/sbin/nologin
ntp:x:123:123:NTP:/var/empty:/sbin/nologin
smmsp:x:209:209:smmsp:/var/spool/mqueue:/sbin/nologin
guest:x:405:100:guest:/dev/null:/sbin/nologin
nobody:x:65534:65534:nobody:/:/sbin/nologin
suuugeee commented 2 years ago
  • kubectl create -f deploy/operator.yaml

I tested your suggestion, but it had no effect: "manager" is still missing.

caigy commented 2 years ago

CASE 1: Docker: 19.03.15, OS: Alibaba Cloud Linux 3 (Soaring Falcon), Kernel: 5.10.84-10.2.al8.x86_64

On Alibaba Cloud ACK, the operator image built on the same environment is correct:

# docker run -it --entrypoint /bin/sh apacherocketmq/rocketmq-operator:0.3.0-snapshot
/ $ ls -al
total 49456
drwxr-xr-x    1 root     root          4096 Jul 15 09:56 .
drwxr-xr-x    1 root     root          4096 Jul 15 09:56 ..
-rwxr-xr-x    1 root     root             0 Jul 15 09:56 .dockerenv
drwxr-xr-x    1 root     root          4096 Jul 15 09:29 bin
drwxr-xr-x    5 root     root           360 Jul 15 09:56 dev
drwxr-xr-x    1 root     root          4096 Jul 15 09:56 etc
drwxr-xr-x    1 root     root          4096 Jul 15 09:31 home
drwxr-xr-x    1 root     root          4096 May 11  2019 lib
-rwxr-xr-x    1 root     root      50576560 Jul 15 09:26 manager
drwxr-xr-x    5 root     root          4096 May  9  2019 media
drwxr-xr-x    2 root     root          4096 May  9  2019 mnt
drwxr-xr-x    2 root     root          4096 May  9  2019 opt
dr-xr-xr-x  341 root     root             0 Jul 15 09:56 proc
drwx------    1 root     root          4096 Jul 15 09:53 root
drwxr-xr-x    2 root     root          4096 May  9  2019 run
drwxr-xr-x    2 root     root          4096 May  9  2019 sbin
drwxr-xr-x    2 root     root          4096 May  9  2019 srv
dr-xr-xr-x   13 root     root             0 Jul 15 09:56 sys
drwxrwxrwt    2 root     root          4096 May  9  2019 tmp
drwxr-xr-x    1 root     root          4096 May 11  2019 usr
drwxr-xr-x    1 root     root          4096 May  9  2019 var

Outputs of docker info:

# docker info
Client:
 Debug Mode: false

Server:
 Containers: 44
  Running: 41
  Paused: 0
  Stopped: 3
 Images: 42
 Server Version: 19.03.15
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc version: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 init version: fec3683
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.84-10.2.al8.x86_64
 Operating System: Alibaba Cloud Linux 3 (Soaring Falcon)
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 14.86GiB
 Name: iZwz91yvcvq6jqu4j4qq3lZ
 ID: KMVK:J4QP:YEFV:ZXHD:PYAJ:DZX5:T2YE:BWKX:TLBY:4YPN:A442:OWNX
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://pqbap4ya.mirror.aliyuncs.com/
 Live Restore Enabled: true
caigy commented 2 years ago

CASE 2: Docker: 20.10.17 OS: Alibaba Cloud Linux 3 (Soaring Falcon), Kernel: 5.10.112-11.al8.x86_64

[root@iZwz9c0gh4hpp8cwwuovsvZ rocketmq-operator-master]# docker run -it --entrypoint /bin/sh apacherocketmq/rocketmq-operator:0.3.0-snapshot
/ $ ls -al
total 49472
drwxr-xr-x    1 root     root          4096 Jul 16 09:29 .
drwxr-xr-x    1 root     root          4096 Jul 16 09:29 ..
-rwxr-xr-x    1 root     root             0 Jul 16 09:29 .dockerenv
drwxr-xr-x    1 root     root          4096 Jul 16 09:20 bin
drwxr-xr-x    5 root     root           360 Jul 16 09:29 dev
drwxr-xr-x    1 root     root          4096 Jul 16 09:29 etc
drwxr-xr-x    1 root     root          4096 Jul 16 09:21 home
drwxr-xr-x    1 root     root          4096 May 11  2019 lib
-rwxr-xr-x    1 root     root      50573577 Jul 16 09:18 manager
drwxr-xr-x    5 root     root          4096 May  9  2019 media
drwxr-xr-x    2 root     root          4096 May  9  2019 mnt
drwxr-xr-x    2 root     root          4096 May  9  2019 opt
dr-xr-xr-x  217 root     root             0 Jul 16 09:29 proc
drwx------    1 root     root          4096 Jul 16 09:25 root
drwxr-xr-x    2 root     root          4096 May  9  2019 run
drwxr-xr-x    2 root     root          4096 May  9  2019 sbin
drwxr-xr-x    2 root     root          4096 May  9  2019 srv
dr-xr-xr-x   13 root     root             0 Jul 16 09:29 sys
drwxrwxrwt    2 root     root          4096 May  9  2019 tmp
drwxr-xr-x    1 root     root          4096 May 11  2019 usr
drwxr-xr-x    1 root     root          4096 May  9  2019 var

Outputs of docker info:

[root@iZwz9c0gh4hpp8cwwuovsvZ rocketmq-operator-master]# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 1
  Running: 0
  Paused: 0
  Stopped: 1
 Images: 21
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.112-11.al8.x86_64
 Operating System: Alibaba Cloud Linux 3 (Soaring Falcon)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.307GiB
 Name: iZwz9c0gh4hpp8cwwuovsvZ
 ID: 4WQY:WHTM:7DUQ:TAUH:64JT:UNTM:C62U:MJZB:QLZO:7LWL:NJLR:Z73O
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
caigy commented 2 years ago

CASE 3: Docker CLI emulated by podman 3.4.2 (runc version 1.0.3), OS: Alibaba Cloud Linux 3 (Soaring Falcon), Kernel: 5.10.112-11.al8.x86_64

# docker run -it --entrypoint /bin/sh apacherocketmq/rocketmq-operator:0.3.0-snapshot
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
~ $ ls -al
total 49468
dr-xr-xr-x    1 root     root          4096 Jul 16 10:38 .
dr-xr-xr-x    1 root     root          4096 Jul 16 10:38 ..
drwxr-xr-x    1 root     root          4096 Jul 16 10:28 bin
drwxr-xr-x    5 root     root           360 Jul 16 10:38 dev
drwxr-xr-x    1 root     root          4096 Jul 16 10:37 etc
drwxr-xr-x    1 root     root          4096 Jul 16 10:31 home
drwxr-xr-x    1 root     root          4096 May 11  2019 lib
-rwxr-xr-x    1 root     root      50573577 Jul 16 10:24 manager
drwxr-xr-x    5 root     root          4096 May  9  2019 media
drwxr-xr-x    2 root     root          4096 May  9  2019 mnt
drwxr-xr-x    2 root     root          4096 May  9  2019 opt
dr-xr-xr-x  201 root     root             0 Jul 16 10:38 proc
drwx------    1 root     root          4096 Jul 16 10:37 root
drwxr-xr-x    1 root     root          4096 Jul 16 10:25 run
drwxr-xr-x    2 root     root          4096 May  9  2019 sbin
drwxr-xr-x    2 root     root          4096 May  9  2019 srv
dr-xr-xr-x   13 root     root             0 Jul 16 10:38 sys
drwxrwxrwt    2 root     root          4096 May  9  2019 tmp
drwxr-xr-x    1 root     root          4096 May 11  2019 usr
drwxr-xr-x    1 root     root          4096 May  9  2019 var

Outputs of docker info:

# docker info
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.32-1.al8.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.32, commit: facef751f675b2441a0cf72606fe08a9110f8838'
  cpus: 4
  distribution:
    distribution: '"alinux"'
    version: "3"
  eventLogger: file
  hostname: iZwz9c0gh4hpp8cwwuovsvZ
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.10.112-11.al8.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 2189160448
  memTotal: 7845326848
  ociRuntime:
    name: runc
    package: runc-1.0.3-1.al8.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.3
      spec: 1.0.2-dev
      go: go1.16.12
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.8-1.1.al8.x86_64
    version: |-
      slirp4netns version 1.1.8
      commit: d361001f495417b880f20329121e3aa431a8f90f
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 0
  swapTotal: 0
  uptime: 23h 27m 47.25s (Approximately 0.96 days)
plugins:
  log:
  - k8s-file
  - none
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 21
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.4.2
  Built: 1647932805
  BuiltTime: Tue Mar 22 15:06:45 2022
  GitCommit: ""
  GoVersion: go1.16.12
  OsArch: linux/amd64
  Version: 3.4.2
suuugeee commented 2 years ago

@caigy After running the container, I found "manager", but kubectl get pod still reports an error. [image]

suuugeee commented 2 years ago

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.8.2-docker)
  scan: Docker Scan (Docker Inc., v0.17.0)

Server:
 Containers: 37
  Running: 32
  Paused: 0
  Stopped: 5
 Images: 30
 Server Version: 20.10.17
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: false
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
 runc version: v1.1.2-0-ga916309
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.10.112-11.al8.x86_64
 Operating System: Alibaba Cloud Linux 3 (Soaring Falcon)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 7.494GiB
 Name: dddmaster
 ID: LQHY:R4AW:LE6F:YETD:VIM6:PWYC:GFBU:MFBR:3DX4:E4T4:5YNY:6K7V
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://kn0t2bca.mirror.aliyuncs.com/
 Live Restore Enabled: false

suuugeee commented 2 years ago

image

suuugeee commented 2 years ago

Could this be a problem with Kubernetes? My Kubernetes version is v1.23.8.

caigy commented 2 years ago

@suuugeee Did you build rocketmq operator image by docker build and run it on containerd runtime?

suuugeee commented 2 years ago

@caigy I built the image following "README.md". Are those the same steps you used?

suuugeee commented 2 years ago

/www/software2/k8s/rocketmq-operator-master/bin/controller-gen rbac:roleName=manager-role crd webhook paths="./..." output:dir=deploy output:crd:artifacts:config=deploy/crds
/www/software2/k8s/rocketmq-operator-master/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
pkg/apis/rocketmq/v1alpha1/zz_generated.deepcopy.go
go vet ./...
KUBEBUILDER_ASSETS="/root/.local/share/kubebuilder-envtest/k8s/1.22.1-linux-amd64" go test ./... -coverprofile cover.out
? github.com/apache/rocketmq-operator [no test files]
? github.com/apache/rocketmq-operator/pkg/apis/rocketmq [no test files]
? github.com/apache/rocketmq-operator/pkg/apis/rocketmq/v1alpha1 [no test files]
? github.com/apache/rocketmq-operator/pkg/constants [no test files]
? github.com/apache/rocketmq-operator/pkg/controller/broker [no test files]
? github.com/apache/rocketmq-operator/pkg/controller/console [no test files]
? github.com/apache/rocketmq-operator/pkg/controller/nameservice [no test files]
? github.com/apache/rocketmq-operator/pkg/controller/topictransfer [no test files]
? github.com/apache/rocketmq-operator/pkg/share [no test files]
? github.com/apache/rocketmq-operator/pkg/tool [no test files]
? github.com/apache/rocketmq-operator/version [no test files]
docker build -t apacherocketmq/rocketmq-operator:0.3.0-snapshot .
Sending build context to Docker daemon 83.9MB
Step 1/21 : FROM golang:1.16 as builder
---> 8ffb179c0658
Step 2/21 : WORKDIR /workspace
---> Using cache
---> 8fb722f32cff
Step 3/21 : COPY go.mod go.mod
---> Using cache
---> 1b0e9038dcb4
Step 4/21 : COPY go.sum go.sum
---> Using cache
---> ad1e3e42fe7a
Step 5/21 : RUN go env -w GO111MODULE=on
---> Using cache
---> f76376549aed
Step 6/21 : RUN go env -w GOPROXY=https://mirrors.aliyun.com/goproxy,direct
---> Using cache
---> 65c94bbc8d77
Step 7/21 : RUN go mod download
---> Using cache
---> e8c2c5f2efc9
Step 8/21 : COPY main.go main.go
---> Using cache
---> 415521600224
Step 9/21 : COPY pkg/ pkg/
---> 36d6cef217bf
Step 10/21 : RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o manager main.go
---> Running in 40a7912846cd
Removing intermediate container 40a7912846cd
---> e5006d63a6e4
Step 11/21 : FROM openjdk:8-alpine
---> a3562aa0b991
Step 12/21 : RUN apk add --no-cache bash gettext nmap-ncat openssl busybox-extras
---> Using cache
---> e91627134db4
Step 13/21 : ENV ROCKETMQ_HOME /home/rocketmq
---> Using cache
---> 7888599ef1b7
Step 14/21 : ENV ROCKETMQ_VERSION 4.5.0
---> Using cache
---> b6e8b072f9f2
Step 15/21 : WORKDIR ${ROCKETMQ_HOME}
---> Using cache
---> 6ad867a456ac
Step 16/21 : RUN set -eux; apk add --virtual .build-deps curl gnupg unzip; curl https://archive.apache.org/dist/rocketmq/${ROCKETMQ_VERSION}/rocketmq-all-${ROCKETMQ_VERSION}-bin-release.zip -o rocketmq.zip; curl https://archive.apache.org/dist/rocketmq/${ROCKETMQ_VERSION}/rocketmq-all-${ROCKETMQ_VERSION}-bin-release.zip.asc -o rocketmq.zip.asc; curl -L https://www.apache.org/dist/rocketmq/KEYS -o KEYS; gpg --import KEYS; gpg --batch --verify rocketmq.zip.asc rocketmq.zip; unzip rocketmq.zip; mv rocketmq-/ . ; chmod a+x ; rmdir rocketmq- ; rm rocketmq.zip; apk del .build-deps ; rm -rf /var/cache/apk/ ; rm -rf /tmp/
---> Using cache
---> a92bfa1e428a
Step 17/21 : RUN chown -R root:0 ${ROCKETMQ_HOME}
---> Using cache
---> cc72a450765d
Step 18/21 : WORKDIR /
---> Using cache
---> bfd901eec7a2
Step 19/21 : COPY --from=builder /workspace/manager .
---> 8a53ac46d82c
Step 20/21 : USER root:root
---> Running in cc4dee7a8c26
Removing intermediate container cc4dee7a8c26
---> 7f82babfcc08
Step 21/21 : ENTRYPOINT ["/manager"]
---> Running in 53c0ece4f48f
Removing intermediate container 53c0ece4f48f
---> ff0b61f2d71b
Successfully built ff0b61f2d71b
Successfully tagged apacherocketmq/rocketmq-operator:0.3.0-snapshot

caigy commented 2 years ago

@suuugeee Please check the SHA of the operator image you are using. There is an image on Docker Hub that was built 2 years ago: image

So you can give your own image a different tag, and make sure the image you built is present on the node where the rocketmq operator is running.
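
A minimal sketch of that suggestion follows. The suffix, the helper names, and the assumption that the Deployment is called rocketmq-operator with a container named manager (inferred from the pod and container names in the events earlier in this thread) are illustrative and not confirmed by the repository.

```shell
#!/bin/sh
# Hypothetical helper: append a suffix so the tag cannot be confused with
# the one already published on Docker Hub.
local_tag() {
  printf '%s-%s\n' "$1" "$2"
}

# Retag the locally built image and point the Deployment at the new tag.
# Assumes the image exists on the node that will run the pod, and that the
# Deployment/container names below match your cluster (an assumption).
retag_and_deploy() {
  src="$1"
  dst="$(local_tag "$1" "$2")"
  docker tag "$src" "$dst" &&
    kubectl set image deployment/rocketmq-operator manager="$dst"
}

# Example:
# retag_and_deploy apacherocketmq/rocketmq-operator:0.3.0-snapshot local
```

If the Deployment sets imagePullPolicy: Always, the kubelet will still try to pull the new tag from a registry and fail, so IfNotPresent or Never is needed for an image that exists only on the node.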

caigy commented 2 years ago

CASE 4: Containerd 1.5.10 OS: Alibaba Cloud Linux 3 (Soaring Falcon), Kernel: 5.10.84-10.2.al8.x86_64

Build the operator image with docker build first, export it to a file with docker save, then import that file with the ctr image import command.

[root@iZwz91yvcvq6jsszq3ech9Z deploy]# kubectl describe po rocketmq-operator-645796d4bc-2pn85
Name:         rocketmq-operator-645796d4bc-2pn85
Namespace:    default
Priority:     0
Node:         cn-shenzhen.172.16.0.57/172.16.0.57
Start Time:   Tue, 19 Jul 2022 20:46:12 +0800
Labels:       name=rocketmq-operator
              pod-template-hash=645796d4bc
Annotations:  kubernetes.io/psp: ack.privileged
Status:       Running
IP:           172.16.0.84
IPs:
  IP:           172.16.0.84
Controlled By:  ReplicaSet/rocketmq-operator-645796d4bc
Containers:
  manager:
    Container ID:  containerd://37894a4572e365b4698167f2a13741f2057361a5209e8506885bc879f935e826
    Image:         localhost/apacherocketmq/rocketmq-operator:0.3.0-snapshot
    Image ID:      sha256:24c933e6d0d914c9cfe128e5afdf0a42bbae8f0201e10c6afc515909bccfa491
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --leader-elect
    State:          Running
      Started:      Tue, 19 Jul 2022 20:46:15 +0800
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:      http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:
      WATCH_NAMESPACE:  default (v1:metadata.namespace)
      POD_NAME:         rocketmq-operator-645796d4bc-2pn85 (v1:metadata.name)
      OPERATOR_NAME:    rocketmq-operator
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-csfpm (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kube-api-access-csfpm:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age   From               Message
  ----    ------          ----  ----               -------
  Normal  Scheduled       10m   default-scheduler  Successfully assigned default/rocketmq-operator-645796d4bc-2pn85 to cn-shenzhen.172.16.0.57
  Normal  AllocIPSucceed  10m   terway-daemon      Alloc IP 172.16.0.84/24
  Normal  Pulled          10m   kubelet            Container image "localhost/apacherocketmq/rocketmq-operator:0.3.0-snapshot" already present on machine
  Normal  Created         10m   kubelet            Created container manager
  Normal  Started         10m   kubelet            Started container manager
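
The export and import steps described for CASE 4 might look roughly like the sketch below. The helper names and the tarball naming scheme are made up for illustration, and the use of the k8s.io containerd namespace is an assumption about how the node's kubelet is wired to containerd (it is the usual default), not something stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: derive a tarball name from an image reference by
# replacing "/" and ":" with "_".
tarball_name() {
  printf '%s.tar\n' "$(printf '%s' "$1" | tr '/:' '__')"
}

# Export the image from Docker and import it into containerd.
# "-n k8s.io" selects the containerd namespace normally used by the
# kubelet's CRI plugin; without it, ctr imports into "default".
export_to_containerd() {
  image="$1"
  tarball="$(tarball_name "$image")"
  docker save -o "$tarball" "$image" &&
    ctr -n k8s.io images import "$tarball"
}

# Example:
# export_to_containerd apacherocketmq/rocketmq-operator:0.3.0-snapshot
```

When the build machine and the node differ, the tarball has to be copied to the node before the import step.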
suuugeee commented 2 years ago

I don't know why, but "ctr image import" cannot import the image. This is a very troublesome problem, and I have been busy with other things recently, so I have to put it on hold for now.