NVIDIA / cloud-native-stack

Run cloud native workloads on NVIDIA GPUs
Apache License 2.0

pod CrashLoopBackOff #60

Open qinghaihan opened 2 months ago

qinghaihan commented 2 months ago

I followed cloud-native-stack/install-guides/Jetson_Xavier_v11.1.md. After deploying Kubernetes, the pod "iva-video-analytics-demo-l4t" went into the "CrashLoopBackOff" state, and when I opened the web page it showed "Stream will start playing automatically when it is live". The pod does return to Running after a while, but there is still no video on the web page, and VLC cannot play the stream either. DeepStream was installed according to NVIDIA's instructions. When I viewed the pod logs, the following errors appeared:

root@master:/home/nvidia# kubectl logs iva-video-analytics-demo-l4t-84c9df6766-q4t55

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

debconf: delaying package configuration, since apt-utils is not installed
cp: cannot stat 'deepstream_app_tao_configs/*': No such file or directory

No NGC Configuration Provided

sed: can't read /opt/nvidia/deepstream/deepstream-6.2/samples/configs/tao_pretrained_models//config_infer_primary_trafficcamnet.txt: No such file or directory
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
deepstream-app: error while loading shared libraries: libnvdla_compiler.so: cannot open shared object file: No such file or directory
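
Most of the output above is debconf and apt noise from the container's startup script. A minimal sketch of filtering it out so the actual failures stand out, assuming the log has been saved to a local file. The function name `show_failures` and the temporary file are hypothetical, and the sample lines are copied from the log above:

```shell
#!/bin/sh
# Hypothetical helper: drop lines with known-noisy prefixes, keep the rest.
show_failures() {
    grep -Ev '^(debconf:|dpkg-preconfigure:|WARNING: apt)' "$1"
}

# Sample input: a few lines copied from the log in this issue.
LOG_FILE=$(mktemp)
cat > "$LOG_FILE" <<'EOF'
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
debconf: delaying package configuration, since apt-utils is not installed
cp: cannot stat 'deepstream_app_tao_configs/*': No such file or directory
debconf: unable to initialize frontend: Dialog
dpkg-preconfigure: unable to re-open stdin:
deepstream-app: error while loading shared libraries: libnvdla_compiler.so: cannot open shared object file: No such file or directory
EOF

# Prints only the cp and deepstream-app lines from the sample.
show_failures "$LOG_FILE"
```

On a live cluster the same filter could be fed from a file written by kubectl logs with the --previous flag, which prints the output of the last terminated container rather than the current one.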

qinghaihan commented 2 months ago
However, the container soon returns to Running, but I still cannot access the web page. The output of kubectl describe shows:

Containers:
  video-analytics-demo-l4t-1:
    Container ID: containerd://e65f6d4d21da2ac86fa79829ec769d8dc80161c012b1602a8478460f1a9f94a2
    Image: nvcr.io/nvidia/deepstream-l4t:6.2-samples
    Image ID: nvcr.io/nvidia/deepstream-l4t@sha256:8583e790701140eadecb7c18c5cdf9eb1a60757a0948d27520e673ac95f2b02a
    Port: 8554/TCP
    Host Port: 0/TCP
    Command:
      sh
      -c
      apt update 2>&1 >/dev/null;
      apt install apt-utils wget unzip git-svn -y 2>&1 >/dev/null;
      git svn clone https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps/trunk/deepstream_app_tao_configs >/dev/null 2>&1;
      cp deepstream_app_tao_configs/* /opt/nvidia/deepstream/deepstream-6.2/samples/configs/tao_pretrained_models/;
      deepstream_version=$(echo '/opt/nvidia/deepstream/deepstream-6.2/samples/configs/tao_pretrained_models/' | awk -F'/' '{print $5}' | awk -F'-' '{print $2}');
      rm -rf deepstream_app_tao_configs/;
      echo " "; echo " "; echo "No NGC Configuration Provided"; echo " ";
      sed -ie "s/..\/..\/models\/tao_pretrained_models\/trafficcamnet\/resnet18_trafficcamnet_pruned.etlt/\/opt\/nvidia\/deepstream\/deepstream-$deepstream_version\/samples\/configs\/tao_pretrained_models\/resnet18_trafficcamnet_pruned.etlt/g" /opt/nvidia/deepstream/deepstream-6.2/samples/configs/tao_pretrained_models//config_infer_primary_trafficcamnet.txt;
      bash /opt/nvidia/deepstream/deepstream-$deepstream_version/user_additional_install.sh 2>&1 >/dev/null;
      python /opt/nvidia/deepstream/create_config.py
      deepstream-app /opt/nvidia/deepstream/deepstream-6.2/samples/configs/deepstream-app/source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt;
      cat /opt/nvidia/deepstream/deepstream-$deepstream_version/samples/configs/deepstream-app/run.txt
    State: Running
      Started: Tue, 09 Apr 2024 17:23:59 +0800
    Last State: Terminated
      Reason: Completed
      Exit Code: 0
      Started: Tue, 09 Apr 2024 17:17:51 +0800
      Finished: Tue, 09 Apr 2024 17:18:54 +0800
    Ready: True
    Restart Count: 8
    Environment:
    Mounts:
      /etc/config from ipmount (rw)
      /opt/nvidia/deepstream/create_config.py from create-config (rw,path="create_config.py")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xff5v (ro)
Conditions:
  Type Status
  Initialized True
  Ready True
  ContainersReady True
  PodScheduled True
Volumes:
  ipmount:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: iva-video-analytics-demo-l4t-configmap
    Optional: false
  create-config:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: iva-video-analytics-demo-l4t-create-config
    Optional: false
  kube-api-access-xff5v:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 3607
    ConfigMapName: kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type  Reason  Age  From  Message

  Normal  Scheduled  35m  default-scheduler  Successfully assigned default/iva-video-analytics-demo-l4t-896fbc5c8-ttn6b to master
  Normal  Created  32m (x3 over 35m)  kubelet  Created container video-analytics-demo-l4t-1
  Normal  Started  32m (x3 over 35m)  kubelet  Started container video-analytics-demo-l4t-1
  Warning  BackOff  31m (x3 over 33m)  kubelet  Back-off restarting failed container video-analytics-demo-l4t-1 in pod iva-video-analytics-demo-l4t-896fbc5c8-ttn6b_default(b7dfbbbb-a5ba-4f06-81f4-8dc7a133803f)
  Normal  Pulled  31m (x4 over 35m)  kubelet  Container image "nvcr.io/nvidia/deepstream-l4t:6.2-samples" already present on machine
  Warning  FailedToRetrieveImagePullSecret  5m8s (x97 over 35m)  kubelet  Unable to retrieve some image pull secrets (nvidia-registrykey-secret); attempting to pull the image may not succeed.
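
The Command field in the describe output uses two different redirection orders. That may explain why the log shows apt warnings but nothing from the git svn clone step, whose output directory the later cp step cannot find. The sketch below illustrates the POSIX shell redirection semantics only; `noisy` is a hypothetical stand-in for those commands, and this is not a confirmed diagnosis of the issue:

```shell
#!/bin/sh
# Hypothetical stand-in for a command that writes to both streams.
noisy() {
    echo "normal output"
    echo "error output" >&2
}

# Order used for the apt steps: stderr is first pointed at the current
# stdout, then stdout alone is sent to /dev/null, so stderr stays visible.
apt_style=$(noisy 2>&1 >/dev/null)

# Order used for the git svn clone step: stdout goes to /dev/null first,
# then stderr follows it, so both streams are discarded.
clone_style=$(noisy >/dev/null 2>&1)

echo "apt-style captured:   '$apt_style'"
echo "clone-style captured: '$clone_style'"
```

With the second form, a failing clone leaves no trace in the pod log. Temporarily dropping the redirection from that step would be one way to see its error message, if any.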