cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.
https://cvat.ai
MIT License

Semi-automatic Annotation - Documentation outdated, Nvidia, NO_PUBKEY A4B469963BF863CC #4707

Open JRGit4UE opened 2 years ago

JRGit4UE commented 2 years ago

My actions before raising this issue

Enabling semi-automatic annotation with GPU support on the latest stable version, as documented at https://openvinotoolkit.github.io/cvat/docs/administration/advanced/installation_automatic_annotation/, fails because Nvidia has changed the signing key of its CUDA repository.

Expected Behaviour

Following the documentation should result in successful installation of
serverless/tensorflow/matterport/mask_rcnn/nuclio

Update documentation to either:

Current Behaviour

Calling

    nuctl deploy --project-name cvat \
      --path serverless/tensorflow/matterport/mask_rcnn/nuclio \
      --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
      --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
      --image cvat/tf.matterport.mask_rcnn_gpu \
      --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
      --resource-limit nvidia.com/gpu=1

ends with

    Reading package lists...
    W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
    E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.

Possible Solution

According to https://forums.developer.nvidia.com/t/gpg-error-http-developer-download-nvidia-com-compute-cuda-repos-ubuntu1804-x86-64/212904/3, the way to resolve the problem on Debian-based systems is to remove the outdated key and install the current one:

    sudo apt-key del 7fa2af80
    sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub
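The key file name in the fetch URL above (3bf863cc.pub) matches the last eight hex digits of the key ID that apt reports as missing (A4B469963BF863CC). The following is a minimal sketch, not part of CVAT or nuclio, of how that ID could be pulled out of an apt log and shortened; the function names `missing_keys` and `short_key_id` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers (not part of CVAT, nuclio, or apt).

# Read apt output on stdin and print every key ID reported as
# "NO_PUBKEY <id>", one per line, without duplicates.
missing_keys() {
  grep -oE 'NO_PUBKEY [0-9A-Fa-f]+' | awk '{print $2}' | sort -u
}

# Print the lower-case short form (last 8 hex digits) of a long key ID.
short_key_id() {
  printf '%s' "$1" | tail -c 8 | tr 'A-F' 'a-f'
  echo
}
```

For example, piping a saved build log into `missing_keys` and passing the result to `short_key_id` gives the short ID to compare against the `.pub` file name before fetching a key.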

Your Environment

`git log -1
commit d7560bbd39fec68f944515c2591dda74f3764b90 (HEAD -> develop, origin/develop, origin/HEAD)
Merge: ba4175bf b7dba6aa
Author: Nico Galoppo <nico.galoppo@intel.com>
Date: Tue May 17 11:25:58 2022 -0500

Merge pull request #4639 from openvinotoolkit/ncgalopp/fix-build
`

Next steps

You may join our Gitter channel for community support.

karolbadowski commented 2 years ago

Hello. I am struggling with this problem and it is very urgent, but I do not know how to resolve it. Maybe I am handling the Docker containers incorrectly or modifying the wrong file.

When I try to build .../cvat/serverless/tensorflow/matterport/mask_rcnn_fixed/nuclio/function-gpu.yaml with

nuctl deploy --project-name cvat --path serverless/tensorflow/matterport/mask_rcnn/nuclio --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." --image cvat/tf.matterport.mask_rcnn_gpu --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' --resource-limit nvidia.com/gpu=1

the log indicates that this problem happens during the execution of line:

RUN apt update && apt install --no-install-recommends -y git curl

Here is the log:

22.08.02 17:47:07.249 nuctl (I) Deploying function {"name": ""}
22.08.02 17:47:07.249 nuctl (I) Building {"builderKind": "docker", "versionInfo": "Label: 1.9.1, Git commit: 5fb902dd1fafabed267f79b3267e19804ee93bda, OS: linux, Arch: amd64, Go version: go1.17.10", "name": ""}
22.08.02 17:47:07.436 nuctl (I) Staging files and preparing base images
22.08.02 17:47:07.436 nuctl (W) Python 3.6 runtime is deprecated and will soon not be supported. Please migrate your code and use Python 3.7 runtime (python:3.7) or higher
22.08.02 17:47:07.436 nuctl (I) Building processor image {"registryURL": "", "taggedImageName": "cvat/tf.matterport.mask_rcnn_gpu:latest"}
22.08.02 17:47:07.436 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.9.1-amd64"}
22.08.02 17:47:10.356 nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
22.08.02 17:47:14.246 nuctl.platform (I) Building docker image {"image": "cvat/tf.matterport.mask_rcnn_gpu:latest"}
22.08.02 17:47:18.169 nuctl.platform.docker (W) Docker command outputted to stderr - this may result in errors {"workingDir": "/tmp/nuclio-build-568811864/staging", "cmd": "docker build --network host --force-rm -t cvat/tf.matterport.mask_rcnn_gpu:latest -f /tmp/nuclio-build-568811864/staging/Dockerfile.processor --build-arg NUCLIO_LABEL=1.9.1 --build-arg NUCLIO_ARCH=amd64 --build-arg NUCLIO_BUILD_LOCAL_HANDLER_DIR=handler .", "stderr": "The command '/bin/bash -c apt update && apt install --no-install-recommends -y git curl' returned a non-zero code: 100\n"}
22.08.02 17:47:18.175 nuctl (W) Failed to create a function; setting the function status {"err": "Failed to build processor image", "errVerbose": "\nError - exit status 100\n /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\nSending build context to Docker daemon 51.16MB\r\r\nStep 1/17 : FROM tensorflow/tensorflow:1.15.5-gpu-py3\n ---> 73be11373498\nStep 2/17 : ARG NUCLIO_LABEL\n ---> Using cache\n ---> ce09667e4588\nStep 3/17 : ARG NUCLIO_ARCH\n ---> Using cache\n ---> ee4549ac7db8\nStep 4/17 : ARG NUCLIO_BUILD_LOCAL_HANDLER_DIR\n ---> Using cache\n ---> 688565186b35\nStep 5/17 : COPY artifacts/processor /usr/local/bin/processor\n ---> Using cache\n ---> 48a3b91efbc1\nStep 6/17 : COPY artifacts/py /opt/nuclio/\n ---> Using cache\n ---> 39ba78f106bd\nStep 7/17 : COPY artifacts/py-whl /opt/nuclio/whl\n ---> Using cache\n ---> 221a56010c52\nStep 8/17 : COPY artifacts/uhttpc /usr/local/bin/uhttpc\n ---> Using cache\n ---> 09519af89f11\nStep 9/17 : COPY handler /opt/nuclio\n ---> Using cache\n ---> f849808a29d6\nStep 10/17 : HEALTHCHECK --interval=1s --timeout=3s CMD /usr/local/bin/uhttpc --url http://127.0.0.1:8082/ready || exit 1\n ---> Using cache\n ---> b0500d1c8d03\nStep 11/17 : RUN pip install nuclio-sdk msgpack --no-index --find-links /opt/nuclio/whl\n ---> Using cache\n ---> a965b5f4b9aa\nStep 12/17 : WORKDIR /opt/nuclio\n ---> Using cache\n ---> 24c47938ae64\nStep 13/17 : RUN apt update && apt install --no-install-recommends -y git curl\n ---> Running in 3dff9034dfcc\n\u001b[91m\nWARNING: apt does not have a stable CLI interface. Use with caution in scripts.\n\n\u001b[0mHit:1 http://archive.ubuntu.com/ubuntu bionic InRelease\nGet:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease [1581 B]\nGet:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]\nGet:4 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]\nGet:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]\nIgn:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease\nGet:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release [564 B]\nGet:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg [833 B]\nErr:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease\n The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC\nGet:9 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [1107 kB]\nGet:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Packages [73.8 kB]\nGet:11 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.8 kB]\nGet:12 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3336 kB]\nGet:13 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1527 kB]\nGet:14 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2306 kB]\nGet:15 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [12.2 kB]\nGet:16 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [12.9 kB]\nGet:17 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [22.8 kB]\nGet:18 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2905 kB]\nGet:19 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [1065 kB]\nReading package lists...\n\u001b[91mW: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC\nE: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.\n\u001b[0mRemoving intermediate container 3dff9034dfcc\n\nstderr:\nThe command '/bin/bash -c apt update && apt install --no-install-recommends -y git curl' returned a non-zero code: 100\n\n /nuclio/pkg/cmdrunner/shellrunner.go:96\nFailed to build\n /nuclio/pkg/dockerclient/shell.go:117\nFailed to build docker image\n .../pkg/containerimagebuilderpusher/docker.go:54\nFailed to build processor image\n /nuclio/pkg/processor/build/builder.go:263\nFailed to build processor image"}

Error - exit status 100 /nuclio/pkg/cmdrunner/shellrunner.go:96

Call stack:
stdout:
Sending build context to Docker daemon 51.16MB
Step 1/17 : FROM tensorflow/tensorflow:1.15.5-gpu-py3
 ---> 73be11373498
Step 2/17 : ARG NUCLIO_LABEL
 ---> Using cache
 ---> ce09667e4588
Step 3/17 : ARG NUCLIO_ARCH
 ---> Using cache
 ---> ee4549ac7db8
Step 4/17 : ARG NUCLIO_BUILD_LOCAL_HANDLER_DIR
 ---> Using cache
 ---> 688565186b35
Step 5/17 : COPY artifacts/processor /usr/local/bin/processor
 ---> Using cache
 ---> 48a3b91efbc1
Step 6/17 : COPY artifacts/py /opt/nuclio/
 ---> Using cache
 ---> 39ba78f106bd
Step 7/17 : COPY artifacts/py-whl /opt/nuclio/whl
 ---> Using cache
 ---> 221a56010c52
Step 8/17 : COPY artifacts/uhttpc /usr/local/bin/uhttpc
 ---> Using cache
 ---> 09519af89f11
Step 9/17 : COPY handler /opt/nuclio
 ---> Using cache
 ---> f849808a29d6
Step 10/17 : HEALTHCHECK --interval=1s --timeout=3s CMD /usr/local/bin/uhttpc --url http://127.0.0.1:8082/ready || exit 1
 ---> Using cache
 ---> b0500d1c8d03
Step 11/17 : RUN pip install nuclio-sdk msgpack --no-index --find-links /opt/nuclio/whl
 ---> Using cache
 ---> a965b5f4b9aa
Step 12/17 : WORKDIR /opt/nuclio
 ---> Using cache
 ---> 24c47938ae64
Step 13/17 : RUN apt update && apt install --no-install-recommends -y git curl
 ---> Running in 3dff9034dfcc

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Hit:1 http://archive.ubuntu.com/ubuntu bionic InRelease
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease [1581 B]
Get:3 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:4 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:5 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Ign:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease
Get:7 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release [564 B]
Get:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg [833 B]
Err:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease
  The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
Get:9 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [1107 kB]
Get:10 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Packages [73.8 kB]
Get:11 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.8 kB]
Get:12 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3336 kB]
Get:13 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1527 kB]
Get:14 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2306 kB]
Get:15 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [12.2 kB]
Get:16 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [12.9 kB]
Get:17 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [22.8 kB]
Get:18 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2905 kB]
Get:19 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [1065 kB]
Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is no longer signed.
Removing intermediate container 3dff9034dfcc

stderr: The command '/bin/bash -c apt update && apt install --no-install-recommends -y git curl' returned a non-zero code: 100

/nuclio/pkg/cmdrunner/shellrunner.go:96

Failed to build
 /nuclio/pkg/dockerclient/shell.go:117
Failed to build docker image
 .../pkg/containerimagebuilderpusher/docker.go:54
Failed to build processor image
 /nuclio/pkg/processor/build/builder.go:263
Failed to deploy function
 ...//nuclio/pkg/platform/abstract/platform.go:198

I tried modifying the file .../cvat/serverless/tensorflow/matterport/mask_rcnn_fixed/nuclio/function-gpu.yaml and running the command again, but the log stays the same: no additional steps were executed between

Step 12/17 : WORKDIR /opt/nuclio ---> Using cache ---> 24c47938ae64

and

Step 13/17 : RUN apt update && apt install --no-install-recommends -y git curl ---> Running in 3dff9034dfcc

Additional steps I wanted to add are commands from https://github.com/NVIDIA/nvidia-container-toolkit/issues/257

I edited this fragment of the function file:

    build:
      image: cvat/tf.matterport.mask_rcnn
      baseImage: tensorflow/tensorflow:1.15.5-gpu-py3
      directives:
        postCopy:

Unfortunately, the new steps did not appear in the log. I am wondering whether a copy of this file is cached somewhere in Docker so that the new commands are not picked up, whether a different file is actually used, or whether my commands are wrong and therefore not executed. Whichever scenario it is, I have decided to ask for help here.

Solving this would also resolve this issue.

The matter is very important and urgent. I have many people simultaneously doing heavy computations in that docker on CPU instead of GPU just because of this failure.

oxyhexagen commented 2 years ago

Have you solved it yet? I searched everywhere and this is the only issue I found that matches my problem.

JRGit4UE commented 2 years ago

@belkahorry Actually, I decided against building a Docker image on my own and preferred to wait for an update from Nvidia.

brucefay1115 commented 1 year ago

I modified serverless/tensorflow/matterport/mask_rcnn/nuclio/function.yaml (not function-gpu.yaml) and added apt-key del 7fa2af80 && apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub in front of the apt update step.

With that change the build succeeds:

postCopy:
        - kind: WORKDIR
          value: /opt/nuclio
        - kind: RUN
          value: apt-key del 7fa2af80 && apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub && apt update && apt install --no-install-recommends -y git curl
        - kind: RUN
          value: git clone --depth 1 https://github.com/matterport/Mask_RCNN.git
        - kind: RUN
          value: curl -L https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -o Mask_RCNN/mask_rcnn_coco.h5
        - kind: RUN
          value: pip3 install numpy cython pyyaml keras==2.1.0 scikit-image 'imageio<=2.9.0' Pillow
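
Since the fix above went into function.yaml rather than function-gpu.yaml, it can help to confirm which function file in a directory actually carries the key-refresh step before redeploying. This is a minimal sketch under that assumption; `check_key_fix` is a hypothetical helper, not a nuctl or CVAT command, and it only greps the files without saying which one nuctl reads.

```shell
#!/bin/sh
# Hypothetical helper: for every function*.yaml in the given directory,
# report whether it contains the "apt-key adv --fetch-keys" step.
check_key_fix() {
  dir="$1"
  for f in "$dir"/function*.yaml; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    if grep -q -e 'apt-key adv --fetch-keys' "$f"; then
      echo "patched   $f"
    else
      echo "unpatched $f"
    fi
  done
}
```

Running `check_key_fix serverless/tensorflow/matterport/mask_rcnn/nuclio` from the CVAT checkout would list both files with their status.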

VincentChong123 commented 1 year ago

Thanks @brucefay1115! It works! My system information: Python 3.7, Ubuntu 18.04, NVIDIA-SMI 470.141.03, Driver Version 470.141.03, CUDA Version 11.4.

BTW, my first attempt failed because other containers were still active. I had to run the two commands below and then try again:

    docker-compose down
    docker ps -aq | xargs docker stop | xargs docker rm
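
Note that the second command stops and removes every container on the host, not only CVAT's, and on some systems the xargs pipeline can fail when there are no containers at all. A slightly more defensive sketch is shown here; the function name `remove_all_containers` and the `DOCKER` override variable are assumptions introduced for illustration (the override only exists so the function can be dry-run with a stub).

```shell
#!/bin/sh
# Command used to talk to Docker; overridable for dry runs (assumption).
DOCKER="${DOCKER:-docker}"

# Stop and remove ALL containers on this host, running or not.
# Does nothing when "docker ps -aq" returns no container IDs.
remove_all_containers() {
  ids="$($DOCKER ps -aq)"
  if [ -z "$ids" ]; then
    echo "no containers to remove"
    return 0
  fi
  # $ids is intentionally unquoted so each ID becomes its own argument.
  $DOCKER stop $ids && $DOCKER rm $ids
}
```

After `docker-compose down`, calling `remove_all_containers` would take the place of the one-line pipeline.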