GPU container for semi-auto annotation fails deploying

cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

https://cvat.ai

MIT License


Issue #5170

Open: journeytosilius opened this issue 2 years ago

journeytosilius commented 2 years ago

When following the automatic annotation installation guide at

https://opencv.github.io/cvat/docs/administration/advanced/installation_automatic_annotation/

and running the GPU deployment command:

nuctl deploy --project-name cvat \
  --path serverless/tensorflow/matterport/mask_rcnn/nuclio \
  --platform local --base-image tensorflow/tensorflow:1.15.5-gpu-py3 \
  --desc "GPU based implementation of Mask RCNN on Python 3, Keras, and TensorFlow." \
  --image cvat/tf.matterport.mask_rcnn_gpu \
  --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
  --resource-limit nvidia.com/gpu=1

The image build then fails with:

W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease' is no longer signed.

Is there a published solution to this? Thanks.

GalymzhanAbdimanap commented 2 years ago

I have the same problem. Have you solved it?

nmanovic commented 1 year ago

@yasakova-anastasia, could you please check whether we still have this issue?

JoshuaRiddell commented 1 year ago

I just ran into this. It appears to be caused by NVIDIA rotating the signing keys of their CUDA APT repositories: https://forums.developer.nvidia.com/t/invalid-public-key-for-cuda-apt-repository/212901
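A rough sketch of how the rotated key relates to the error, assuming bash and assuming that NVIDIA names the key file after the last eight hex digits of the key ID (which is what the 3bf863cc.pub file in the fix from issue #4707 suggests): the function extracts the repository URL and the NO_PUBKEY ID from the apt warning in the original report and prints the URL of the matching .pub file. The name nvidia_key_url is hypothetical and is not part of CVAT, nuclio, or apt.

```bash
#!/usr/bin/env bash
# Sketch: derive the URL of the replacement NVIDIA repository key from an
# apt "NO_PUBKEY" warning. Assumption: the key file is named after the last
# 8 hex digits of the key ID (lowercased) and lives in the repo directory.
set -euo pipefail

# nvidia_key_url is a hypothetical helper, not part of CVAT, nuclio or apt.
nvidia_key_url() {
  local line="$1" repo key_id short_id
  local re_url='(https://[^[:space:]]+)'
  local re_key='NO_PUBKEY ([0-9A-F]+)'
  # Repository URL: from "https://" up to the next whitespace.
  [[ $line =~ $re_url ]] || return 1
  repo=${BASH_REMATCH[1]}
  # Full key ID reported by apt, e.g. A4B469963BF863CC.
  [[ $line =~ $re_key ]] || return 1
  key_id=${BASH_REMATCH[1]}
  # Short ID: last 8 hex digits, lowercased (tr keeps this bash 3 compatible).
  short_id=$(tr '[:upper:]' '[:lower:]' <<<"${key_id: -8}")
  printf '%s/%s.pub\n' "${repo%/}" "$short_id"
}

# The warning line from the original report.
err="W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC"
nvidia_key_url "$err"
```

For the warning above this prints https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub, which, under the naming assumption, is the file to hand to apt-key adv --fetch-keys.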

I managed to get it to build and deploy with tensorflow/tensorflow:2.10.0-gpu as the base image, but I had trouble executing it. This is my first install, though, so I'm currently trying other models to see whether it's a more general issue with my setup.

SergeySandler commented 1 year ago

@yasakova-anastasia, this issue is a duplicate of https://github.com/opencv/cvat/issues/4707. The Possible Solution described by the submitter of issue #4707 works once sudo is dropped. Inserting the following two lines at line 109, after the WORKDIR kind/value pair:

        - kind: RUN
          value: apt-key del 7fa2af80 && apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub

in https://github.com/opencv/cvat/blob/develop/serverless/tensorflow/matterport/mask_rcnn/nuclio/function-gpu.yaml makes serverless/deploy_gpu.sh serverless/tensorflow/matterport/mask_rcnn run without errors.
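If you would rather not edit at a fixed line number, the same two lines can be inserted by matching the WORKDIR directive instead. A minimal sketch, assuming bash and awk, and assuming the WORKDIR list item is written as "- kind: WORKDIR" immediately followed by its "value:" line (the layout described in the comment above). The helper name add_nvidia_key_fix is hypothetical, and it prints the patched YAML to stdout rather than editing the file in place.

```bash
#!/usr/bin/env bash
# Sketch: insert the NVIDIA key-rotation RUN directive right after the
# WORKDIR kind/value pair of a nuclio function YAML. Assumption: the pair is
# "- kind: WORKDIR" followed directly by its "value:" line.
set -euo pipefail

# add_nvidia_key_fix is a hypothetical helper; output goes to stdout.
add_nvidia_key_fix() {
  local file="$1"
  awk '
    # Match the WORKDIR list item, e.g. "        - kind: WORKDIR".
    /^[[:space:]]*- kind: WORKDIR[[:space:]]*$/ {
      # Reuse the indentation in front of the "-" for the new item.
      indent = substr($0, 1, index($0, "-") - 1)
      print
      # Copy the WORKDIR value line unchanged, if there is one.
      if ((getline nextline) > 0) print nextline
      print indent "- kind: RUN"
      print indent "  value: apt-key del 7fa2af80 && apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub"
      next
    }
    # Every other line passes through untouched.
    { print }
  ' "$file"
}
```

For example, add_nvidia_key_fix serverless/tensorflow/matterport/mask_rcnn/nuclio/function-gpu.yaml > /tmp/function-gpu.yaml lets you diff the result against the original before copying it back. Matching on the directive rather than on line 109 keeps working if lines are later added above it.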

prnr commented 1 year ago

I tried this method, but the process stalls at [nuctl.platform (I) Building docker image {"image": "cvat/tf.matterport.mask_rcnn_gpu:latest"}] and has not progressed in more than 24 hours. Other DL models deploy without any problems; only the TensorFlow models are having trouble.