Can't build SiamMask because of old torch requirement

VRichardJP commented 2 years ago

My actions before raising this issue

[x] Read/searched the docs
[x] Searched past issues

Expected Behaviour

The command ./serverless/deploy_cpu.sh ./serverless/pytorch/foolwood/siammask/ or ./serverless/deploy_gpu.sh ./serverless/pytorch/foolwood/siammask/ successfully deploys the SiamMask network.

Current Behaviour

The build fails because of this line: (https://github.com/openvinotoolkit/cvat/blob/be334fdee95563b54c011290ffa6b4bbf9fd4296/serverless/pytorch/foolwood/siammask/nuclio/function.yaml#L44). The requirements.txt file requires torch==0.4.1, but that version is no longer available: pip only offers 1.0.0 and later (see the log below).

Full logs:

$ ./serverless/deploy_cpu.sh ./serverless/pytorch/foolwood/siammask/       
22.03.16 12:59:00.947                     nuctl (I) Project created {"Name": "cvat", "Namespace": "nuclio"}
Deploying ./serverless/pytorch/foolwood/siammask function...
22.03.16 12:59:01.218                     nuctl (I) Deploying function {"name": ""}
22.03.16 12:59:01.218                     nuctl (I) Building {"builderKind": "docker", "versionInfo": "Label: 1.7.11, Git commit: afc97384b92e3dd2c75c9ec18b069cff986427e0, OS: linux, Arch: amd64, Go version: go1.17.5", "name": ""}
22.03.16 12:59:01.373                     nuctl (I) Cleaning up before deployment {"functionName": "pth-foolwood-siammask"}
22.03.16 12:59:01.408                     nuctl (I) Function already exists, deleting function containers {"functionName": "pth-foolwood-siammask"}
22.03.16 12:59:03.052                     nuctl (I) Staging files and preparing base images
22.03.16 12:59:03.053                     nuctl (W) Python 3.6 runtime is deprecated and will soon not be supported. Please migrate your code and use Python 3.7 runtime (`python:3.7`) or higher
22.03.16 12:59:03.053                     nuctl (I) Building processor image {"registryURL": "", "imageName": "cvat/pth.foolwood.siammask:latest"}
22.03.16 12:59:03.053     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.7.11-amd64"}
22.03.16 12:59:11.837     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
22.03.16 12:59:23.061            nuctl.platform (I) Building docker image {"image": "cvat/pth.foolwood.siammask:latest"}
22.03.16 12:59:31.842     nuctl.platform.docker (W) Docker command outputted to stderr - this may result in errors {"workingDir": "/tmp/nuclio-build-3423946007/staging", "cmd": "docker build --network host --force-rm -t cvat/pth.foolwood.siammask:latest -f /tmp/nuclio-build-3423946007/staging/Dockerfile.processor   --build-arg NUCLIO_LABEL=1.7.11 --build-arg NUCLIO_ARCH=amd64 --build-arg NUCLIO_BUILD_LOCAL_HANDLER_DIR=handler  .", "stderr": "The command 'conda run -n siammask /bin/bash -c pip install -r SiamMask/requirements.txt jsonpickle' returned a non-zero code: 1\n"}
22.03.16 12:59:31.859                     nuctl (W) Failed to create a function; setting the function status {"err": "Failed to build processor image", "errVerbose": "\nError - exit status 1\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\n\nCall stack:\nstdout:\nSending build context to Docker daemon  49.71MB\r\r\nStep 1/25 : FROM ubuntu:20.04\n ---> 2b4cba85892a\nStep 2/25 : ARG NUCLIO_LABEL\n ---> Using cache\n ---> 8c49d887b805\nStep 3/25 : ARG NUCLIO_ARCH\n ---> Using cache\n ---> 7ad1d0f0ddd5\nStep 4/25 : ARG NUCLIO_BUILD_LOCAL_HANDLER_DIR\n ---> Using cache\n ---> e90f06271d4b\nStep 5/25 : ENV PATH=\"/root/miniconda3/bin:${PATH}\"\n ---> Using cache\n ---> c542ca01b724\nStep 6/25 : ARG PATH=\"/root/miniconda3/bin:${PATH}\"\n ---> Using cache\n ---> 81afa22be6ae\nStep 7/25 : RUN apt update && apt install -y --no-install-recommends wget git ca-certificates libglib2.0-0 libsm6 libxrender1 libxext6 && rm -rf /var/lib/apt/lists/*\n ---> Using cache\n ---> 7639a6a83a4f\nStep 8/25 : RUN wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && chmod +x Miniconda3-latest-Linux-x86_64.sh && ./Miniconda3-latest-Linux-x86_64.sh -b && rm -f Miniconda3-latest-Linux-x86_64.sh\n ---> Using cache\n ---> 9e6f4ae53ba8\nStep 9/25 : WORKDIR /opt/nuclio\n ---> Using cache\n ---> 3c105427d793\nStep 10/25 : RUN conda create -y -n siammask python=3.6\n ---> Using cache\n ---> cce0aaf61c0e\nStep 11/25 : SHELL [\"conda\", \"run\", \"-n\", \"siammask\", \"/bin/bash\", \"-c\"]\n ---> Using cache\n ---> 3ff219712725\nStep 12/25 : RUN git clone https://github.com/foolwood/SiamMask.git\n ---> Using cache\n ---> 9bc2727b5285\nStep 13/25 : RUN pip install -r SiamMask/requirements.txt jsonpickle\n ---> Running in 5979fdfd83ce\n\u001b[91mERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['/bin/bash', '-c', 'pip install -r SiamMask/requirements.txt jsonpickle']' command failed.  
(See above for error)\n\u001b[0m\u001b[91mERROR: Could not find a version that satisfies the requirement torch==0.4.1 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2)\nERROR: No matching distribution found for torch==0.4.1\n\n\u001b[0mCollecting jsonpickle\n  Downloading jsonpickle-2.1.0-py2.py3-none-any.whl (38 kB)\nCollecting Cython==0.29.4\n  Downloading Cython-0.29.4-cp36-cp36m-manylinux1_x86_64.whl (2.1 MB)\nCollecting colorama==0.3.9\n  Downloading colorama-0.3.9-py2.py3-none-any.whl (20 kB)\nCollecting numpy==1.15.4\n  Downloading numpy-1.15.4-cp36-cp36m-manylinux1_x86_64.whl (13.9 MB)\nCollecting requests==2.21.0\n  Downloading requests-2.21.0-py2.py3-none-any.whl (57 kB)\nCollecting fire==0.1.3\n  Downloading fire-0.1.3.tar.gz (33 kB)\n\nRemoving intermediate container 5979fdfd83ce\n\nstderr:\nThe command 'conda run -n siammask /bin/bash -c pip install -r SiamMask/requirements.txt jsonpickle' returned a non-zero code: 1\n\n    /nuclio/pkg/cmdrunner/shellrunner.go:96\nFailed to build\n    /nuclio/pkg/dockerclient/shell.go:117\nFailed to build docker image\n    .../pkg/containerimagebuilderpusher/docker.go:54\nFailed to build processor image\n    /nuclio/pkg/processor/build/builder.go:263\nFailed to build processor image"}

Error - exit status 1
    /nuclio/pkg/cmdrunner/shellrunner.go:96

Call stack:
stdout:
Sending build context to Docker daemon  49.71MB
Step 1/25 : FROM ubuntu:20.04
 ---> 2b4cba85892a
Step 2/25 : ARG NUCLIO_LABEL
 ---> Using cache
 ---> 8c49d887b805
Step 3/25 : ARG NUCLIO_ARCH
 ---> Using cache
 ---> 7ad1d0f0ddd5
Step 4/25 : ARG NUCLIO_BUILD_LOCAL_HANDLER_DIR
 ---> Using cache
 ---> e90f06271d4b
Step 5/25 : ENV PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> c542ca01b724
Step 6/25 : ARG PATH="/root/miniconda3/bin:${PATH}"
 ---> Using cache
 ---> 81afa22be6ae
Step 7/25 : RUN apt update && apt install -y --no-install-recommends wget git ca-certificates libglib2.0-0 libsm6 libxrender1 libxext6 && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> 7639a6a83a4f
Step 8/25 : RUN wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && chmod +x Miniconda3-latest-Linux-x86_64.sh && ./Miniconda3-latest-Linux-x86_64.sh -b && rm -f Miniconda3-latest-Linux-x86_64.sh
 ---> Using cache
 ---> 9e6f4ae53ba8
Step 9/25 : WORKDIR /opt/nuclio
 ---> Using cache
 ---> 3c105427d793
Step 10/25 : RUN conda create -y -n siammask python=3.6
 ---> Using cache
 ---> cce0aaf61c0e
Step 11/25 : SHELL ["conda", "run", "-n", "siammask", "/bin/bash", "-c"]
 ---> Using cache
 ---> 3ff219712725
Step 12/25 : RUN git clone https://github.com/foolwood/SiamMask.git
 ---> Using cache
 ---> 9bc2727b5285
Step 13/25 : RUN pip install -r SiamMask/requirements.txt jsonpickle
 ---> Running in 5979fdfd83ce
ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['/bin/bash', '-c', 'pip install -r SiamMask/requirements.txt jsonpickle']' command failed.  (See above for error)
ERROR: Could not find a version that satisfies the requirement torch==0.4.1 (from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0, 1.3.1, 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2)
ERROR: No matching distribution found for torch==0.4.1

Collecting jsonpickle
  Downloading jsonpickle-2.1.0-py2.py3-none-any.whl (38 kB)
Collecting Cython==0.29.4
  Downloading Cython-0.29.4-cp36-cp36m-manylinux1_x86_64.whl (2.1 MB)
Collecting colorama==0.3.9
  Downloading colorama-0.3.9-py2.py3-none-any.whl (20 kB)
Collecting numpy==1.15.4
  Downloading numpy-1.15.4-cp36-cp36m-manylinux1_x86_64.whl (13.9 MB)
Collecting requests==2.21.0
  Downloading requests-2.21.0-py2.py3-none-any.whl (57 kB)
Collecting fire==0.1.3
  Downloading fire-0.1.3.tar.gz (33 kB)

Removing intermediate container 5979fdfd83ce

stderr:
The command 'conda run -n siammask /bin/bash -c pip install -r SiamMask/requirements.txt jsonpickle' returned a non-zero code: 1

    /nuclio/pkg/cmdrunner/shellrunner.go:96
Failed to build
    /nuclio/pkg/dockerclient/shell.go:117
Failed to build docker image
    .../pkg/containerimagebuilderpusher/docker.go:54
Failed to build processor image
    /nuclio/pkg/processor/build/builder.go:263
Failed to deploy function
    ...//nuclio/pkg/platform/abstract/platform.go:197
  NAMESPACE |         NAME          | PROJECT | STATE | REPLICAS | NODE PORT  
  nuclio    | openvino-dextr        | cvat    | ready | 1/1      |     49157  
  nuclio    | pth-foolwood-siammask | cvat    | error | 1/1      |            
  nuclio    | pth-saic-vul-hrnet    | cvat    | ready | 1/1      |     49159

Possible Solution

Possible solutions:

Create a pull request to fix the upstream requirements.txt, although the project no longer seems to be maintained.
Fork the repository, fix requirements.txt (e.g. https://github.com/VRichardJP/SiamMask/blob/master/requirements.txt), and point the GitHub repository used in ./serverless/pytorch/foolwood/siammask/ at the fork. Example: https://github.com/VRichardJP/cvat
Replace the pip install -r SiamMask/requirements.txt line in function.yaml and function-gpu.yaml with an explicit list of the required packages.
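For the first two options, the torch pin can be rewritten mechanically instead of by hand. This is a minimal sketch, assuming requirements.txt uses simple `name==version` lines; the replacement torch==1.9.0 is the version reported to build later in this thread, and the helpers `project_name` and `repin` are hypothetical.

```python
import re

# Characters that end a requirement's project name (version operators, extras, markers).
_NAME_END = re.compile(r"[=<>!~\[;\s]")


def project_name(line: str) -> str:
    """Return the lower-cased project name of a requirement line, or '' for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return ""
    return _NAME_END.split(line, maxsplit=1)[0].lower()


def repin(lines, overrides):
    """Replace every requirement whose (lower-case) name is a key of `overrides`.

    `overrides` maps a project name to a full replacement line, e.g.
    {"torch": "torch==1.9.0"}. All other lines, including torchvision, are kept.
    """
    result = []
    for line in lines:
        name = project_name(line)
        result.append(overrides.get(name, line.rstrip("\n")))
    return result


if __name__ == "__main__":
    # Pins taken from the failing build log above; this is not the full file.
    original = [
        "Cython==0.29.4",
        "colorama==0.3.9",
        "numpy==1.15.4",
        "requests==2.21.0",
        "fire==0.1.3",
        "torch==0.4.1",
    ]
    for line in repin(original, {"torch": "torch==1.9.0"}):
        print(line)
```

Splitting on the first operator character means `torchvision==...` is not mistaken for `torch`, which a plain `startswith("torch")` check would do.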

Steps to Reproduce (for bugs)

Run ./serverless/deploy_cpu.sh ./serverless/pytorch/foolwood/siammask/

Context

Your Environment

Git hash commit (git log -1): be334fdee95563b54c011290ffa6b4bbf9fd4296
Docker version docker version (e.g. Docker 17.0.05): 20.10.9
Are you using Docker Swarm or Kubernetes? No
Operating System and version (e.g. Linux, Windows, MacOS): Ubuntu 18.04
Code example or link to GitHub repo or gist to reproduce problem:
Other diagnostic information / logs:


bsekachev commented 2 years ago

@VRichardJP

Thank you for the report. Could you please help us fix the issue?

VRichardJP commented 2 years ago

I can help fix it. I have tested the following two solutions:

Replace pip install -r SiamMask/requirements.txt with the individual packages listed in https://github.com/foolwood/SiamMask/blob/master/requirements.txt, but with torch==1.9.0.
Switch to a fork of foolwood/SiamMask in which the upstream requirements are fixed (see https://github.com/VRichardJP/cvat).

Both solutions fix the build:

$ ./serverless/deploy_gpu.sh serverless/pytorch/foolwood/siammask/
22.03.17 09:02:54.519                     nuctl (I) Project created {"Name": "cvat", "Namespace": "nuclio"}
Deploying serverless/pytorch/foolwood/siammask function...
22.03.17 09:02:54.894                     nuctl (I) Deploying function {"name": ""}
22.03.17 09:02:54.894                     nuctl (I) Building {"builderKind": "docker", "versionInfo": "Label: 1.7.11, Git commit: afc97384b92e3dd2c75c9ec18b069cff986427e0, OS: linux, Arch: amd64, Go version: go1.17.5", "name": ""}
22.03.17 09:02:55.076                     nuctl (I) Cleaning up before deployment {"functionName": "pth-foolwood-siammask"}
22.03.17 09:02:55.100                     nuctl (I) Staging files and preparing base images
22.03.17 09:02:55.178                     nuctl (W) Python 3.6 runtime is deprecated and will soon not be supported. Please migrate your code and use Python 3.7 runtime (`python:3.7`) or higher
22.03.17 09:02:55.178                     nuctl (I) Building processor image {"registryURL": "", "imageName": "cvat/pth.foolwood.siammask:latest"}
22.03.17 09:02:55.178     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/handler-builder-python-onbuild:1.7.11-amd64"}
22.03.17 09:03:04.446     nuctl.platform.docker (I) Pulling image {"imageName": "quay.io/nuclio/uhttpc:0.0.1-amd64"}
22.03.17 09:03:15.338            nuctl.platform (I) Building docker image {"image": "cvat/pth.foolwood.siammask:latest"}
22.03.17 09:03:15.953            nuctl.platform (I) Pushing docker image into registry {"image": "cvat/pth.foolwood.siammask:latest", "registry": ""}
22.03.17 09:03:15.953            nuctl.platform (I) Docker image was successfully built and pushed into docker registry {"image": "cvat/pth.foolwood.siammask:latest"}
22.03.17 09:03:15.953                     nuctl (I) Build complete {"result": {"Image":"cvat/pth.foolwood.siammask:latest","UpdatedFunctionConfig":{"metadata":{"name":"pth-foolwood-siammask","namespace":"nuclio","labels":{"nuclio.io/project-name":"cvat"},"annotations":{"framework":"pytorch","name":"SiamMask","spec":"","type":"tracker"}},"spec":{"description":"Fast Online Object Tracking and Segmentation","handler":"main:handler","runtime":"python:3.6","env":[{"name":"PYTHONPATH","value":"/opt/nuclio/SiamMask:/opt/nuclio/SiamMask/experiments/siammask_sharp"}],"resources":{"limits":{"nvidia.com/gpu":"1"}},"image":"cvat/pth.foolwood.siammask:latest","targetCPU":75,"triggers":{"myHttpTrigger":{"class":"","kind":"http","name":"myHttpTrigger","maxWorkers":2,"workerAvailabilityTimeoutMilliseconds":10000,"attributes":{"maxRequestBodySize":33554432}}},"volumes":[{"volume":{"name":"volume-1","hostPath":{"path":"/home/vrichard/ML/cvat/serverless/common"}},"volumeMount":{"name":"volume-1","mountPath":"/opt/nuclio/common"}}],"build":{"functionConfigPath":"serverless/pytorch/foolwood/siammask//nuclio/function-gpu.yaml","image":"cvat/pth.foolwood.siammask","baseImage":"nvidia/cuda:11.1-devel-ubuntu20.04","directives":{"preCopy":[{"kind":"ENV","value":"PATH=\"/root/miniconda3/bin:${PATH}\""},{"kind":"ARG","value":"PATH=\"/root/miniconda3/bin:${PATH}\""},{"kind":"RUN","value":"apt update && apt install -y --no-install-recommends wget git ca-certificates libglib2.0-0 libsm6 libxrender1 libxext6 && rm -rf /var/lib/apt/lists/*"},{"kind":"RUN","value":"wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && chmod +x Miniconda3-latest-Linux-x86_64.sh && ./Miniconda3-latest-Linux-x86_64.sh -b && rm -f Miniconda3-latest-Linux-x86_64.sh"},{"kind":"WORKDIR","value":"/opt/nuclio"},{"kind":"RUN","value":"conda create -y -n siammask python=3.7"},{"kind":"SHELL","value":"[\"conda\", \"run\", \"-n\", \"siammask\", \"/bin/bash\", \"-c\"]"},{"kind":"RUN","value":"git clone 
https://github.com/VRichardJP/SiamMask.git"},{"kind":"RUN","value":"pip install -r SiamMask/requirements.txt jsonpickle"},{"kind":"RUN","value":"pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html"},{"kind":"RUN","value":"conda install -y gcc_linux-64"},{"kind":"RUN","value":"cd SiamMask && bash make.sh && cd -"},{"kind":"RUN","value":"wget -P SiamMask/experiments/siammask_sharp http://www.robots.ox.ac.uk/~qwang/SiamMask_DAVIS.pth"},{"kind":"ENTRYPOINT","value":"[\"conda\", \"run\", \"-n\", \"siammask\"]"}]},"codeEntryType":"image"},"platform":{"attributes":{"mountMode":"volume","restartPolicy":{"maximumRetryCount":3,"name":"always"}}},"readinessTimeoutSeconds":60,"securityContext":{},"eventTimeout":"30s"}}}}
22.03.17 09:03:24.104            nuctl.platform (I) Waiting for function to be ready {"timeout": 60}
22.03.17 09:03:28.750                     nuctl (I) Function deploy complete {"functionName": "pth-foolwood-siammask", "httpPort": 49160, "internalInvocationURLs": ["172.17.0.4:8080"], "externalInvocationURLs": []}
  NAMESPACE |         NAME          | PROJECT | STATE | REPLICAS | NODE PORT  
  nuclio    | openvino-dextr        | cvat    | ready | 1/1      |     49157  
  nuclio    | pth-foolwood-siammask | cvat    | ready | 1/1      |     49160  
  nuclio    | pth-saic-vul-hrnet    | cvat    | ready | 1/1      |     49159

However, this does not make SiamMask work in CVAT. I have tried several times to track an object with SiamMask. Whenever I jump to the next frame, I get a notification that the tracker is being initialized, and then this error appears:

Tracking Error
TypeError: Cannot read properties of undefined (reading 'map')

The problem occurs with both the CPU and GPU versions.

From a quick investigation, the problem comes from here: https://github.com/openvinotoolkit/cvat/blob/93ccf2177f560d037fdfad732b98292abfec5944/cvat-ui/src/components/annotation-page/standard-workspace/controls-side-bar/tools-control.tsx#L737-L743

The response is undefined after the call, so response.shapes.map() raises an error, which is caught here: https://github.com/openvinotoolkit/cvat/blob/93ccf2177f560d037fdfad732b98292abfec5944/cvat-ui/src/components/annotation-page/standard-workspace/controls-side-bar/tools-control.tsx#L760-L764

Unfortunately, I am not familiar with the framework or with TypeScript, so it is difficult for me to figure out exactly what is going on. Any help would be much appreciated.
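A note on the error text: "Cannot read properties of undefined (reading 'map')" means the value `.map` was called on was undefined. If the failing expression is response.shapes.map(...), that points at `response.shapes` being undefined rather than `response` itself, which would happen if the function returned something like `[]`. A quick way to check a raw function response outside the UI is sketched below; it only verifies the `shapes` array the UI reads, and `check_tracker_response` is a hypothetical helper.

```python
import json


def check_tracker_response(body: str) -> list:
    """Return a list of problems that would break `response.shapes.map()` in the UI.

    An empty list means the body at least contains a `shapes` array; it does not
    prove that the shapes themselves are correct.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        return [f"body is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        # e.g. a bare `[]`: in JavaScript, [].shapes is undefined.
        return [f"expected a JSON object, got {type(data).__name__}"]
    if "shapes" not in data:
        return ["missing 'shapes' key"]
    if not isinstance(data["shapes"], list):
        return ["'shapes' is not a list"]
    return []


if __name__ == "__main__":
    print(check_tracker_response("[]"))
    print(check_tracker_response('{"shapes": [], "states": []}'))
```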

nmanovic commented 2 years ago

@VRichardJP, I vote for the first variant. Let's freeze the dependencies.

ztyree42 commented 2 years ago

I'm seeing the same thing: the response from the request to invoke SiamMask is []. @VRichardJP, were you able to successfully invoke the function, either via CVAT or manually via nuctl invoke? Which nuclio version have you tried?

VRichardJP commented 2 years ago

I first tried version 1.5.16, which is the version recommended in the documentation. Since I couldn't make it work, I updated to a newer version; I am currently using 1.7.11. I have tried to test the function with nuctl invoke, but I am not familiar with nuclio and don't know how to send the input data to the function, so I get this error:

$ nuctl invoke pth-foolwood-siammask
22.03.19 13:26:37.383    nuctl.platform.invoker (I) Executing function {"method": "GET", "url": "http://:49161", "bodyLength": 0, "headers": {"Content-Type":["text/plain"],"X-Nuclio-Log-Level":["info"],"X-Nuclio-Target":["pth-foolwood-siammask"]}}
22.03.19 13:26:38.767    nuctl.platform.invoker (I) Got response {"status": "500 Internal Server Error"}
22.03.19 13:26:38.767                     nuctl (I) >>> Start of function logs
22.03.19 13:26:38.767     pth-foolwood-siammask (I) Run SiamMask model {"time": 1647663998491.4946, "worker_id": "0"}
22.03.19 13:26:38.768     pth-foolwood-siammask (E) Exception caught in handler {"exc": "'NoneType' object is not subscriptable", "traceback": "Traceback (most recent call last):\n  File \"/opt/nuclio/_nuclio_wrapper.py\", line 118, in serve_requests\n    await self._handle_event(event)\n  File \"/opt/nuclio/_nuclio_wrapper.py\", line 312, in _handle_event\n    entrypoint_output = self._entrypoint(self._context, event)\n  File \"/opt/nuclio/main.py\", line 19, in handler\n    buf = io.BytesIO(base64.b64decode(data[\"image\"]))\nTypeError: 'NoneType' object is not subscriptable\n", "worker_id": "0", "time": 1647663998749.482}
22.03.19 13:26:38.768                     nuctl (I) <<< End of function logs

> Response headers:
Content-Type = text/plain
Content-Length = 497
Server = nuclio
Date = Sat, 19 Mar 2022 04:26:38 GMT

> Response body:
Exception caught in handler - "'NoneType' object is not subscriptable": Traceback (most recent call last):
  File "/opt/nuclio/_nuclio_wrapper.py", line 118, in serve_requests
    await self._handle_event(event)
  File "/opt/nuclio/_nuclio_wrapper.py", line 312, in _handle_event
    entrypoint_output = self._entrypoint(self._context, event)
  File "/opt/nuclio/main.py", line 19, in handler
    buf = io.BytesIO(base64.b64decode(data["image"]))
TypeError: 'NoneType' object is not subscriptable

ztyree42 commented 2 years ago

@VRichardJP you can invoke it from the nuclio dashboard using the "test" feature. The easiest way to see what the payload needs to be is to inspect the network request coming from CVAT in the browser dev tools. You'll see that it's a JSON object with shapes and states keys (and some metadata, I think). What's confusing to me is that main.handler seems to get the image information from the context instead of the event, but I haven't figured out how that's being sent over. Perhaps it isn't being sent over, and that's why the function returns []? At any rate, I'm not sure it makes sense to PR the changes if we can't invoke the function...
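For invoking the function outside CVAT, here is a minimal sketch that builds a payload and POSTs it with the Python standard library. It assumes the function accepts a JSON POST body: the `image` key comes from main.py line 19 in the traceback above (which base64-decodes it), while `shapes` and `states` follow what was seen in the browser request, and their exact contents are assumptions. `build_payload` and `invoke` are hypothetical helpers.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes: bytes, shapes=None, states=None) -> dict:
    """Build a request body for the SiamMask function.

    `image` must be base64 text because main.py calls base64.b64decode(data["image"]).
    The `shapes`/`states` keys mirror the CVAT browser request (assumed format).
    """
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "shapes": shapes if shapes is not None else [],
        "states": states if states is not None else [],
    }


def invoke(url: str, payload: dict, timeout: float = 60.0):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage, assuming a local frame.jpg and the NODE PORT from the function table (49160 in the successful deploy above): `invoke("http://localhost:49160", build_payload(open("frame.jpg", "rb").read()))`. An empty `shapes` list may not mean anything to the tracker, so copying a real shape from the browser request is probably the better test.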

ztyree42 commented 2 years ago

Any update on this? The ability to track objects across frames is pretty critical for my video annotation workflow.

casperthuis commented 2 years ago

I have been reading the chat and testing it myself as well. With the new changes added to the PR, I'm able to run the SiamMask CPU version with the command

serverless/deploy_cpu.sh serverless/pytorch/foolwood/siammask/

If I run the serverless/deploy_gpu.sh command, the script gets stuck without returning any error. This should be addressed, but I was unable to get any output from the script itself. To still get the GPU version up and running, I adapted the command for deploying directly with nuctl from mask_rcnn to SiamMask GPU as follows:

nuctl deploy --project-name cvat \
  --path serverless/pytorch/foolwood/siammask/nuclio/ \
  --platform local --base-image nvidia/cuda:11.1-devel-ubuntu20.04 \
  --desc "GPU based implementation of SIAM mask on Python 3, pytorch." \
  --image cvat/pth.foolwood.siammask \
  --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
  --resource-limit nvidia.com/gpu=1

This command does not get stuck and runs with nuctl version 1.5.16 for the GPU. However, when running the tracker I get the 500 error that other people have been getting as well. When invoking the function with nuctl invoke pth-foolwood-siammask, I get the same error as @VRichardJP.

> Response headers:
Content-Length = 491
Server = nuclio
Date = Tue, 05 Apr 2022 11:26:39 GMT
Content-Type = text/plain

> Response body:
Exception caught in handler - "'NoneType' object is not subscriptable": Traceback (most recent call last):
  File "/opt/nuclio/_nuclio_wrapper.py", line 114, in serve_requests
    self._handle_event(event)
  File "/opt/nuclio/_nuclio_wrapper.py", line 262, in _handle_event
    entrypoint_output = self._entrypoint(self._context, event)
  File "/opt/nuclio/main.py", line 19, in handler
    buf = io.BytesIO(base64.b64decode(data["image"]))
TypeError: 'NoneType' object is not subscriptable

Has anybody had more success past this point? Or does anyone have pointers to where I should adapt the code to properly load the image needed in the buf variable? Or how I can debug/print the value of the event variable in the nuclio Docker container?
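On debugging the event: one option is to log the body's type and validate it before decoding, so that an empty `nuctl invoke` returns a readable 400 instead of a traceback. This is a sketch, not the actual main.py. `parse_body` is hypothetical, and it assumes nuclio may hand the body over already decoded to a dict, as raw JSON bytes/str, or as None (the case in the tracebacks above). `context.logger.info` and `context.Response` are nuclio's Python context APIs; the final response is a stub where the original tracking code would go.

```python
import json


def parse_body(body):
    """Normalize a nuclio event body into a dict, or raise ValueError with a clear message."""
    if body is None or body == b"" or body == "":
        raise ValueError("request has no body; send JSON with an 'image' key")
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8")
    if isinstance(body, str):
        body = json.loads(body)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(body, dict) or "image" not in body:
        raise ValueError("body must be a JSON object with an 'image' key")
    return body


def handler(context, event):
    # Log what actually arrived; this appears between the
    # ">>> Start of function logs" markers in `nuctl invoke` output.
    context.logger.info(f"body type: {type(event.body).__name__}")
    try:
        data = parse_body(event.body)
    except ValueError as exc:
        return context.Response(body=str(exc), headers={},
                                content_type="text/plain", status_code=400)
    # Stub: the original handler would continue with base64.b64decode(data["image"]).
    return context.Response(body=json.dumps({"received_keys": sorted(data)}), headers={},
                            content_type="application/json", status_code=200)
```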

ztyree42 commented 2 years ago

To invoke the function from the nuclio CLI, you're going to need to pass it some input. The example in the docs shows how to pass input, and the browser dev tools can show you what the input is supposed to look like. That said, I have temporarily given up trying to get SiamMask working on GPU myself. If you get it working, please report back!


casperthuis commented 2 years ago

@ztyree42 I managed to get the GPU version working with the help of #3059 and some trial and error.

The main issue I faced was that the serverless/deploy_gpu.sh script seems to get stuck most of the time. After a couple of tries it does finish building, but the only way I was able to build it was with the following command.

nuctl -v deploy --project-name cvat \
  --path serverless/pytorch/foolwood/siammask/nuclio/ \
  --platform local --base-image nvidia/cuda:11.1-devel-ubuntu20.04 \
  --desc "GPU based implementation of siammask on Python 3, pytorch." \
  --image cvat/pth.foolwood.siammask \
  --triggers '{"myHttpTrigger": {"maxWorkers": 1}}' \
  --resource-limit nvidia.com/gpu=1

Note that to run this command, one needs to replace function.yaml with function-gpu.yaml. Also note the -v (verbose) option, which shows what nuctl is actually doing. I do not like this approach and would rather use the serverless/deploy_gpu.sh script. If somebody knows why this is happening, I would love to hear it.
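One way to apply this workaround without editing the checked-out function.yaml, and to make a hang fail visibly, is a small wrapper. This is a sketch under assumptions: the nuclio directory is self-contained (function.yaml, function-gpu.yaml, main.py) so it can be copied elsewhere, the one-hour timeout is arbitrary, and the helper names are hypothetical. The nuctl flags are copied verbatim from the command above.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def stage_gpu_config(nuclio_dir: str) -> Path:
    """Copy the nuclio directory to a temp dir and make function-gpu.yaml its function.yaml."""
    staged = Path(tempfile.mkdtemp(prefix="siammask-gpu-")) / "nuclio"
    shutil.copytree(Path(nuclio_dir), staged)
    # Overwrite only the staged copy; the repository's function.yaml stays untouched.
    shutil.copyfile(staged / "function-gpu.yaml", staged / "function.yaml")
    return staged


def deploy_cmd(path: Path) -> list:
    """The nuctl command from the comment above, as an argument list (no shell quoting issues)."""
    return [
        "nuctl", "-v", "deploy", "--project-name", "cvat",
        "--path", str(path),
        "--platform", "local",
        "--base-image", "nvidia/cuda:11.1-devel-ubuntu20.04",
        "--desc", "GPU based implementation of siammask on Python 3, pytorch.",
        "--image", "cvat/pth.foolwood.siammask",
        "--triggers", '{"myHttpTrigger": {"maxWorkers": 1}}',
        "--resource-limit", "nvidia.com/gpu=1",
    ]


def deploy(nuclio_dir: str, timeout_s: float = 3600) -> None:
    """Run the deploy; raises CalledProcessError on failure or TimeoutExpired on a hang."""
    cmd = deploy_cmd(stage_gpu_config(nuclio_dir))
    # On timeout, subprocess.run kills nuctl before raising TimeoutExpired.
    subprocess.run(cmd, check=True, timeout=timeout_s)
```

Usage: `deploy("serverless/pytorch/foolwood/siammask/nuclio")`. Passing a list to subprocess also avoids the trailing-whitespace-after-backslash problem that can break multi-line shell commands.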

In addition, issue #3059 describes how a Docker update changed something about the ports, so that with nuclio version 1.5.6 the handler function in main.py gets the wrong input (or something like that). This is why the error message says the input is of NoneType. Changing the nuclio version did work for me; however, it is currently not running smoothly. While tracking, my whole system struggles, and the result is actually slower than when running on my CPU. I will check whether it is possible to allocate more GPU, memory, or CPU capacity to Docker and whether that solves the lagging issue.

rohitsaluja22 commented 2 years ago

@ztyree42, I tried but got the following error:
Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease' is not signed.

bsekachev commented 5 months ago

Closed as outdated
