arkitektio / arkitekt

arkitekt is the python api client for the arkitekt-framework
https://arkitekt.live
MIT License

arkitekt port build is failing for cuda #15

Open alexschroeter opened 3 months ago

alexschroeter commented 3 months ago

It seems something is missing for the inspect step of my Dockerfile (see below), although the build itself works.

Dockerfile

FROM nvidia/cuda:12.3.1-runtime-ubuntu22.04

RUN apt update && apt install -y python3 python3-pip nvidia-opencl-dev clinfo nvidia-opencl-icd-384

RUN pip install "arkitekt[all]"

RUN pip install pyclesperanto-prototype

RUN mkdir /app
WORKDIR /app
COPY .arkitekt /app/.arkitekt
COPY app.py /app/app.py

Error:

Traceback (most recent call last):
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/arkitekt/cli/commands/port/build.py", line 106, in inspect_definitions
    output = json.loads(result.stdout)
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 1)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/bin/arkitekt", line 8, in <module>
    sys.exit(cli())
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/rich_click/rich_command.py", line 126, in main
    rv = self.invoke(ctx)
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/arkitekt/cli/commands/port/build.py", line 221, in build
    inspection = inspect_build(build_tag)
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/arkitekt/cli/commands/port/build.py", line 133, in inspect_build
    definitions = inspect_definitions(build_id)
  File "/home/aschroeter/miniconda3/envs/pyclesperanto-arkitekt/lib/python3.9/site-packages/arkitekt/cli/commands/port/build.py", line 109, in inspect_definitions
    raise InspectionError(
arkitekt.cli.commands.port.build.InspectionError: Could not decode JSON output of docker inspect. 
==========
== CUDA ==
==========

CUDA Version 12.3.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

{"definitions": [{"description": "Computes the absolute value of every individual pixel x in a given image.\n\nf(x) = |x|", "collections": [], "name": "absolute", "portGroups": [], "args": [{"identifier": "@mikro/representation", "key": "source", "scope": "GLOBAL", "label": "source", "kind": "STRUCTURE", "description": "The input image to be processed.", "assignWidget": {"kind": "SearchWidget", "query": "query search_representation($search: String, $values: [ID]) {\n  options: representations(name: $search, limit: 20, ids: $values) {\n    value: id\n    label: name\n  }\n}", "ward": "mikro"}, "nullable": false}], "returns": [{"identifier": "@mikro/representation", "key": "return0", "scope": "GLOBAL", "kind": "STRUCTURE", "assignWidget": {"kind": "SearchWidget", "query": "query search_representation($search: String, $values: [ID]) {\n  options: representations(name: $search, limit: 20, ids: $values) {\n    value: id\n    label: name\n  }\n}", "ward": "mikro"}, "nullable": false}], "interfaces": [], "kind": "FUNCTION"}, {"description": "No Description", "collections": [], "name": "test", "portGroups": [], "args": [{"identifier": "@mikro/representation", "key": "source", "scope": "GLOBAL", "kind": "STRUCTURE", "assignWidget": {"kind": "SearchWidget", "query": "query search_representation($search: String, $values: [ID]) {\n  options: representations(name: $search, limit: 20, ids: $values) {\n    value: id\n    label: name\n  }\n}", "ward": "mikro"}, "nullable": false}], "returns": [{"identifier": "@mikro/representation", "key": "return0", "scope": "GLOBAL", "kind": "STRUCTURE", "assignWidget": {"kind": "SearchWidget", "query": "query search_representation($search: String, $values: [ID]) {\n  options: representations(name: $search, limit: 20, ids: $values) {\n    value: id\n    label: name\n  }\n}", "ward": "mikro"}, "nullable": false}], "interfaces": [], "kind": "FUNCTION"}, {"description": "No Description", "collections": [], "name": "create voronoi labels", 
"portGroups": [], "args": [{"identifier": "@mikro/representation", "key": "representation", "scope": "GLOBAL", "label": "representation", "kind": "STRUCTURE", "assignWidget": {"kind": "SearchWidget", "query": "query search_representation($search: String, $values: [ID]) {\n  options: representations(name: $search, limit: 20, ids: $values) {\n    value: id\n    label: name\n  }\n}", "ward": "mikro"}, "nullable": false}], "returns": [{"identifier": "@mikro/representation", "key": "return0", "scope": "GLOBAL", "kind": "STRUCTURE", "description": "A string with Hello {n}", "assignWidget": {"kind": "SearchWidget", "query": "query search_representation($search: String, $values: [ID]) {\n  options: representations(name: $search, limit: 20, ids: $values) {\n    value: id\n    label: name\n  }\n}", "ward": "mikro"}, "nullable": false}], "interfaces": [], "kind": "FUNCTION"}, {"description": "No Description", "collections": [], "name": "create voronoi labels", "portGroups": [], "args": [{"identifier": "@mikro/representation", "key": "representation", "scope": "GLOBAL", "label": "representation", "kind": "STRUCTURE", "assignWidget": {"kind": "SearchWidget", "query": "query search_representation($search: String, $values: [ID]) {\n  options: representations(name: $search, limit: 20, ids: $values) {\n    value: id\n    label: name\n  }\n}", "ward": "mikro"}, "nullable": false}, {"key": "radius_x", "scope": "GLOBAL", "kind": "INT", "default": 5, "nullable": true, "annotations": []}, {"key": "radius_y", "scope": "GLOBAL", "kind": "INT", "default": 5, "nullable": true, "annotations": []}, {"key": "spot_sigma", "scope": "GLOBAL", "kind": "INT", "default": 1, "nullable": true, "annotations": []}, {"key": "radius", "scope": "GLOBAL", "kind": "INT", "default": 10, "nullable": true, "annotations": []}], "returns": [{"identifier": "@mikro/representation", "key": "return0", "scope": "GLOBAL", "kind": "STRUCTURE", "description": "A string with Hello {n}", "assignWidget": {"kind": 
"SearchWidget", "query": "query search_representation($search: String, $values: [ID]) {\n  options: representations(name: $search, limit: 20, ids: $values) {\n    value: id\n    label: name\n  }\n}", "ward": "mikro"}, "nullable": false}], "interfaces": [], "kind": "FUNCTION"}]}
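The decode error can be reproduced without Docker. This is a minimal sketch that assumes the captured stdout is the CUDA banner followed by the definitions JSON (both abbreviated here). Because the banner starts with a newline, `json.loads` skips that leading whitespace and fails on the first `=`. That is the same `line 2 column 1 (char 1)` position reported in the traceback above.

```python
import json

# Abbreviated stand-in for the captured stdout: the CUDA banner (which starts
# with a newline) is printed before the JSON definitions.
stdout = "\n==========\n== CUDA ==\n==========\n\n" + '{"definitions": []}'

error = None
try:
    json.loads(stdout)
except json.JSONDecodeError as err:
    # Leading whitespace is skipped, so decoding fails at the first "=".
    error = err
    print(err)  # Expecting value: line 2 column 1 (char 1)
```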
jhnnsrs commented 3 months ago

Ah, I understand the error. When inspecting the definitions in the container, the definition registry is dumped to stdout, which is then parsed by the CLI. However, if app.py also writes to stdout, the CLI cannot parse the JSON. The easiest fix is to prefix the inspected definitions with a magic phrase so that the CLI can detect them. But I'm open to other ideas!
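The prefix idea can be sketched as follows. This is an illustration, not arkitekt's actual implementation: `MAGIC_PREFIX`, `dump_definitions` and `parse_definitions` are hypothetical names. The container writes the definitions as a single prefixed line, and the CLI ignores every other line of stdout.

```python
import io
import json
import sys
from typing import Any, TextIO

# Hypothetical marker; any string unlikely to appear in normal output works.
MAGIC_PREFIX = "ARKITEKT_DEFINITIONS::"


def dump_definitions(definitions: dict[str, Any], stream: TextIO = sys.stdout) -> None:
    """Container side: emit the definitions as one prefixed line."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    stream.write(MAGIC_PREFIX + json.dumps(definitions) + "\n")


def parse_definitions(stdout: str) -> dict[str, Any]:
    """CLI side: skip banner/log lines and decode only the prefixed one."""
    for line in stdout.splitlines():
        if line.startswith(MAGIC_PREFIX):
            return json.loads(line[len(MAGIC_PREFIX):])
    raise ValueError("No definitions marker found in container output")


# Simulated container output: banner noise first, then the prefixed definitions.
buf = io.StringIO()
buf.write("\n== CUDA ==\nWARNING: The NVIDIA Driver was not detected.\n")
dump_definitions({"definitions": [{"name": "absolute"}]}, buf)
print(parse_definitions(buf.getvalue()))  # {'definitions': [{'name': 'absolute'}]}
```

This relies on the prefixed payload starting at the beginning of a line. If other output could share that line, delimiting both ends of the payload is more robust.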

alexschroeter commented 3 months ago

Ok, I have the following thoughts.

  1. As a workaround, I can silence the CUDA message, which seems to be the problem here, but we should look for a more robust solution like the one you suggest.
  2. Would it be possible to use the tasks API to get the task logs? I believe they can be separated by task ID and don't rely on the stdin/stdout/stderr interface.

jhnnsrs commented 1 month ago

  1. I think we could still rely on stdout and integrate a magic byte string that marks the beginning and the end of the definitions, e.g. "MY_MAGIC_ARKITEKT_KEYWORD::{"json": "definitions"}::MY_MAGIC_ARKITEKT_KEYWORD". This would be the easiest workaround for now.

  2. I believe this should be the long-term goal, especially if we would like to run a container with a specific task once and then shut it down, with something like "arkitekt call function_name arg=0 arg1=1". There was once a functional prototype for this behaviour, but I have not investigated it further.
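The following rough illustration shows how the proposed call syntax could map `key=value` arguments to keyword arguments. It assumes values are decoded as JSON, so `0` becomes an integer and `true` becomes a boolean, with a fallback to plain strings. `parse_call_args` is hypothetical and is not the argument handling of arkitekt's real `call` command.

```python
import json
from typing import Any


def parse_call_args(argv: list[str]) -> tuple[str, dict[str, Any]]:
    """Split ["function_name", "arg=0", "arg1=1"] into a name and kwargs."""
    name, *pairs = argv
    kwargs: dict[str, Any] = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep:
            raise ValueError(f"Expected key=value, got {pair!r}")
        try:
            # Interpret values as JSON so "0" becomes an int and "true" a bool.
            kwargs[key] = json.loads(raw)
        except json.JSONDecodeError:
            # Fall back to the raw string for unquoted text such as "hello".
            kwargs[key] = raw
    return name, kwargs


print(parse_call_args(["function_name", "arg=0", "arg1=1"]))
# ('function_name', {'arg': 0, 'arg1': 1})
```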

This is the relevant code: arkitekt/cli/commands/call/local.py. However, with rekuest_next the API is going to change slightly (I think it might still be worthwhile to test it out, though).
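Point 1 above, wrapping the definitions between begin and end markers, could look like the sketch below. It reuses the placeholder keyword from the thread. `wrap_definitions` and `extract_definitions` are hypothetical helpers. Because both ends are delimited, the payload is found even if log output shares its line or the JSON spans several lines.

```python
import json
import re
from typing import Any

# Placeholder keyword taken from the proposal in this thread.
MARKER = "MY_MAGIC_ARKITEKT_KEYWORD"
# Non-greedy match between the opening and closing markers; DOTALL lets the
# payload span multiple lines.
_PATTERN = re.compile(re.escape(MARKER) + r"::(.*?)::" + re.escape(MARKER), re.DOTALL)


def wrap_definitions(definitions: dict[str, Any]) -> str:
    """Container side: surround the JSON with the markers."""
    return f"{MARKER}::{json.dumps(definitions)}::{MARKER}"


def extract_definitions(stdout: str) -> dict[str, Any]:
    """CLI side: decode only what sits between the markers."""
    match = _PATTERN.search(stdout)
    if match is None:
        raise ValueError("Definitions markers not found in container output")
    return json.loads(match.group(1))


noisy = "== CUDA ==\nWARNING: ...\nlog line " + wrap_definitions({"definitions": []}) + " trailing log\n"
print(extract_definitions(noisy))  # {'definitions': []}
```

The non-greedy group stops at the first closing marker, so this assumes the marker string never occurs inside the definitions themselves.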