Open zeevdri opened 1 year ago
Pinging @cep21, the maintainer of the nvml integration.
It looks like grpcio==1.27.2 (pinned in requirements.in) causes the issue; it should be upgraded to >1.35.0 (related issue). I am able to build the image with grpcio upgraded, but I am getting another problem, which should have been fixed here. @cep21, when can the changes be released? It seems 1.0.6 is already half a year old.
The NVML integration is now owned by Datadog: https://github.com/DataDog/integrations-core/pull/14538
I was able to build for ARM locally but had to fork integrations-extras.
Hi, in the end we decided not to move forward with moving this integration to core and will be building a DCGM exporter based one instead. Let us know if you'd be interested in trying out an early version. @cep21, @brianchan661: you can release a new version of this integration by following these docs: https://datadoghq.dev/integrations-core/process/integration-release/#creating-the-release
I ran into a similar issue today using requirements from 1.0.7 release:
#16 [linux/arm64 3/3] RUN curl https://raw.githubusercontent.com/DataDog/integrations-extras/nvml-1.0.7/nvml/requirements.in > requirements.in && /opt/datadog-agent/embedded/bin/pip3 install -r requirements.in
#16 3.519 Preparing metadata (setup.py): started
#16 4.233 Preparing metadata (setup.py): finished with status 'error'
#16 4.240 error: subprocess-exited-with-error
#16 4.240
#16 4.240 × python setup.py egg_info did not run successfully.
#16 4.240 │ exit code: 1
#16 4.240 ╰─> [14 lines of output]
#16 4.240 /tmp/pip-install-5dxksksa/grpcio_da0a47e9c5734c29b6873bcaa9018bc6/src/python/grpcio/commands.py:102: SyntaxWarning: "is not" with a literal. Did you mean "!="?
#16 4.240 if exit_code is not 0:
#16 4.240 Traceback (most recent call last):
#16 4.240 File "<string>", line 2, in <module>
#16 4.240 File "<pip-setuptools-caller>", line 34, in <module>
#16 4.240 File "/tmp/pip-install-5dxksksa/grpcio_da0a47e9c5734c29b6873bcaa9018bc6/setup.py", line 191, in <module>
#16 4.240 if check_linker_need_libatomic():
#16 4.240 File "/tmp/pip-install-5dxksksa/grpcio_da0a47e9c5734c29b6873bcaa9018bc6/setup.py", line 149, in check_linker_need_libatomic
#16 4.240 cc_test = subprocess.Popen(['cc', '-x', 'c++', '-std=c++11', '-'],
#16 4.240 File "/opt/datadog-agent/embedded/lib/python3.8/subprocess.py", line 858, in __init__
#16 4.240 self._execute_child(args, executable, preexec_fn, close_fds,
#16 4.240 File "/opt/datadog-agent/embedded/lib/python3.8/subprocess.py", line 1720, in _execute_child
#16 4.240 raise child_exception_type(errno_num, err_msg, err_filename)
#16 4.240 FileNotFoundError: [Errno 2] No such file or directory: 'cc'
#16 4.240 [end of output]
#16 4.240
#16 4.240 note: This error originates from a subprocess, and is likely not a problem with pip.
#16 4.242 error: metadata-generation-failed
#16 4.242
#16 4.242 × Encountered error while generating package metadata.
#16 4.242 ╰─> See above for output.
#16 4.242
#16 4.242 note: This is an issue with the package mentioned above, not pip.
#16 4.242 hint: See above for details.
#16 4.368
#16 4.368 [notice] A new release of pip is available: 23.0.1 -> 23.2.1
#16 4.368 [notice] To update, run: python3 -m pip install --upgrade pip
#16 ERROR: process "/bin/sh -c curl https://raw.githubusercontent.com/DataDog/integrations-extras/nvml-1.0.7/nvml/requirements.in > requirements.in && /opt/datadog-agent/embedded/bin/pip3 install -r requirements.in" did not complete successfully: exit code: 1
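The traceback above ends in `FileNotFoundError: [Errno 2] No such file or directory: 'cc'`, raised when grpcio's `setup.py` tries to start a C compiler through `subprocess.Popen`. The log shows pip running `setup.py`, which suggests it was building grpcio from source rather than installing a prebuilt wheel. The following is a minimal sketch for checking whether a compiler is reachable inside the image before running pip. It is not part of the integration, and the helper names `compiler_available` and `probe_compiler` are hypothetical.

```python
import shutil
import subprocess


def compiler_available(name="cc"):
    """Return True if `name` resolves to an executable on PATH."""
    return shutil.which(name) is not None


def probe_compiler(name="cc"):
    """Try to start `name` as a subprocess, as a source build would.

    Returns "missing" when the executable cannot be found (the same
    FileNotFoundError seen in the log), otherwise "found".
    """
    try:
        subprocess.run(
            [name, "--version"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,  # only care whether the process could start
        )
    except FileNotFoundError:
        return "missing"
    return "found"


if __name__ == "__main__":
    print("cc on PATH:", compiler_available("cc"))
    print("cc probe:", probe_compiler("cc"))
```

If the probe reports "missing", a source build of grpcio would be expected to fail the same way; pinning a grpcio version that ships a wheel for the target platform avoids needing the compiler at all.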
Building the Docker image with my own local requirements.in resulted in a successful build:
pynvml==11.4.1
# https://github.com/grpc/grpc/issues/21283
# upgrading to solve this issue ^, was previously version 1.27.2
grpcio==1.57.0
Dockerfile:
FROM gcr.io/datadoghq/agent:7.46.0
RUN agent integration install -t -r datadog-nvml==1.0.7
COPY requirements.in /tmp/requirements.in
RUN /opt/datadog-agent/embedded/bin/pip3 install -r /tmp/requirements.in
# Why do you need these variables: See https://github.com/NVIDIA/nvidia-docker/wiki/Usage
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=all
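As a sanity check on the pins, here is a small sketch that parses the requirements.in shown above and confirms the grpcio pin is newer than 1.35.0. That floor comes from an earlier comment in this thread and has not been verified independently. The helpers `parse_pins` and `is_newer_than` are hypothetical and only handle plain `name==X.Y.Z` pins with numeric components.

```python
def parse_pins(text):
    """Map package name -> pinned version for lines of the form name==version."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def version_tuple(version):
    """Turn '1.57.0' into (1, 57, 0); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def is_newer_than(version, floor):
    """Return True if `version` is strictly greater than `floor`."""
    return version_tuple(version) > version_tuple(floor)


# Contents of the local requirements.in from this comment.
REQUIREMENTS = """\
pynvml==11.4.1
# https://github.com/grpc/grpc/issues/21283
# upgrading to solve this issue ^, was previously version 1.27.2
grpcio==1.57.0
"""

if __name__ == "__main__":
    pins = parse_pins(REQUIREMENTS)
    # Floor of 1.35.0 is an assumption taken from the thread.
    print("grpcio pin:", pins["grpcio"], "ok:", is_newer_than(pins["grpcio"], "1.35.0"))
```

The comparison uses integer tuples so that 1.57.0 sorts after 1.9.0; a plain string comparison would get that wrong.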
I've tried building the Dockerfile for arm64 and I get the following error