mvoodarla opened this issue 1 year ago
Thanks for your issue.
We have not tested the model in Docker. Can you provide more details of the outputs?
@mvoodarla I think maybe it's because Rust is not installed.
This is during inference, not training. It's likely some system package thing, I'll keep investigating.
@mvoodarla I believe @Innary is right -- I had the same issue locally until I installed both cargo and rustc.
sudo apt update && sudo apt install -y cargo rustc
Also needed to build the image using NVIDIA container runtime. By default, Docker uses the non-NVIDIA runtime for building the image, which causes issues when building the custom CUDA ops for GroundingDINO.
Easiest way I found to override that is to update /etc/docker/daemon.json (and restart the Docker daemon afterwards for the change to take effect):
[Link to StackOverflow discussion]
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
With that change, this Dockerfile is working for me. Able to load the model, run inference on GPU, etc.
ARG UBUNTU_VERSION="20.04"
ARG CUDA_VERSION="11.6.1"
ARG CUDA_OPSET="devel"
ARG PY_VERSION="3.10"

FROM nvidia/cuda:$CUDA_VERSION-$CUDA_OPSET-ubuntu$UBUNTU_VERSION
ENV DEBIAN_FRONTEND="noninteractive"

# TODO: Check if any of these 'apt' packages are unnecessary
RUN apt update --fix-missing && apt upgrade -y \
    && apt install -y build-essential cargo curl ffmpeg git libsm6 libxext6 rustc software-properties-common unzip \
    && apt clean \
    && rm -rf /var/lib/apt/lists/*

# Setup Python using the 'ppa:deadsnakes/ppa' apt repository
# NOTE: ARG variables are cleared after each FROM statement. A simple workaround
# to this is to re-declare the ARG without a default value:
# https://github.com/moby/moby/issues/34129#issuecomment-315852422
ARG PY_VERSION
RUN add-apt-repository ppa:deadsnakes/ppa \
    && apt update \
    && apt install -y python$PY_VERSION-dev python$PY_VERSION-distutils \
    && update-alternatives --install /usr/bin/python python /usr/bin/python$PY_VERSION 10 \
    && curl -sS https://bootstrap.pypa.io/get-pip.py | python \
    && apt clean \
    && rm -rf /var/lib/apt/lists/*

# Create the working directory for our code
RUN mkdir /app
WORKDIR /app

# Install dependencies
# - Bug in 'setuptools==66.0.0' with running 'python setup.py develop'
# - 'tokenizers' gives a version error with 'packaging>21.3'
# - Be sure to install PyTorch with CUDA==11.6, so extensions build for GroundingDINO
COPY requirements.txt .
RUN pip install -r requirements.txt \
    --no-cache-dir \
    --index-url https://download.pytorch.org/whl/cu116 \
    --extra-index-url https://pypi.org/simple

# Install GroundingDINO
# NOTE: Use latest commit hash as of 2023-03-27
RUN git clone https://git@github.com/IDEA-Research/GroundingDINO \
    && cd GroundingDINO \
    && git checkout 858efccbad1aed50644f0185e49f4254a9af7560 \
    && python setup.py develop
where my requirements.txt file looks like this. (Still need to pin other package versions.)
addict
matplotlib
ninja
opencv-python
# 'tokenizers' gives a version error with 'packaging>21.3'
packaging==21.3
pycocotools
# Bug in 'setuptools==66.0.0' with running 'python setup.py develop'
setuptools==65.3.0
timm
# NOTE: Must include '--index-url https://download.pytorch.org/whl/cu116' when installing these PyTorch versions.
torch==1.12.1+cu116
torchvision==0.13.1+cu116
torchaudio==0.12.1
transformers
yapf
Hey, I want to run inference on my laptop. I've installed the repository completely, but when I tried to run inference with inference_on_a_image.py I got the same error. How can I solve that?
Perhaps it's caused by a missing Rust environment on your system.
I cannot install GroundingDINO. Can you help me?
I installed Rust but this issue still exists... and I do not use Docker :( Can anyone help me? Please...
I've solved this issue, but for me it had nothing to do with Rust. Python 3.7 simply does not work, and I forgot to change it to Python 3.8...
I encountered the same problem and solved it by looking at this issue!
The key is setting up CUDA_HOME correctly before installing the repo. Here is an example:
export CUDA_HOME=/usr/local/cuda-11.6/
pip install -e .
If the GroundingDINO/groundingdino/_C.cpython-38-x86_64-linux-gnu.so file is generated, it indicates success.
If you still encounter problems, please refer to the issue under another project, https://github.com/IDEA-Research/Grounded-Segment-Anything, for a more detailed explanation.
Hi, we are facing the same error deploying DINO to Azure ML. The build pipeline uses CPU machines, so the entire setup.py runs on a CPU box. First we hit a failure importing torch in setup.py, which we got around by adding torch to the conda.yml used by the Azure ML build pipeline. But this pulls in a CPU version of torch, so at runtime, when the code is deployed to a GPU machine in Azure, it complains that torch was not compiled for GPU.
Questions: 1) Is there a way to simplify setup.py so it does not dynamically compile the C++ files? Is there a solution where we could use precompiled _C.* files instead of creating them on the fly? That would solve a whole lot of problems. 2) The deformable attention seems to use interop to C++. Is there any way to do this purely within PyTorch and avoid issue 1?
@xu5zhao and all: try this easy and quick inference on Google Colab: GroundingDINO-Inference
A quick fix is to just use the PyTorch implementation of deformable attention.
Change this line: https://github.com/IDEA-Research/GroundingDINO/blob/654f5e8bf97dce87da7e84e0d3feeb5bbad95388/groundingdino/models/GroundingDINO/ms_deform_attn.py#L330
to:
if not torch.cuda.is_available():
I don't know how much of a performance hit this will cause, but since we are inferring on single images, it's probably negligible.
Also make sure to send the model to cuda after you load it:
model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py", "weights/groundingdino_swint_ogc.pth")
model = model.to('cuda:0')
@mvoodarla How was this issue eventually resolved? I am also facing the same issue. Could you please advise?
@fkodom
RUN git clone https://git@github.com/IDEA-Research/GroundingDINO \
    && cd GroundingDINO \
    && git checkout 858efccbad1aed50644f0185e49f4254a9af7560 \
    && python setup.py develop
Have you made any changes to the contents of GroundingDINO's requirements.txt?
I simply did as @TemugeB suggested: updated setup.py to remove the extension compilation and all torch references, and edited ms_deform_attn.py to remove the reference to the compiled _C function. It works; I did not see any drop in inference quality, though my testing is limited.
thank you!! I'll try it.
First, let me provide the solution: add a TORCH_CUDA_ARCH_LIST environment variable declaration in the Dockerfile, like this:
RUN git clone --depth=1 https://github.com/IDEA-Research/GroundingDINO.git
ENV TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX;8.9;9.0"
RUN cd GroundingDINO/ && python -m pip install .
By doing this, the compiled GroundingDINO will not face the issue of "cannot import name '_C' from 'groundingdino'".
Now let me describe the process of finding this solution: https://github.com/IDEA-Research/GroundingDINO/issues/8#issuecomment-1485449302
Firstly, I want to thank @fkodom. Although his reply didn't solve my problem directly, it was very enlightening and made me realize that the issue might originate from the setup.py file.
By looking through the source code: https://github.com/IDEA-Research/GroundingDINO/blob/main/setup.py
I noticed a conditional in the script: if CUDA_HOME is not None and (torch.cuda.is_available() or "TORCH_CUDA_ARCH_LIST" in os.environ):
During the Docker image build you cannot use the GPU, because the build runs in the Docker daemon, which is isolated from your host machine and any GPU hardware. Thus torch.cuda.is_available() always returns False. Relying solely on that condition, setup.py cannot perform the WITH_CUDA compilation, which leads to the "cannot import name '_C' from 'groundingdino'" issue.
Fortunately, there's another condition: "TORCH_CUDA_ARCH_LIST" in os.environ. If we set the TORCH_CUDA_ARCH_LIST environment variable in the Dockerfile before executing setup.py, the problem is solved. This environment variable specifies which GPU architectures PyTorch should compile CUDA code for, so by setting it we explicitly tell PyTorch which architectures to target.
For instance, we can add the following line in the Dockerfile:
ENV TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX;8.9;9.0"
This instructs PyTorch to compile CUDA code for the 6.0, 6.1, 7.0, 7.5, 8.0, 8.6 (plus PTX), 8.9, and 9.0 architectures. These cover the majority of common GPU models, ensuring that the code will run on most GPU devices.
If you want to specify more precise definitions for your GPU model, you can refer to the official list from Nvidia: https://developer.nvidia.com/cuda-gpus
For example, all cards in the RTX 30 series are 8.6, while the RTX 40 series is 8.9.
If you are confused about the definition of GPU Compute Capability, you can get a better understanding by reading this article: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
Hi @fkodom, I tried your Dockerfile. It errored on my side when running with ARG PY_VERSION="3.10":
E: Unable to locate package python-distutils
The command '/bin/sh -c add-apt-repository ppa:deadsnakes/ppa && apt update && apt install -y python$PY_VERSION-dev python$PY_VERSION-distutils && update-alternatives --install /usr/bin/python python /usr/bin/python$PY_VERSION 10 && curl -sS https://bootstrap.pypa.io/get-pip.py | python && apt clean && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100
Docker build failed with error: Command 'docker build -t daat-backend-model:1.0.0 ..' returned non-zero exit status 100.
Do you have any idea how to solve it? I appreciate it
I simply did as @TemugeB suggested: updated setup.py to remove the extension compilation and all torch references, and edited ms_deform_attn.py to remove the reference to the compiled _C function. It works; I did not see any drop in inference quality, though my testing is limited.
How did you update setup.py, and which extension did you remove? @darshats
I encountered the same problem and solved it by looking at this issue!
The key is setting up CUDA_HOME correctly before installing the repo. Here is an example:
export CUDA_HOME=/usr/local/cuda-11.6/
pip install -e .
If the GroundingDINO/groundingdino/_C.cpython-38-x86_64-linux-gnu.so file is generated, it indicates success.
If you still encounter problems, please refer to the issue under another project, https://github.com/IDEA-Research/Grounded-Segment-Anything, for a more detailed explanation.
Thanks for sharing on this question. However, I have tried this solution and the problem still exists. Would you please share the GroundingDINO/groundingdino/_C.cpython-38-x86_64-linux-gnu.so file with me? This is my email: ycx971024@163.com
In my case the problem was caused by ABI incompatibility. The system GCC was used at build time, but the Python binary comes from conda. Installing gcc/gxx from conda solves the issue:
conda install gcc_linux-64
conda install gxx_linux-64
For me, only the combination of the exported environment variables and the proper torch/torchvision versions solved the problem!
export CUDA_HOME=/usr/local/cuda && export TORCH_CUDA_ARCH_LIST="6.0;6.1;7.0;7.5;8.0;8.6+PTX;8.9;9.0" && git clone https://github.com/IDEA-Research/GroundingDINO.git && cd GroundingDINO && pip install -e .
and requirements.txt:
torch==2.0.1
torchvision==0.15.2 # <--- did not work until I set the version to be 0.15.2
I was able to build the Docker image with CUDA using the following PR, @mvoodarla: https://github.com/IDEA-Research/GroundingDINO/pull/307
I'm able to run it properly on a local GPU machine I've got, but when I move this to the cloud in a Docker image, I get this issue. I have a feeling there's some system package I'm missing or something, but I'm not 100% sure.