facebookresearch / dinov2

PyTorch code and models for the DINOv2 self-supervised learning method.
Apache License 2.0

Memory-efficient attention forward operator error in Docker container with FastAPI and DINOv2 #446

Open mdelhaous opened 2 months ago

mdelhaous commented 2 months ago

I'm currently working on an API using FastAPI to serve DINOv2 models from the official DINOv2 repository. The API works well locally, but when I run it in a Docker container, I encounter an error related to the memory-efficient attention forward operator.

API.py

```python
from fastapi import FastAPI, File, UploadFile, HTTPException, Form
from fastapi.responses import JSONResponse
import torch
from PIL import Image, UnidentifiedImageError
from torchvision import transforms
import io
import requests  # Import the requests library
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI()

MODEL_MAP = {
    'dinov2_vitl14': 'dinov2_vitl14',
    'dinov2_vits14': 'dinov2_vits14',
    'dinov2_vitb14': 'dinov2_vitb14',
    'dinov2_vitg14': 'dinov2_vitg14',
}

def load_model(model_name: str):
    if model_name not in MODEL_MAP:
        raise ValueError(f"Model {model_name} is not supported.")
    model = torch.hub.load('facebookresearch/dinov2', MODEL_MAP[model_name])
    model.eval()
    return model

def preprocess_image(image):
    input_image = Image.open(io.BytesIO(image)).convert('RGB')
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0)  # Create a mini-batch as expected by the model
    return input_batch

def infer(model, input_batch):
    with torch.no_grad():
        output = model(input_batch)
    return output

@app.post("/infer/")
async def infer_image(file: UploadFile = File(...), model_name: str = Form(...)):
    try:
        if model_name not in MODEL_MAP:
            raise HTTPException(status_code=400, detail="Invalid model name provided.")

        image_bytes = await file.read()
        input_batch = preprocess_image(image_bytes)
        model = load_model(model_name)
        output = infer(model, input_batch)

        return JSONResponse(content={"inference_output": output.tolist()})
    except Exception as e:
        logger.error(f"Error processing image: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/infer-url/")
async def infer_image_url(url: str = Form(...), model_name: str = Form(...)):
    try:
        if model_name not in MODEL_MAP:
            raise HTTPException(status_code=400, detail="Invalid model name provided.")

        response = requests.get(url)
        if response.status_code != 200:
            logger.error(f"Failed to download image from URL: {url}, Status code: {response.status_code}")
            raise HTTPException(status_code=400, detail="Failed to download image from the provided URL.")

        image_bytes = response.content

        try:
            input_batch = preprocess_image(image_bytes)
        except UnidentifiedImageError:
            logger.error(f"The provided URL does not contain a valid image: {url}")
            raise HTTPException(status_code=400, detail="The provided URL does not contain a valid image.")

        model = load_model(model_name)
        output = infer(model, input_batch)

        return JSONResponse(content={"inference_output": output.tolist()})
    except Exception as e:
        logger.error(f"Error processing image from URL: {url}, Error: {str(e)}")
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

Dockerfile

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "API:app", "--host", "0.0.0.0", "--port", "8000"]
```

requirements.txt

```
--extra-index-url https://download.pytorch.org/whl/cu117
torch==2.0.0
torchvision==0.15.0
omegaconf
torchmetrics==0.10.3
fvcore
iopath
xformers==0.0.18
submitit
--extra-index-url https://pypi.nvidia.com
cuml-cu11
fastapi
Pillow
requests
uvicorn
```

Problem: When I run the API locally, both endpoints work fine. However, when running the API in the Docker container, I get the following error:

```
{
  "detail": "No operator found for `memory_efficient_attention_forward` with inputs:
      query       : shape=(1, 257, 6, 64) (torch.float32)
      key         : shape=(1, 257, 6, 64) (torch.float32)
      value       : shape=(1, 257, 6, 64) (torch.float32)
      attn_bias   : <class 'NoneType'>
      p           : 0.0
  `cutlassF` is not supported because:
      device=cpu (supported: {'cuda'})
  `flshattF` is not supported because:
      device=cpu (supported: {'cuda'})
      dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
  `tritonflashattF` is not supported because:
      device=cpu (supported: {'cuda'})
      dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
      Operator wasn't built - see `python -m xformers.info` for more info
      triton is not available
  `smallkF` is not supported because:
      max(query.shape[-1] != value.shape[-1]) > 32
      unsupported embed per head: 64"
}
```
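For reference, the same failure can be reproduced outside FastAPI with just the hub model (a minimal sketch, assuming xformers is installed and no CUDA device is visible):

```python
import torch

# Load the smallest DINOv2 variant from torch hub and run a dummy forward pass
# on CPU; with xformers installed and no GPU available, this raises the same
# memory_efficient_attention_forward error as the /infer/ endpoint.
model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()

with torch.no_grad():
    features = model(torch.randn(1, 3, 224, 224))
```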

Additional Information: The error indicates that the memory-efficient attention forward operator does not support running on the CPU. Locally, the API runs on a system with CUDA support, while in the Docker container the application appears to be running on the CPU.

Question: How can I resolve this error and make the API work correctly in the Docker container?

NB: I DON'T HAVE AN NVIDIA GPU ON MY MACHINE. Any insights or suggestions would be greatly appreciated. Thank you!

heyoeyo commented 2 months ago

This is likely caused by using xformers operations while running on cpu, which generally doesn't work. If you know you're going to be running on a cpu-only system, it might be best to avoid installing xformers altogether, since it seems (in my experience) to go out and pull other gpu requirements.

If you can't uninstall xformers, or just want to leave it in to support cuda systems where possible, it looks like the dinov2 model includes support for disabling xformers by setting an environment variable: XFORMERS_DISABLED
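Something like this should work in the API script (a minimal sketch; the assumption is that the flag is checked when the dinov2 modules are imported, so it has to be set before torch.hub.load pulls in the model code):

```python
import os

# Assumption: dinov2 checks XFORMERS_DISABLED at import time, so set it
# before torch.hub.load imports the model code.
os.environ["XFORMERS_DISABLED"] = "1"

import torch

model = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
model.eval()

with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))  # plain PyTorch attention, runs on CPU
```

Setting it in the Dockerfile (ENV XFORMERS_DISABLED=1) or in the shell environment should work just as well.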

mdelhaous commented 2 months ago

> This is likely caused by using xformers operations while running on cpu, which generally doesn't work. If you know you're going to be running on a cpu-only system, it might be best to avoid installing xformers altogether, since it seems (in my experience) to go out and pull other gpu requirements.
>
> If you can't uninstall xformers, or just want to leave it in to support cuda systems where possible, it looks like the dinov2 model includes support for disabling xformers by setting an environment variable: XFORMERS_DISABLED

but why the API works locally and not in the container (I use the same requirements on my local)

heyoeyo commented 2 months ago

> Locally, the API runs on a system with CUDA support

If when running outside of docker there is cuda support, then I think xformers will use that without any issues. If on that same machine it doesn't work inside docker, then it's likely an issue with the docker image not having the right cuda dependencies or not having gpu passthrough working (both of which are a pain to debug in my experience).
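A quick way to tell which case you're in is to check CUDA visibility from inside and outside the container (a rough sketch, not part of the API):

```python
import torch

# If this prints True on the host but False inside the container, the
# container isn't seeing the GPU (missing passthrough or CUDA runtime libs).
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```

If it only comes back False inside docker, GPU passthrough usually requires the NVIDIA container toolkit on the host plus something like --gpus all on the docker run command.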

mdelhaous commented 2 months ago

The issue is resolved! I added the following line to my Dockerfile: ENV XFORMERS_DISABLED=1

Thank you so much @heyoeyo

mdelhaous commented 2 months ago

Hello @heyoeyo,

Hope you are doing well

Issues:

  1. AWS: Running the Docker image on AWS fails with an exec format error.
  2. Mac: Building the Docker image on my Mac fails with a dependency conflict between torch and torchvision.

Goal:

I aim to create multi-architecture Docker builds to ensure compatibility across all platforms (e.g., AMD64, ARM64). I'm not entirely sure about Mac support for DINOv2 yet, but I'm looking for a viable solution.

Attempts:

  1. AWS: Tried specifying the platform in the Docker run command, but it didn't resolve the issue.
  2. Mac: Attempted to align dependency versions, but the conflict persists.

Questions:

  1. How can I resolve the exec format error on AWS?
  2. How can I manage the dependency conflict for torch and torchvision during the Docker build on Mac?
  3. What are the best practices for creating multi-architecture Docker builds for such projects?

Thank you in advance for your help!

heyoeyo commented 2 months ago
> 3. What are the best practices for creating multi-architecture Docker builds for such projects?

I've only ever worked with x86, so I'm not familiar with ARM devices/deployments or how best to handle multi-architecture builds unfortunately, sorry!

> 1. How can I resolve the exec format error on AWS?

As far as I can tell, the error is just saying that the docker image is for x86, but it's trying to run on an ARM device. It looks like some images on docker hub have both x86 & ARM versions, but the base image you're using may not have this (at a quick glance, the pytorch base images only seem to be available in x86, for example), so that may be the issue. If possible, switching to a base image that is already built with ARM support may resolve the issue.

> 2. How can I manage the dependency conflict for torch and torchvision during the Docker build on Mac?

The simplest thing to try would be to loosen the strictness of the requirements, for example by using torch==2.0.* and torchvision==0.15.*, so that more versions are allowed when resolving the dependencies.

If you want broader support, what I would probably do is first install without version restrictions to see what version gets installed (using pip list) and treat that as an upper bound. Let's say that gives you torch 2.3.1 and torchvision 0.18.1. Then re-install, but use restrictions like: torch<2.3 and torchvision<0.18, and see what that installs. You can keep trying to reduce the versioning until your scripts no longer work. That should give a lower/upper bound on versions that you know will work, which you can then use for your requirements to allow them to be more flexible. I've done something like this in this requirements.txt file for example.
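If it helps, a small (hypothetical) helper along those lines, to print the versions pip actually resolved so they can be copied into requirements.txt as upper bounds:

```python
# Hypothetical helper: after installing without pins, print the resolved
# versions so they can be used as upper bounds in requirements.txt.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "torchvision", "torchmetrics", "xformers"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is not installed")
```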

mdelhaous commented 2 months ago

Hello @heyoeyo,

First of all, thank you so much for your responses!

I'm encountering a dependency conflict when building a Docker image for the linux/arm64 platform. The build fails with the following error, but building for the default platform works fine.

Issue: The build for the linux/arm64 platform fails due to version conflicts between torch, torchvision, torchmetrics, and xformers. However, building for the default platform works without issues.

Questions:

  1. How can I resolve these dependency conflicts when building for the linux/arm64 platform?
  2. Are there specific versions of these packages that are compatible with linux/arm64?

Any insights or suggestions would be greatly appreciated!

Working Build Command: `docker build -t myfastapiapp .`

Failing Build Command: `docker build --platform linux/arm64 -t myfastapiapp-arm64 .`

Error Message (for linux/arm64):

```
ERROR: Cannot install -r requirements.txt (line 3), -r requirements.txt (line 5), -r requirements.txt (line 8) and torch==2.0.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested torch==2.0.0
    torchvision 0.15.0 depends on torch
    torchmetrics 0.10.3 depends on torch>=1.3.1
    xformers 0.0.27 depends on torch>=2.2

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict
```

heyoeyo commented 2 months ago

Based on the error:

> xformers 0.0.27 depends on torch>=2.2

It looks like having torch 2.0.0 and xformers 0.0.27 isn't possible.

xFormers is very much tied to CUDA devices, which I'm not sure even work with ARM...? So one easy fix may just be to remove xformers altogether for the ARM build (dinov2 will work without it).

Alternatively, I think you can either downgrade xformers to a version that works with torch 2.0.0, or upgrade torch to >=2.2. For example, on my local copy of dinov2 I have torch 2.0.0 with xformers 0.0.18, so xformers==0.0.18 may fix the problem. Though assuming dinov2 still works with newer pytorch versions (i.e. >=2.2), it's probably better to upgrade the torch requirement, just for better longevity.

mdelhaous commented 2 months ago

Thank you so much @heyoeyo