LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0
6.9k stars 528 forks

MacOS support #4

Open ShirAmir opened 9 months ago

ShirAmir commented 9 months ago

Trying to run inference on macOS yields xformers errors about unsupported operations (e.g. `smaller`, `cutlassF`, `tritonflashattF`, etc.). Can you add support for running on macOS (and preferably on Apple Silicon)?

cavit99 commented 9 months ago

You can't run xformers on Apple Silicon, since it requires an NVIDIA card. It'll no doubt be ported soon.

le-wib commented 8 months ago

Oh yes, please add macOS support :)

pzoltowski commented 8 months ago

I managed to run it on my MacBook M2 Max; you just have to select the correct device:

    if torch.backends.mps.is_available():
        print("Metal GPU available!")
        os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
        device = torch.device("mps")
    elif torch.cuda.is_available():
        device = torch.device("cuda")
        print("Using CUDA with GPU.")
    else:
        device = torch.device("cpu")
        print("Metal GPU not available. Using CPU.")

then

    if args.encoder == 'vits':
        depth_anything = DPT_DINOv2(encoder='vits', features=64, out_channels=[48, 96, 192, 384], localhub=args.localhub).to(device)
    elif args.encoder == 'vitb':
        depth_anything = DPT_DINOv2(encoder='vitb', features=128, out_channels=[96, 192, 384, 768], localhub=args.localhub).to(device)
    else:
        depth_anything = DPT_DINOv2(encoder='vitl', features=256, out_channels=[256, 512, 1024, 1024], localhub=args.localhub).to(device)

and lastly

image = transform({'image': image})['image']
image = torch.from_numpy(image).unsqueeze(0).to(device)

Then it runs even on MPS. One operator ('aten::upsample_bicubic2d.out') is not supported on MPS, which is why the fallback is needed. Alternatively, you can search the whole repo for 'bicubic' and change it to 'bilinear' to run fully on the Apple GPU.
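The repo-wide swap described above can be scripted. A minimal sketch, assuming a local Depth-Anything checkout and that the interpolation mode only appears as the literal string `bicubic` in `.py` files (check the actual sources before running this in place):

```python
from pathlib import Path

def swap_bicubic_for_bilinear(repo_root: str) -> int:
    """Replace 'bicubic' with 'bilinear' in every .py file under repo_root.

    This sidesteps the MPS fallback by avoiding the unsupported
    aten::upsample_bicubic2d.out operator, at a small cost in
    interpolation quality. Returns the number of files changed.
    """
    changed = 0
    for py_file in Path(repo_root).rglob("*.py"):
        text = py_file.read_text()
        if "bicubic" in text:
            py_file.write_text(text.replace("bicubic", "bilinear"))
            changed += 1
    return changed
```

Run it against your clone, e.g. `swap_bicubic_for_bilinear("Depth-Anything")`, ideally on a fresh checkout so you can diff the result.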

I'm getting around 200 ms per inference with the small model on the default cat image and default settings, so it's probably not as performant as running on CUDA. Still, this is great work and really fast!
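For anyone wanting to reproduce a per-inference number like the one above, here is a minimal timing sketch; `run_model` is a placeholder for your own inference call, not part of the repo. On MPS you would also synchronize inside the loop so asynchronous GPU work is actually counted:

```python
import time

def mean_latency_ms(run_model, warmup=3, runs=10):
    """Average wall-clock latency of run_model() in milliseconds."""
    # Warm-up iterations exclude one-time costs (weight loading, kernel caches)
    for _ in range(warmup):
        run_model()
    start = time.perf_counter()
    for _ in range(runs):
        run_model()
        # On MPS, call torch.mps.synchronize() here so queued GPU work is timed
    return (time.perf_counter() - start) / runs * 1000.0
```

Usage would look like `mean_latency_ms(lambda: depth_anything(image))`, averaged over enough runs to smooth out scheduler noise.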

le-wib commented 8 months ago

Oh nice, how do you do that? I assume I have to rewrite some lines of code?

pzoltowski commented 8 months ago

@le-wib yes, but it's very easy and almost the same as in this PR: https://github.com/LiheYoung/Depth-Anything/pull/12

velaia commented 8 months ago

@pzoltowski Yes, that's it! 👍 For anyone who has the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 permanently set, a simple change to the device selection in run.py / run_video.py like this will work:

DEVICE = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

Performance: I've done some comparison on a single short video (M1 Max):

CPU: `python run_video.py --encoder vits --video-path --outdir .` → 148.86s user 40.61s system 335% cpu 56.489 total

MPS: `PYTORCH_ENABLE_MPS_FALLBACK=1 python run_video.py --encoder vits --video-path` → 9.64s user 2.75s system 102% cpu 12.141 total

It's several times faster, and instead of pushing the CPU to almost 400% it fully utilises the GPU.

ShirAmir commented 8 months ago

To disable xformers, one needs to set the environment variable XFORMERS_DISABLED=1. This, plus the comments above, solved things for me.
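Putting the two environment variables from this thread together in one place, a minimal sketch; the key detail is that they should be set before importing torch or the model code that reads them:

```python
import os

# Set before importing torch or the model code that pulls in xformers,
# since these flags are read at import/dispatch time.
os.environ["XFORMERS_DISABLED"] = "1"            # skip xformers attention kernels
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # CPU fallback for unsupported MPS ops
```

Alternatively, export both in the shell before launching the script, which avoids any import-order concerns.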

teletower commented 2 months ago

Dear ShirAmir, does this also work on V2 of Depth Anything? If so, it would be great if you could sum up how to make this work on Apple Silicon in just a few words.

velaia commented 2 months ago

Depth Anything V2 had macOS MPS support built in from what I've seen. No special treatment as the hardware acceleration supported by your system gets detected automatically as far as I can tell.

teletower commented 2 months ago

> Depth Anything V2 had macOS MPS support built in from what I've seen. No special treatment as the hardware acceleration supported by your system gets detected automatically as far as I can tell.

OK, I'm trying this. I'm using this in Nuke, and the dev's website does not mention macOS being supported. But thanks for answering, much appreciated.