alembics / disco-diffusion

New issue with midas function section #164

Open GwendalC opened 1 year ago

GwendalC commented 1 year ago

Hello, could you help me solve this issue? I restarted the notebook several times; it was working fine up until 1 PM today. Is it a dependency issue? Thanks! (You're awesome.)

    in
          1 #@title ### 1.4 Define Midas functions
          2
    ----> 3 from midas.dpt_depth import DPTDepthModel
          4 from midas.midas_net import MidasNet
          5 from midas.midas_net_custom import MidasNet_small

    2 frames

    /content/MiDaS/midas/backbones/next_vit.py in
          6 from .utils import activations, forward_default, get_activation
          7
    ----> 8 file = open("./externals/Next_ViT/classification/nextvit.py", "r")
          9 source_code = file.read().replace(" utils", " externals.Next_ViT.classification.utils")
         10 exec(source_code)

    FileNotFoundError: [Errno 2] No such file or directory: './externals/Next_ViT/classification/nextvit.py'

GwendalC commented 1 year ago

It seems there has been an update of MiDaS today.

[Dec 2022] Released MiDaS v3.1:
- New models based on 5 different types of transformers (BEiT, Swin2, Swin, Next-ViT, LeViT)
- Training datasets extended from 10 to 12, also including KITTI and NYU Depth V2 using the BTS split
- Best model, BEiT Large 512, with resolution 512x512, is on average about 28% more accurate than MiDaS v3.0
- Integrated live depth estimation from camera feed

GwendalC commented 1 year ago

I guess the fix should be something like pinning version 3.0 in the gitclone call here in disco.py:

    580 | try:
    581 |     from midas.dpt_depth import DPTDepthModel
    582 | except:
    583 |     if not os.path.exists('MiDaS'):
    584 |         gitclone("https://github.com/isl-org/MiDaS.git")
    585 |     if not os.path.exists('MiDaS/midas_utils.py'):
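
For illustration, a minimal sketch of that idea could look like the following. The tag name and the plain subprocess calls (standing in for the notebook's gitclone helper, so the snippet is self-contained) are assumptions, not the actual disco.py code:

    # Sketch only: clone MiDaS and pin it to a fixed tag so changes on the
    # default branch cannot break the notebook. The tag name 'v3' is an
    # assumption; substitute whichever release the rest of the code expects.
    import os
    import subprocess

    try:
        from midas.dpt_depth import DPTDepthModel
    except ImportError:
        if not os.path.exists('MiDaS'):
            subprocess.run(
                ['git', 'clone', 'https://github.com/isl-org/MiDaS.git'],
                check=True)
        # Pin to a fixed, known-good tag (assumed name).
        subprocess.run(['git', '-C', 'MiDaS', 'checkout', 'v3'], check=True)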

jszgz commented 1 year ago

same here

GwendalC commented 1 year ago

Cause here: https://github.com/isl-org/MiDaS/issues/193

xirtus commented 1 year ago

worst crisis of my life

StMoelter commented 1 year ago

As long as the fix is not merged, the branch with the fix can be used: https://colab.research.google.com/github/StMoelter/disco-diffusion/blob/fix%2Fmidas-checkout-v3-tag/Disco_Diffusion.ipynb

thias15 commented 1 year ago

Hi guys. MiDaS v3.1 is now fixed: NextViT, which was causing the issue, is now optional. So you can use tag v3_1 and also get the latest models with even better performance. For instance, you could point to tag v3_1, download the checkpoint from the corresponding release, and then define, for example, BEiT_L_384 like so:

    if midas_model_type == "beit_l_384":  # BEiT_L_384
        midas_model = DPTDepthModel(
            path=midas_model_path,
            backbone="beitl16_384",
            non_negative=True,
        )
        net_w, net_h = 384, 384
        resize_mode = "minimal"
        normalization = NormalizeImage(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
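
For context, the surrounding steps described above (checking out the v3_1 tag and downloading a checkpoint from the corresponding release) could look roughly like this sketch; the checkpoint filename and release URL are assumptions about the v3_1 release layout, not verified values:

    # Rough sketch of the steps described above. The asset name and URL are
    # assumptions; check the v3_1 release assets for the exact names.
    import subprocess
    import urllib.request

    # Pin the already-cloned MiDaS repo to the v3_1 tag.
    subprocess.run(['git', '-C', 'MiDaS', 'checkout', 'v3_1'], check=True)

    # Download the BEiT_L_384 checkpoint from the release (assumed asset name).
    midas_model_path = 'dpt_beit_large_384.pt'
    urllib.request.urlretrieve(
        'https://github.com/isl-org/MiDaS/releases/download/v3_1/'
        + midas_model_path,
        midas_model_path)
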
aletts commented 1 year ago

@thias15 Thanks. It could be an interesting thing to try.

However, there's a funny thing about the depth estimation in Disco Diffusion. In many cases, more accurate depth estimation may result in aesthetically worse results.

When I initially tried using MiDaS dpt_large (from v3) alone, I found that, in combination with the flow field technique used for the transformation to the next frame, the dpt_large depth estimation was already too good. Sharp, well-defined edges in common content expose undesirable properties of the simple flow field approach. (I also had an experimental, better technique prior to the DD v5 release and didn't initially include it because it was complicated and I thought people wouldn't understand why I'd done it... and then I lost the code and haven't prioritized redoing it.) I quickly improved the aesthetics by introducing a weighted blend with the AdaBins output, and I suspect that, with the flow field approach unchanged, most people would get better results by increasing the AdaBins contribution further.
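
A minimal sketch of that kind of weighted blend, assuming both depth maps are already at the same resolution; the per-map normalization and the 0.5 default weight are placeholders, not Disco Diffusion's actual values:

    # Sketch of a weighted blend of two depth estimates; values are placeholders.
    import numpy as np

    def blend_depth(midas_depth: np.ndarray, adabins_depth: np.ndarray,
                    adabins_weight: float = 0.5) -> np.ndarray:
        """Blend MiDaS and AdaBins depth; a higher adabins_weight softens MiDaS's sharp edges."""
        def norm(d: np.ndarray) -> np.ndarray:
            # Rescale each map to [0, 1] so the two sources are comparable.
            return (d - d.min()) / (d.max() - d.min() + 1e-8)

        return (1.0 - adabins_weight) * norm(midas_depth) + adabins_weight * norm(adabins_depth)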

thias15 commented 1 year ago

@aletts interesting! Note that we introduced several new models in release v3.1 that leverage different backbones with various trade-offs between accuracy and speed: Swin-L, SwinV2-T, SwinV2-B, SwinV2-L, LeViT, BEiT-L, and more. It might be interesting to try different variants to see how well they play with the flow field approach. By the way, what exactly is the flow field approach used for, and how does it work? On a side note, we will also release a new depth estimation model in the near future that essentially combines AdaBins and MiDaS, so stay tuned for that.