GwendalC opened 1 year ago
It seems there has been an update of midas today.
[Dec 2022] Released MiDaS v3.1:
- New models based on 5 different types of transformers (BEiT, Swin2, Swin, Next-ViT, LeViT)
- Training datasets extended from 10 to 12, now also including KITTI and NYU Depth V2 using the BTS split
- Best model, BEiT-Large 512, with resolution 512x512, is on average about 28% more accurate than MiDaS v3.0
- Integrated live depth estimation from camera feed
I guess the fix should be something like pinning the git clone in disco.py to the v3.0 tag:
```python
try:
    from midas.dpt_depth import DPTDepthModel
except:
    if not os.path.exists('MiDaS'):
        gitclone("https://github.com/isl-org/MiDaS.git")
    if not os.path.exists('MiDaS/midas_utils.py'):
```
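A minimal sketch of what that pin could look like (my assumption, not the merged fix; `v3` is a tag name I'm taking from the MiDaS releases page, so double-check it exists before using it):

```python
import subprocess

MIDAS_REPO = "https://github.com/isl-org/MiDaS.git"
MIDAS_TAG = "v3"  # assumed tag name: pin to the last release Disco Diffusion worked with

def midas_clone_cmd(dest="MiDaS", tag=MIDAS_TAG):
    # `git clone --branch` accepts tags as well as branches;
    # --depth 1 keeps the checkout small in a notebook environment.
    return ["git", "clone", "--branch", tag, "--depth", "1", MIDAS_REPO, dest]

# e.g. subprocess.run(midas_clone_cmd(), check=True)
print(" ".join(midas_clone_cmd()))
```

The same idea works with Disco's `gitclone` helper if it forwards extra arguments to git.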
same here
Cause here: https://github.com/isl-org/MiDaS/issues/193
worst crisis of my life
As long as the fix is not merged, the branch with the fix can be used: https://colab.research.google.com/github/StMoelter/disco-diffusion/blob/fix%2Fmidas-checkout-v3-tag/Disco_Diffusion.ipynb
Hi guys. MiDaS v3.1 is now fixed: Next-ViT, which was causing the issue, has been made optional. So you can use tag v3_1 and also use the latest models with even better performance. For instance, you could point to tag v3_1, download the checkpoint from the corresponding release, and then define, for example, BEiT_L_384 like so:
```python
if midas_model_type == "beit_l_384":  # BEiT_L_384
    midas_model = DPTDepthModel(
        path=midas_model_path,
        backbone="beitl16_384",
        non_negative=True,
    )
    net_w, net_h = 384, 384
    resize_mode = "minimal"
    normalization = NormalizeImage(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
```
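If you want to support several of the new v3.1 variants, one way to avoid a long if/elif chain is a small lookup table. This is a hypothetical sketch: only the `beit_l_384` entry mirrors the snippet above, and the other backbone names are my assumptions that should be checked against the v3_1 release notes.

```python
# Hypothetical config table; only beit_l_384 is confirmed by the snippet above,
# the other backbone strings are assumptions to verify against the v3_1 release.
MIDAS_CONFIGS = {
    "beit_l_384": {"backbone": "beitl16_384", "net_size": (384, 384), "resize_mode": "minimal"},
    "beit_l_512": {"backbone": "beitl16_512", "net_size": (512, 512), "resize_mode": "minimal"},
    "swin2_l_384": {"backbone": "swin2l24_384", "net_size": (384, 384), "resize_mode": "minimal"},
}

def midas_config(model_type):
    # Look up the per-model settings, failing loudly on unknown names.
    try:
        return MIDAS_CONFIGS[model_type]
    except KeyError:
        raise ValueError(f"unknown MiDaS model type: {model_type}")
```

The dict entries then feed `DPTDepthModel(backbone=...)` and the resize/normalization setup in one place.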
@thias15 Thanks. It could be an interesting thing to try.
However, there's a funny thing about the depth estimation in Disco Diffusion. In many cases, more accurate depth estimation may result in aesthetically worse results.
When I initially tried using MiDaS dpt_large (from v3) alone, I found that, in combination with the flow field technique used for the transformation to the next frame, the dpt_large depth estimation was already too good: sharp, well-defined edges in common content expose undesirable properties of the simple flow field approach. (I also had an experimental, better technique prior to the DD v5 release and didn't include it at first since it was complicated and I thought people wouldn't know why I'd done it... and then I lost the code and haven't prioritized redoing it.) I quickly improved its aesthetics by introducing a weighted blend with the AdaBins output. I suspect that with the flow field approach unchanged, most people would get better results by increasing the AdaBins contribution further.
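This is not DD's actual blending code, but the weighted-blend idea can be sketched roughly like this (assumes both maps are already aligned to the same resolution; the per-map normalization is my addition, since MiDaS predicts relative inverse depth while AdaBins predicts metric depth, so they need a common scale before mixing):

```python
import numpy as np

def blend_depth(midas_depth, adabins_depth, adabins_weight=0.5):
    """Weighted blend of two same-shape depth maps (sketch, not DD's code)."""
    def to_unit_range(d):
        # Rescale to [0, 1] so the two predictions are comparable.
        d = d.astype(np.float64)
        return (d - d.min()) / (d.max() - d.min() + 1e-8)
    m = to_unit_range(midas_depth)
    a = to_unit_range(adabins_depth)
    return (1.0 - adabins_weight) * m + adabins_weight * a
```

Raising `adabins_weight` softens the sharp MiDaS edges that the flow field approach struggles with.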
@aletts interesting! Note that we have introduced several new models in release v3.1 leveraging different backbones with various trade-offs between accuracy and speed, e.g. Swin-L, SwinV2-T, SwinV2-B, SwinV2-L, LeViT, BEiT-L, etc. Might be interesting to try different variants to see how nicely they play with the flow field approach. By the way, what exactly is the flow field approach used for and how does it work? On a side note, we will also release a new depth estimation model in the near future that essentially combines AdaBins and MiDaS, so stay tuned for that.
Hello, could you help me solve this issue? I restarted the notebook several times; it was working fine up to 1 PM today. A dependency issue? Thanks! (You're awesome.)
```
/content/MiDaS/midas/backbones/next_vit.py in <module>
      6 from .utils import activations, forward_default, get_activation
      7
----> 8 file = open("./externals/Next_ViT/classification/nextvit.py", "r")
      9 source_code = file.read().replace(" utils", " externals.Next_ViT.classification.utils")
     10 exec(source_code)

FileNotFoundError: [Errno 2] No such file or directory: './externals/Next_ViT/classification/nextvit.py'
```