RaymondWang987 / NVDS

ICCV 2023 "Neural Video Depth Stabilizer" (NVDS) & TPAMI 2024 "NVDS+: Towards Efficient and Versatile Neural Stabilizer for Video Depth Estimation" (NVDS+)
MIT License

How to use different MiDaS model? #19

Open vitacon opened 1 year ago

vitacon commented 1 year ago

@RaymondWang987: We have tried NVDS with MiDaS, DPT, MiDaS 3.1, and NewCRFs. The results are quite satisfactory. You can simply change the depth predictor to MiDaS 3.1 (only adjusting one line in our demo code) and our NVDS can produce significant improvement in temporal consistency.

I suppose you were referring to these lines:

dpt = MidasNet_large('./dpt/checkpoints/midas_v21-f6b98070.pt', non_negative=True).to(device_flow)

dpt = DPTDepthModel(path='./dpt/checkpoints/dpt_large-midas-2f21e586.pt', etc.

I thought I could simply point it to a different model file (dpt_beit_large_512.pt from MiDaS 3.1) but it crashes so it seems it's not that easy. Would you mind adding some details?

RaymondWang987 commented 1 year ago

You should also replace the current dpt folder with the new midas folder from MidasV3.1. The current dpt folder is from MidasV3.0 (DPT), so it does not support MidasV3.1. Include the new midas folder and import the depth models from its .py files. Then you can adjust the one line in the demo code and use MidasV3.1 for inference.

RaymondWang987 commented 1 year ago

(1) Include the new midas folder from MidasV3.1 in your directory. (2) Import DPTDepthModel from the new MidasV3.1 files. (3) Adjust the one line in the demo code to use MidasV3.1 for inference, e.g.:

model = DPTDepthModel(
    path='xxx/xxx.pt',
    backbone="beitl16_384",
    non_negative=True,
)

vitacon commented 1 year ago

Thanks, @RaymondWang987. I made a few steps forward. =}

I modified the part with DPTDepthModel:

    dpt = DPTDepthModel(
            path='./dpt/checkpoints/dpt_beit_large_512.pt',
            backbone="beitl16_384",
            non_negative=True,
        )        

Actually, it was not enough because the files from MiDaS 3.1 are slightly different. I had to change two more lines. Original:

from dpt.models import DPTDepthModel
from dpt.midas_net import MidasNet_large

Modified:

from dpt.dpt_depth import DPTDepthModel
from dpt.midas_net import MidasNet

Unfortunately, the test crashes while loading the model:

Traceback (most recent call last):
  File "infer_nvds_dpt_bi-v4.py", line 365, in <module>
    dpt = DPTDepthModel(
  File "C:\Users\Vita\video\nvds\dpt\dpt_depth.py", line 163, in __init__
    self.load(path)
  File "C:\Users\Vita\video\nvds\dpt\base_model.py", line 16, in load
    self.load_state_dict(parameters)
  File "C:\Users\Vita\Anaconda3\envs\NVDS\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for DPTDepthModel:
        Unexpected key(s) in state_dict: "pretrained.model.blocks.0.attn.relative_position_index", 

It seems to me that MiDaS 3.1 requires a different version of torch than NVDS...?
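
One way to narrow down an error like this is to compare the parameter names the model expects with the names stored in the checkpoint. The following is a minimal sketch in plain Python, assuming the two name lists have already been collected (for example from `model.state_dict().keys()` and from the loaded checkpoint dictionary). `diff_state_dict_keys` is a hypothetical helper, and the sample keys are illustrative apart from the one copied from the traceback.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, similar to what load_state_dict reports."""
    model_set = set(model_keys)
    ckpt_set = set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)      # expected by the model, absent from the file
    unexpected = sorted(ckpt_set - model_set)   # present in the file, unknown to the model
    return missing, unexpected


# Illustrative key names only; the last checkpoint key is the one from the traceback.
model_keys = ["pretrained.model.blocks.0.attn.qkv.weight"]
checkpoint_keys = [
    "pretrained.model.blocks.0.attn.qkv.weight",
    "pretrained.model.blocks.0.attn.relative_position_index",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

If the checkpoint only carries a few extra entries and nothing is missing, that may point toward a mismatch between the code that builds the model and the code that saved the checkpoint, rather than a damaged file.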

RaymondWang987 commented 1 year ago

In my case (Linux server), I just use:

torch.__version__
'1.9.0+cu111'

And that works well for different MidasV3.1 models. I set a parameter to choose different initial depth predictors and the inference can run successfully with all those predictors:

# Import the DPTDepthModel from Midas3.1
from MiDaS.midas.dpt_depth import DPTDepthModel

if args.initial_type == 'dptlarge':
    dpt = DPTDepthModel(
        path='/xxx/dpt_large-midas-2f21e586.pt',
        backbone="vitl16_384",
        non_negative=True,
        enable_attention_hooks=False,
    ).to(device_flow)
elif args.initial_type == 'dpt_beit_large_384':
    dpt = DPTDepthModel(
        path='/xxx/dpt_beit_large_384.pt',
        backbone="beitl16_384",
        non_negative=True,
        enable_attention_hooks=False,
    ).to(device_flow)
elif args.initial_type == 'dpt_swin2_large_384':
    dpt = DPTDepthModel(
        path='/xxx/dpt_swin2_large_384.pt',
        backbone="swin2l24_384",
        non_negative=True,
        enable_attention_hooks=False,
    ).to(device_flow)
elif args.initial_type == 'dpt_swin2_tiny_256':
    dpt = DPTDepthModel(
        path='/xxx/dpt_swin2_tiny_256.pt',
        backbone="swin2t16_256",
        non_negative=True,
    ).to(device_flow)
dpt.eval()
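
The branches above differ only in the checkpoint file and the backbone name, so the same choice could also be written as a lookup table. This is a sketch in plain Python that only builds the keyword arguments and does not import MiDaS. `PREDICTORS` and `predictor_kwargs` are hypothetical names, and the `/xxx/` paths are the placeholders from the snippet.

```python
# (checkpoint path, backbone) per initial_type, copied from the snippet above.
PREDICTORS = {
    "dptlarge": ("/xxx/dpt_large-midas-2f21e586.pt", "vitl16_384"),
    "dpt_beit_large_384": ("/xxx/dpt_beit_large_384.pt", "beitl16_384"),
    "dpt_swin2_large_384": ("/xxx/dpt_swin2_large_384.pt", "swin2l24_384"),
    "dpt_swin2_tiny_256": ("/xxx/dpt_swin2_tiny_256.pt", "swin2t16_256"),
}


def predictor_kwargs(initial_type):
    """Build the keyword arguments for DPTDepthModel for a given initial_type."""
    if initial_type not in PREDICTORS:
        raise ValueError(f"unknown initial_type: {initial_type!r}")
    path, backbone = PREDICTORS[initial_type]
    return {"path": path, "backbone": backbone, "non_negative": True}


kwargs = predictor_kwargs("dpt_beit_large_384")
print(kwargs)
# With MiDaS 3.1 importable, this would be used as:
#   dpt = DPTDepthModel(**kwargs).to(device_flow)
```

One side effect of the table is that an unrecognized `initial_type` fails immediately with a clear message instead of leaving `dpt` undefined.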

I did not make any other adjustments to the code. I guess your problem is caused by the Windows environment, but I'm not sure because I have not installed and run NVDS on Windows.

Also note that most DPT models use 0.5 for input normalization, while Midas-Large uses __mean_dpt=[0.485, 0.456, 0.406] and __std_dpt=[0.229, 0.224, 0.225]. The normalization coefficients should be changed for each depth predictor, following its official settings.
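
As a rough illustration of the normalization note, the constants can be kept in one table keyed by predictor. This is a sketch in plain Python, not code from NVDS. The `NORMALIZATION` table, its keys, and `normalize_pixel` are hypothetical names, and it assumes that 0.5 applies to both mean and std for the DPT-style models.

```python
# Hypothetical lookup table; values taken from the comment above.
NORMALIZATION = {
    "dpt": {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.5, 0.5]},
    "midas_large": {"mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225]},
}


def normalize_pixel(rgb, predictor):
    """Normalize one RGB pixel (values in [0, 1]) as (x - mean) / std per channel."""
    stats = NORMALIZATION[predictor]
    return [(x - m) / s for x, m, s in zip(rgb, stats["mean"], stats["std"])]


white = [1.0, 1.0, 1.0]
print(normalize_pixel(white, "dpt"))
print(normalize_pixel(white, "midas_large"))
```

The same pixel maps to different values under the two settings, which is why feeding a predictor with the wrong coefficients can quietly degrade the initial depth maps.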

vitacon commented 1 year ago

Thanks for mentioning the normalization. Hopefully, I will run into it later.

It took me a while to notice that I am using dpt_beit_large_512 and you are mentioning dpt_beit_large_384.

I thought that this was the root of the error but changing backbone="beitl16_384" to backbone="beitl16_512" did not help. Then I suspected that there was something wrong with the "512" model so I downloaded 384 too but it crashes the same way. I'll have to focus on versions of installed libraries (MiDaS 3.1 versus NVDS) but it will require some more investigation...

vitacon commented 1 year ago

The good thing is that it is already a known MiDaS issue related to timm: https://github.com/isl-org/MiDaS/issues/245

I had previously installed the latest version of timm, so now I had to remove it and it broke some other stuff.

pip install timm==0.6.12
pip install imutils==0.5.4
pip install pillow==10.0.0
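
Because the fix depends on specific package versions, it can help to check the environment before running inference. Here is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+). The `PINNED` table simply repeats the versions from the commands above, and `check_pins` is a hypothetical helper name.

```python
from importlib import metadata

# Versions copied from the pip commands above.
PINNED = {"timm": "0.6.12", "imutils": "0.5.4", "pillow": "10.0.0"}


def check_pins(pins):
    """Return {package: (wanted, installed_or_None)} for every package that does not match."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in check_pins(PINNED).items():
    print(f"{name}: wanted {wanted}, found {installed}")
```

Nothing is printed when every pinned package matches.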

Now it finally runs with dpt_beit_large_512.pt. Thanks again! =)

RaymondWang987 commented 1 year ago

Glad to hear the good news.