[Feature request] Is it possible to implement LeRes model for a more detailed depth map?

bugmelone commented 1 year ago

I just found this repo: https://github.com/compphoto/BoostingMonocularDepth and was very impressed with the level of detail of the LeRes model. The difference is clear when comparing Midas and LeRes results on an image of a complex scene: [images: Midas, LeRes]

Extraltodeus commented 1 year ago

wow I didn't know about that repository. Gotta take a look. Thanks!

Maxxxwell62 commented 1 year ago

Much more accurate than midas, I hope they look more closely at this one.

bugmelone commented 1 year ago

Also you might want to take a look at this neighboring repo: https://github.com/compphoto/BoostYourOwnDepth it looks like we can combine results from Midas and LeRes

Maxxxwell62 commented 1 year ago

> Also you might want to take a look at this neighboring repo: https://github.com/compphoto/BoostYourOwnDepth it looks like we can combine results from Midas and LeRes

Post the pull request in these repos as well; maybe they will look at it sooner:

https://github.com/TheLastBen/fast-stable-diffusion/ https://github.com/AUTOMATIC1111/stable-diffusion-webui/ https://github.com/compphoto/BoostingMonocularDepth/

AugmentedRealityCat commented 1 year ago

I second this. The depth extraction repo I was working with, which was programmed by @donlinglok and which now works locally on Windows, actually uses that exact system if I'm not mistaken. It's based on a second round of depth extraction and a combination process after the first pass is completed. The results are much more detailed, and they work much better if you want to use your depthmap to create 3D models from your scene.

Have a look at this repo https://github.com/donlinglok/3d-photo-inpainting

And most particularly you may want to compare this part: https://github.com/donlinglok/3d-photo-inpainting/commit/87ffa051e77557de9f8068b4073f492561cf678b

It was a pleasure helping him fix the problems that were preventing this from running on Windows, and I was so proud when we actually got it right (even though @Donlinglok did all the work!). It would be amazing if it could be adapted for this extension as well.

EDIT: This was all based on a request to add this as an extension for Automatic1111, over here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/4268 It documents the debugging process of the Windows port of this famous 3dboost, among other things.

One more EDIT: LeRes is now fully functional with this version of the repo: https://github.com/donlinglok/3d-photo-inpainting/tree/LeRes You might need to manually download the LeRes model from this link: https://cloudstor.aarnet.edu.au/plus/s/lTIJF4vrvHCAI31/download It goes here: /BoostingMonocularDepth/res101.pth
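For anyone who prefers to script the manual download, here is a minimal sketch. The URL and the res101.pth location are the ones quoted above; the function name `ensure_leres_weights`, the `repo_root` parameter, and the assumption that the path is relative to the 3d-photo-inpainting checkout are mine, so adjust as needed.

```python
import urllib.request
from pathlib import Path

# URL quoted in the comment above; it may have moved since.
LERES_URL = "https://cloudstor.aarnet.edu.au/plus/s/lTIJF4vrvHCAI31/download"


def ensure_leres_weights(repo_root, fetch=urllib.request.urlretrieve):
    """Download res101.pth unless a non-empty copy is already in place.

    Hypothetical helper: repo_root is assumed to be the 3d-photo-inpainting
    checkout, so the file lands in <repo_root>/BoostingMonocularDepth/.
    """
    target = Path(repo_root) / "BoostingMonocularDepth" / "res101.pth"
    if target.is_file() and target.stat().st_size > 0:
        return target  # already downloaded, nothing to do
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(LERES_URL, str(target))  # urlretrieve(url, filename)
    return target
```

Calling `ensure_leres_weights("3d-photo-inpainting")` from the folder that contains the checkout should be enough; the `fetch` parameter is only there so the download step can be swapped out or tested without network access.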

donlinglok commented 1 year ago

> I second this. The depth extraction repo I was working with, which was programmed by @donlinglok and which now works locally on windows actually use that exact system if I'm not mistaken. It's based on a second round of depth-extraction and some combination process after the first pass is completed. ... Have a look at this repo https://github.com/donlinglok/3d-photo-inpainting ...

@AugmentedRealityCat WOW, the LeRes result looks good! The vt-vl-lab/3d-photo-inpainting we tried on Windows did generate a depth image, but vt-vl-lab/3d-photo-inpainting is a 2021 project built with MiDas, so I am not sure which one is better. Maybe I will try LeRes later.

AugmentedRealityCat commented 1 year ago

> but since it is a 2021 project I am not sure which one is better, maybe i will try on the LeRes later.

Please try it, and let me know if you need my help with anything.

If I understand correctly, that would mean the "Boosting Monocular Depth" we were using was not even the latest version since LeRes was recently added to it in the repo @bugmelone has posted at the top of this thread.

This is very promising! I can't wait to try this.

AugmentedRealityCat commented 1 year ago

It WORKS (EDIT: It really works now)!

I installed in a brand new folder, but I did reuse my old venv. The only things I had to fix manually were replacing the empty 0.00 KB model in 3d-photo-inpainting\BoostingMonocularDepth\midas\model.pt (just like the last time) and removing the pictures and videos that were already in there.
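In case it helps others spot the same empty-model problem, here is a small sketch that scans a checkout for zero-byte weight files. The helper name `find_empty_files` and the `*.pt` / `*.pth` patterns are my own assumptions, not something from the repo.

```python
from pathlib import Path


def find_empty_files(root, patterns=("*.pt", "*.pth")):
    """Return every 0-byte file under root that matches one of the patterns.

    Hypothetical helper: the patterns are a guess at typical PyTorch weight
    file extensions, e.g. BoostingMonocularDepth/midas/model.pt.
    """
    root = Path(root)
    empty = []
    for pattern in patterns:
        for path in root.rglob(pattern):
            if path.is_file() and path.stat().st_size == 0:
                empty.append(path)
    return sorted(empty)
```

Running `find_empty_files("3d-photo-inpainting")` right after cloning should list any placeholder model that still needs to be replaced with the real download.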

Is there any way to make sure that the new LeRes algorithm has been applied ?

One thing is for sure: I got all the files I was getting with the previous version. These are a .npy file (some Python numerical data, probably depth information) and a smaller resolution depthmap in png format (from 3d-photo-inpainting\depth ), a larger full resolution depthmap in png format (from 3d-photo-inpainting\BoostingMonocularDepth\outputs ), a .ply mesh file (from 3d-photo-inpainting\mesh ), and four video files in mp4 format (from 3d-photo-inpainting\KenBurns\Output ).

The other thing it produced is some kind of log in a file called test_opt.txt in 3d-photo-inpainting\BoostingMonocularDepth\pix2pix\checkpoints\void. Here is what it contains in case it could be useful:

  ----------------- Options ---------------
                    Final: True                             [default: False]
                       R0: False                         
                      R20: False                         
             aspect_ratio: 1.0                           
               batch_size: 1                             
          checkpoints_dir: ./pix2pix/checkpoints         
         colorize_results: False                         
                crop_size: 672                           
                 data_dir: inputs/                          [default: None]
                 dataroot: None                          
             dataset_mode: depthmerge                    
                 depthNet: 0                                [default: None]
                direction: AtoB                          
          display_winsize: 256                           
                    epoch: latest                        
                     eval: False                         
            generatevideo: None                          
                  gpu_ids: 0                             
                init_gain: 0.02                          
                init_type: normal                        
                 input_nc: 2                             
                  isTrain: False                            [default: None]
                load_iter: 0                                [default: 0]
                load_size: 672                           
         max_dataset_size: 10000                         
                  max_res: inf                           
                    model: pix2pix4depth                 
               n_layers_D: 3                             
                     name: void                          
                      ndf: 64                            
                     netD: basic                         
                     netG: unet_1024                     
 net_receptive_field_size: None                          
                      ngf: 64                            
               no_dropout: False                         
                  no_flip: False                         
                     norm: none                          
                 num_test: 50                            
              num_threads: 4                             
               output_dir: outputs                          [default: None]
                output_nc: 1                             
        output_resolution: None                          
                    phase: test                          
              pix2pixsize: None                          
               preprocess: resize_and_crop               
                savecrops: None                          
             savewholeest: None                          
           serial_batches: False                         
                   suffix:                               
                  verbose: False                         
----------------- End -------------------
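One possible way to answer the "has LeRes been applied?" question is to read the depthNet entry out of test_opt.txt. The sketch below parses a dump in the format shown above. The mapping of depthNet values to networks (0 = MiDaS, 1 = SGR, 2 = LeReS) is my recollection of the upstream BoostingMonocularDepth help text and should be treated as an assumption; please verify it against your checkout. If it holds, the dump above (depthNet: 0) would correspond to a MiDaS run rather than a LeRes one.

```python
import re

# ASSUMPTION: value-to-network mapping recalled from the upstream run.py
# help text; check your own copy before relying on it.
DEPTH_NETS = {"0": "MiDaS", "1": "SGR", "2": "LeReS"}

# Matches lines such as "   depthNet: 0      [default: None]".
_LINE = re.compile(r"^\s*(\S+):\s*(.*?)\s*(?:\[default: (.*?)\])?\s*$")


def parse_options(text):
    """Turn an options dump into a {name: value} dict of strings."""
    opts = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:  # separator lines have no "name:" part and are skipped
            opts[match.group(1)] = match.group(2)
    return opts


def base_network(text):
    """Name the base depth network recorded in the dump, if recognised."""
    value = parse_options(text).get("depthNet")
    return DEPTH_NETS.get(value, f"unknown ({value})")
```

Passing the contents of test_opt.txt to `base_network` gives a quick check after each run, without having to compare depthmaps by eye.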

Thanks again for making this happen.

AugmentedRealityCat commented 1 year ago

And here is the depthmap I get from the very latest version - everything works now !

[image: v-sso_fourth (2)]

Extraltodeus commented 1 year ago

Impressive @AugmentedRealityCat ! Could you make me a pull request with your modifications?

AugmentedRealityCat commented 1 year ago

> Could you make me a pull request with your modifications?

I wish I could, but I am not a programmer, so I have no idea how to do that. The programmer is @donlinglok .

donlinglok commented 1 year ago

@Extraltodeus The video that @AugmentedRealityCat generated was made by 3d-photo-inpainting. That project already implements https://github.com/compphoto/BoostingMonocularDepth; I just made a small change to switch the method from MiDas to LeRes (which BoostingMonocularDepth supports).

I think BoostingMonocularDepth is easy to integrate. See boostmonodepth_utils.py, lines 23-49: it just collects the images, passes them to BoostingMonocularDepth via the command line, and inverts the grayscale of the output.
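A rough sketch of that flow, for illustration only; this is not the actual code from boostmonodepth_utils.py. The flag names come from the option dump earlier in the thread (Final, data_dir, output_dir, depthNet). The function names, the run.py entry point, depthNet 2 meaning LeRes, and the 16-bit maximum used for the inversion are all assumptions on my part.

```python
import subprocess
import sys


def build_boost_command(input_dir, output_dir, depth_net=2):
    """Build the command line for BoostingMonocularDepth's run.py.

    ASSUMPTION: run.py is the entry point and depth_net=2 selects LeRes.
    """
    return [
        sys.executable, "run.py", "--Final",
        "--data_dir", str(input_dir),
        "--output_dir", str(output_dir),
        "--depthNet", str(depth_net),
    ]


def run_boost(input_dir, output_dir, boost_dir="BoostingMonocularDepth", depth_net=2):
    """Run the depth estimation from inside the BoostingMonocularDepth folder."""
    command = build_boost_command(input_dir, output_dir, depth_net)
    subprocess.run(command, cwd=boost_dir, check=True)


def invert_grayscale(pixels, max_value=65535):
    """Invert grayscale values (near <-> far); max_value assumes 16-bit data."""
    return [max_value - p for p in pixels]
```

Because the command runs with `cwd=boost_dir`, relative input and output paths are resolved inside that folder. A real integration would load and save the depth PNG with an image library; the list-based inversion here only shows the arithmetic.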

More references here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/4268

bugmelone commented 1 year ago

Looks like @thygate added LeRes to his main extension repo as experimental support. https://github.com/thygate/stable-diffusion-webui-depthmap-script/discussions/34#discussioncomment-4297096 https://github.com/thygate/stable-diffusion-webui-depthmap-script/commit/ee06f979a43708e7cb11ac2ff37c9260f4b204ff

Extraltodeus / depthmap2mask

[Feature request] Is it possible to implement LeRes model for a more detailed depth map? #11