aim-uofa / AdelaiDepth

This repo contains the projects 'Virtual Normal', 'DiverseDepth', and '3D Scene Shape'. They aim to solve monocular depth estimation and 3D scene reconstruction from a single image.
Creative Commons Zero v1.0 Universal

Some questions about train data #51

Closed erzhu222 closed 2 years ago

erzhu222 commented 2 years ago

Thanks for your great work! I have some questions about the training data:

  1. Which datasets did you use to train the pretrained models you provided? Were the DIML and 3D Ken Burns datasets mentioned in your paper used?
  2. In the datasets you provide on GitHub, I did not see DIML or 3D Ken Burns. Are they included in DiverseDepth, or are they not provided? If not, how should I prepare these two datasets?
  3. In your paper, the relative depth of Holopix50K is generated using FlowNet. How can I generate relative depth for my own stereo data?
guangkaixu commented 2 years ago

Hi @erzhu222. Thank you for following our work!

  1. As I understand it, you are asking about the pre-trained model loaded before training. We initialize from an ImageNet pre-trained model, as other works do. The models we provide in README.md (res50 and resnext101) are for inference only; they were trained following our paper.
  2. Thanks for reminding me. The DIML dataset can be downloaded from the DiverseDepth project and used directly. We could not find the 3D Ken Burns dataset, but it is not critical for training.
  3. You can generate the optical flow between stereo images and take the horizontal optical flow as the disparity value, after filtering out pixels with large vertical optical flow (e.g., larger than 2 pixels). Note that the generated disparity is up to scale and may contain noise, so it belongs to the low-quality datasets, and only the ranking loss is used for it during training.
erzhu222 commented 2 years ago

Thanks for your reply!

  1. For the first question, I mean the weights you provided (not the backbone weights): which datasets were they (resnet50 and resnext101) trained on?
  2. I see the DIML dataset in the DiverseDepth project, thanks for sharing.
  3. I have calculated optical flow with FlowNet2 and obtained the horizontal optical flow. How can I filter out the large vertical optical flow (e.g., larger than 2 pixels)? Thanks again!
guangkaixu commented 2 years ago

For Q1, we trained the models on Taskonomy (part of it), DIML, 3D Ken Burns, Holopix50K, and HRWSI, all of which have been released except for 3D Ken Burns.

For Q3, you can generate a valid_mask where the vertical optical flow is smaller than 2 pixels. Then set the disparity values in the invalid region (~valid_mask) to 0, where "0" stands for invalid values or regions.
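As a concrete illustration of the recipe above, here is a minimal NumPy sketch. The function name `flow_to_disparity` and the (H, W, 2) flow layout are assumptions for illustration, not code from this repo:

```python
import numpy as np

def flow_to_disparity(flow, max_vertical=2.0):
    """Convert stereo optical flow (H, W, 2) into a pseudo disparity map.

    Pixels whose vertical flow magnitude exceeds `max_vertical` are treated
    as unreliable; their disparity is set to 0, which marks invalid regions.
    """
    horizontal = flow[..., 0]  # x-component: the disparity candidate
    vertical = flow[..., 1]    # y-component: ~0 for well-rectified stereo
    valid_mask = np.abs(vertical) < max_vertical
    disparity = np.abs(horizontal)
    disparity[~valid_mask] = 0.0  # "0" stands for invalid values/regions
    return disparity, valid_mask
```

Since the resulting disparity is only up to scale, it would be paired with a ranking-style loss during training, as described above.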

erzhu222 commented 2 years ago

Thanks! For Q1, was DiverseDepth not used for training?

guangkaixu commented 2 years ago

I double-checked the paper, and DiverseDepth is not used for training. However, more datasets bring more accuracy and robustness, so train with as much data as possible.

By the way, if you would like to train on large, diverse datasets, you may be interested in our BoostingDepth, whose code and data will be released after the paper is accepted.

erzhu222 commented 2 years ago

OK, thanks for the confirmation. I have downloaded the 3D Ken Burns dataset; could you please provide the annotation file you used for training? Thanks again!