YoYo000 / MVSNet

MVSNet (ECCV2018) & R-MVSNet (CVPR2019)
MIT License

why the pairs.txt and cams.txt for different scans are the same? #85

Open chaytonmin opened 4 years ago

chaytonmin commented 4 years ago

I noticed that pair.txt and cams.txt are the same for all the scans. Shouldn't they be different?
If I want to train on my own dataset, how should I adapt it to fit MVSNet? Thanks!

kwea123 commented 4 years ago

Of course they are the same, and they must be. They specify the camera intrinsics (focal length, etc.) and extrinsics (where each camera is placed in world coordinates). The cameras are fixed once calibration is done, so the relative poses must not change between scans; only the object is replaced, without touching the cameras. If you use your own dataset, you need to calibrate the cameras first (to get the intrinsics and extrinsics) and then leave the cameras untouched. pair.txt is just a similarity measure between cameras; you can simply pair cameras whose positions are close together.
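As a rough illustration of "pair cameras whose positions are close together", here is a minimal sketch that ranks neighbouring views by the distance between camera centers. It assumes the extrinsics are world-to-camera poses [R | t], so the center is C = -R^T t. The function names, the four-camera rig, and the choice of two source views are made up for the example, and the sketch does not reproduce the layout or scoring of the repository's pair.txt.

```python
import math

def camera_center(R, t):
    """Camera center in world coordinates, C = -R^T t, for a world-to-camera pose [R | t]."""
    return [-sum(R[k][i] * t[k] for k in range(3)) for i in range(3)]

def nearest_views(centers, num_src=2):
    """For each reference view, list the indices of the num_src closest other cameras."""
    pairs = {}
    for ref, c_ref in enumerate(centers):
        others = [(math.dist(c_ref, c), idx)
                  for idx, c in enumerate(centers) if idx != ref]
        pairs[ref] = [idx for _, idx in sorted(others)[:num_src]]
    return pairs

# Hypothetical rig: four cameras on a line, identity rotation, made-up translations.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translations = [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [-2.5, 0.0, 0.0], [-6.0, 0.0, 0.0]]
centers = [camera_center(identity, t) for t in translations]

pairs = nearest_views(centers, num_src=2)
for ref, srcs in pairs.items():
    print(ref, srcs)
```

Distance alone ignores viewing direction, so for rigs where cameras are close but face different ways a different similarity measure may be a better choice.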

chaytonmin commented 4 years ago

Thanks a lot!

I have another question about the testing part for R-MVSNet. Why do we need to scale the cams by 0.8 for R-MVSNet but by 1.06 for MVSNet?

'''
Run MVSNet (GTX1080Ti):
python test.py --dense_folder TEST_DATA_FOLDER --regularization '3DCNNs' --max_w 1152 --max_h 864 --max_d 192 --interval_scale 1.06
Run R-MVSNet (GTX1080Ti):
python test.py --dense_folder TEST_DATA_FOLDER --regularization 'GRU' --max_w 1600 --max_h 1200 --max_d 256 --interval_scale 0.8
'''

And why do we need to scale the cams first when testing the models?

kwea123 commented 4 years ago

It's all because the model consumes too much GPU memory:

  1. MVSNet uses max_d=192 and interval_scale=1.06 so that it covers the depth range from 425 to 425+191x2.5x1.06=931.15. max_d is only 192 because setting it to 256 would probably cause out of memory; so to cover depths up to about 935mm, we need a larger interval scale. R-MVSNet, on the other hand, is designed to use less GPU memory, so we can use a larger max_d without causing OOM. In short, the memory constraint decides the max_d each model can have, and interval_scale is then calculated so that both models cover up to about 935mm (for R-MVSNet, it's 425+255x2.5x0.8=935).
  2. It is not "scaling the cams"; the number sets the interval between consecutive depth hypotheses, in other words the precision. For example, interval_scale=1.06 means the model can be precise up to 2.5x1.06=2.65mm, while interval_scale=0.8 means it can be precise up to 2.5x0.8=2mm. Note that this is just an example: in reality the model uses soft-argmin instead of argmax, so it is more precise than these numbers (see the paper), but it is still correct that a smaller interval_scale means better accuracy.
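The arithmetic in the two points above can be checked with a short sketch. It takes the numbers from this comment (minimum depth 425mm, base interval 2.5mm, target of about 935mm); the function names are made up for illustration and are not part of the MVSNet code.

```python
def depth_range(depth_min, max_d, base_interval, interval_scale):
    """Return (step between depth hypotheses, largest depth hypothesis)."""
    step = base_interval * interval_scale
    depth_max = depth_min + (max_d - 1) * step
    return step, depth_max

def scale_for_range(depth_min, depth_max, max_d, base_interval):
    """interval_scale needed so that max_d hypotheses span depth_min..depth_max."""
    return (depth_max - depth_min) / ((max_d - 1) * base_interval)

# Values quoted in the comment above.
mvsnet_step, mvsnet_max = depth_range(425.0, 192, 2.5, 1.06)    # 2.65mm step, up to 931.15mm
rmvsnet_step, rmvsnet_max = depth_range(425.0, 256, 2.5, 0.8)   # 2mm step, up to 935mm

# Going the other way: the scale each max_d needs to reach 935mm.
mvsnet_scale = scale_for_range(425.0, 935.0, 192, 2.5)
rmvsnet_scale = scale_for_range(425.0, 935.0, 256, 2.5)

print(mvsnet_step, mvsnet_max, mvsnet_scale)
print(rmvsnet_step, rmvsnet_max, rmvsnet_scale)
```

The second function shows why the two scales differ: with fewer depth hypotheses available, each step has to be wider to reach the same far depth.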
chaytonmin commented 4 years ago

Got it!

YoYo000 commented 4 years ago

@kwea123 Thank you for your assistance!

chaytonmin commented 4 years ago

Why does the MVSNet paper say that "the argmax operation is unable to produce sub-pixel estimation, and cannot be trained with back-propagation due to its indifferentiability", while the R-MVSNet paper says "Rather than treat the problem as a regression task, we train the network as a multi-class classification problem with cross entropy loss"?

YoYo000 commented 4 years ago

In R-MVSNet we use inverse depth for large depth range reconstructions. The definition of the expectation (soft-argmin) of inverse depth is not intuitive so we did not apply this operation at that time.

However, other works (e.g., https://arxiv.org/pdf/1912.11746.pdf) have shown that inverse depth + soft-argmin can also produce good or even better results.
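To make the difference between the two readouts concrete, here is a minimal sketch for a single pixel. It assumes depth hypotheses spaced uniformly in inverse depth, takes the expectation in inverse-depth space and inverts it at the end, and uses made-up scores where a higher score means a better match. This is only one possible definition of the expectation, not the implementation from either paper or from the linked work.

```python
import math

def softmax(scores):
    """Turn per-hypothesis scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def inverse_depth_hypotheses(depth_min, depth_max, num):
    """Hypotheses spaced uniformly in 1/d between 1/depth_min and 1/depth_max."""
    inv_near, inv_far = 1.0 / depth_min, 1.0 / depth_max
    return [inv_near + (inv_far - inv_near) * i / (num - 1) for i in range(num)]

def argmax_depth(probs, inv_depths):
    """Classification-style readout: depth of the most probable hypothesis."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return 1.0 / inv_depths[best]

def soft_argmin_depth(probs, inv_depths):
    """Regression-style readout: expected inverse depth, converted back to depth."""
    expected_inv = sum(p * inv for p, inv in zip(probs, inv_depths))
    return 1.0 / expected_inv

inv_depths = inverse_depth_hypotheses(425.0, 935.0, 5)
scores = [0.0, 2.0, 2.0, 0.0, 0.0]  # made-up scores: hypotheses 1 and 2 tie
probs = softmax(scores)

d_hard = argmax_depth(probs, inv_depths)      # snaps to one of the five hypotheses
d_soft = soft_argmin_depth(probs, inv_depths) # can land between hypotheses
print(d_hard, d_soft)
```

The argmax readout can only return one of the sampled depths, while the expectation can fall between them, which is the sub-interval precision discussed earlier in the thread.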