chaytonmin opened this issue 4 years ago
Of course they are the same, and they must be the same. They specify the camera intrinsics (focal length, etc.) and extrinsics (where the camera is placed in world coordinates). The cameras are all fixed once calibration is done; the relative poses must not change between scans, since only the object is replaced without touching the cameras.
If you use your own dataset, you need to calibrate the cameras first (i.e., know the intrinsics and extrinsics) and then not move the cameras anymore. pair.txt just encodes a similarity measure between cameras; you can simply pair each camera with the cameras whose positions are closest to it.
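As an illustration (not the official MVSNet/DTU preprocessing script), a pair.txt-style view selection could be generated from camera centers alone, using plain nearest-neighbor distance as the similarity score; the function write_simple_pairs, the negative-distance score, and the default of 10 source views are assumptions made for this sketch.
'''
import numpy as np

# Illustrative sketch only (not the official MVSNet/DTU preprocessing): build a
# pair.txt-style view selection by picking, for each reference camera, the
# source cameras whose centers are closest. The file layout follows the DTU
# convention (first line: number of views; then per view: the view id, and a
# line "num_src src_id score src_id score ..."). Using the negative distance
# as the score is an assumption standing in for MVSNet's sparse-point-based
# view-selection score.
def write_simple_pairs(cam_centers, num_src=10, out_path="pair.txt"):
    cam_centers = np.asarray(cam_centers)      # (N, 3) camera centers in world coords
    n = len(cam_centers)
    lines = [str(n)]
    for ref in range(n):
        d = np.linalg.norm(cam_centers - cam_centers[ref], axis=1)
        d[ref] = np.inf                        # never pair a view with itself
        src = np.argsort(d)[:num_src]          # closest cameras first
        lines.append(str(ref))
        lines.append(" ".join([str(len(src))] + [f"{s} {-d[s]:.4f}" for s in src]))
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
'''
In the MVSNet paper the view-selection score is computed from sparse SfM points and baseline angles, so the distance-based score above is only a rough stand-in.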
Thanks a lot!
I have another question about the testing part for R-MVSNet. Why do we need to scale the cams by 0.8 for R-MVSNet but by 1.06 for MVSNet?
'''
Run MVSNet (GTX1080Ti):
python test.py --dense_folder TEST_DATA_FOLDER --regularization '3DCNNs' --max_w 1152 --max_h 864 --max_d 192 --interval_scale 1.06

Run R-MVSNet (GTX1080Ti):
python test.py --dense_folder TEST_DATA_FOLDER --regularization 'GRU' --max_w 1600 --max_h 1200 --max_d 256 --interval_scale 0.8
'''
And why do we need to scale the cams at all when testing the models?
It's all because the model consumes too much GPU memory:
- MVSNet has max_d=192 and interval_scale=1.06 so that it can cover the depth range from 425 to 425+191x2.5x1.06=931.15. max_d is only 192 because setting it to 256 would probably cause out of memory; so to cover the max depth of about 935mm, we need to set a larger interval scale. On the other hand, R-MVSNet is designed to use less GPU memory, so we can use a larger max_d without causing OOM. In short, due to the memory constraint we decide the max_d each model can have, then calculate interval_scale so that both models can cover up to 935mm (for R-MVSNet, it's 425+255x2.5x0.8=935), as worked out in the sketch below.
- It is not "scaling the cams"; the number is the interval between consecutive depth hypotheses, in other words the precision: for example, interval_scale=1.06 means your model can be precise up to 2.5x1.06=2.65mm, while interval_scale=0.8 means it can be precise up to 2.5x0.8=2mm. Note that this is just an example; in reality the model uses soft-argmin instead of argmin, so it will be more precise than these numbers (see the paper), but it is still correct that a smaller interval_scale means better accuracy.
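A minimal sketch of the arithmetic above (plain Python, not code from this repo); the 425mm minimum depth, the 2.5mm base interval, and the function name depth_coverage are taken from the numbers quoted in the answer and are only illustrative:
'''
# Depth coverage implied by max_d and interval_scale (DTU values quoted above:
# depth_min = 425mm, base depth_interval = 2.5mm). The deepest hypothesis is
# depth_min + (max_d - 1) * depth_interval * interval_scale.
def depth_coverage(max_d, interval_scale, depth_min=425.0, depth_interval=2.5):
    step = depth_interval * interval_scale      # spacing between depth hypotheses (mm)
    depth_max = depth_min + (max_d - 1) * step  # last depth hypothesis (mm)
    return step, depth_max

print(depth_coverage(192, 1.06))  # MVSNet:   step ~ 2.65mm, max depth ~ 931.15mm
print(depth_coverage(256, 0.80))  # R-MVSNet: step = 2.0mm,  max depth = 935.0mm
'''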
Got it!
@kwea123 Thank you for your assistance!
Why does the MVSNet paper say "the argmax operation is unable to produce sub-pixel estimation, and cannot be trained with back-propagation due to its indifferentiability", while the R-MVSNet paper says "rather than treating the problem as a regression task, we train the network as a multi-class classification problem with cross entropy loss"?
In R-MVSNet we use inverse depth for large-depth-range reconstructions. The definition of the expectation (soft-argmin) over inverse depth is not intuitive, so we did not apply this operation at that time.
However, other works (e.g., https://arxiv.org/pdf/1912.11746.pdf) have shown that inverse depth+soft-argmin could also produce good or even better result.
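For intuition, here is a small numpy sketch (illustrative values, not code from either paper) contrasting argmax, which snaps to one discrete hypothesis and gives no useful gradient, with the soft-argmin expectation, which can land between hypotheses and is differentiable; the last line takes the expectation over inverse depth, the variant discussed above:
'''
import numpy as np

# Hypothetical probability distribution for one pixel over D depth hypotheses
# (the depth samples, the logits, and D itself are made-up illustrative values).
D = 8
depths = np.linspace(425.0, 935.0, D)                         # depth samples (mm)
logits = np.array([0.1, 0.4, 1.5, 3.0, 2.0, 0.5, 0.2, 0.1])   # per-hypothesis scores
probs = np.exp(logits) / np.exp(logits).sum()                 # softmax over depth

# argmax: picks a single hypothesis -> no sub-interval precision, not differentiable.
d_argmax = depths[np.argmax(probs)]

# soft-argmin (expectation over depth): differentiable, can fall between hypotheses.
d_soft = (probs * depths).sum()

# inverse-depth variant: expectation over 1/d, then inverted back to depth.
d_soft_inv = 1.0 / (probs * (1.0 / depths)).sum()

print(d_argmax, d_soft, d_soft_inv)
'''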
I find that pair.txt and cams.txt are the same for all the scans; however, shouldn't they be different?
If I want to use my own dataset, how do I change them to fit MVSNet for training? Thanks!