aim-uofa / GenPercept

GenPercept: Diffusion Models Trained with Large Data Are Transferable Visual Models
https://huggingface.co/spaces/guangkaixu/GenPercept
Creative Commons Zero v1.0 Universal

Specific data amount on hypersim & v-kitti for depth prediction #6

Closed haodong2000 closed 1 week ago

haodong2000 commented 2 weeks ago

Dear authors, excellent work — it will be a milestone in the CV community!

I have a simple question: what is the specific amount of Hypersim & Virtual KITTI data used for training the depth estimator?

For instance, in Hypersim, did you use only the training split (54K) or the entire dataset (74K)?

Thanks so much!

guangkaixu commented 1 week ago

Hi, thanks for your interest in our work!

For Hypersim, we use the training split after filtering out some invalid scenes and images (e.g., depth maps with NaN values, images that are entirely 255 or entirely 0, and the broken scenes reported in this issue of Hypersim).
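A minimal sketch of that filtering rule, assuming RGB and depth are loaded as NumPy arrays; the function name is hypothetical and not from the GenPercept codebase:

```python
import numpy as np

def is_valid_pair(rgb: np.ndarray, depth: np.ndarray) -> bool:
    """Keep an RGB-D pair only if the depth map has no NaN values
    and the RGB image is not uniformly 255 or uniformly 0."""
    if np.isnan(depth).any():
        return False
    # A constant image whose single value is 0 or 255 is considered broken.
    if rgb.max() == rgb.min() and int(rgb.max()) in (0, 255):
        return False
    return True
```

The check on broken scenes from the linked Hypersim issue would be a separate scene-name blacklist applied before loading.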

For Virtual KITTI, we use all the image-depth pairs. We replace the sky depth value with the maximum depth among non-sky pixels. For example, if the sky is encoded as 655.35 m and the farthest road point is 100 m away, we set the sky depth to 100 m. This per-image depth_{sky} value therefore varies from image to image.
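The sky-depth replacement described above can be sketched as follows, assuming Virtual KITTI encodes sky pixels with the sentinel value 655.35 m; the function name is hypothetical:

```python
import numpy as np

SKY_SENTINEL = 655.35  # sky pixels in Virtual KITTI depth maps (meters)

def clamp_sky_depth(depth: np.ndarray) -> np.ndarray:
    """Replace sky pixels with the maximum depth among non-sky pixels,
    computed independently for each image."""
    out = depth.copy()
    sky = np.isclose(out, SKY_SENTINEL)
    if sky.any() and (~sky).any():
        out[sky] = out[~sky].max()
    return out
```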

Hypersim: around 44K RGB-D pairs. Virtual KITTI: around 43K RGB-D pairs.