drprojects / DeepViewAgg

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

training time #23

Closed: TengfeiZeng closed this issue 1 year ago

TengfeiZeng commented 1 year ago

Hello author, how long did it take to run on the S3DIS dataset with your device?

drprojects commented 1 year ago

Hello @TengfeiZeng, are you asking about the S3DIS preprocessing time or the training time?

TengfeiZeng commented 1 year ago

Hello author, I would like to know both.

drprojects commented 1 year ago

Preprocessing the full S3DIS dataset takes about 20 min on my machine. Most of that time is spent reading the raw files, which are stored in a not-so-convenient format; the second largest cost is the KNN search. Our mapping computation, on the other hand, is quite fast. As stated in our paper (https://arxiv.org/pdf/2204.07548.pdf):

Our GPU-accelerated implementation can process the entire S3DIS dataset subsampled at 5cm (12 million points and 1413 high-resolution equirectangular images) within 65 seconds.
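To put the figure quoted above in perspective, here is a small back-of-the-envelope calculation of the throughput it implies. It only uses the numbers from the quote (12 million points, 1413 images, 65 seconds); since the paper says "within 65 seconds", the resulting rates are lower bounds. The constant and function names are illustrative, not part of the DeepViewAgg codebase.

```python
# Throughput implied by the quoted figures: 12 million points and
# 1413 equirectangular images processed within 65 seconds.
# Names below are illustrative only (not from the DeepViewAgg code).
NUM_POINTS = 12_000_000   # S3DIS subsampled at 5 cm
NUM_IMAGES = 1413         # high-resolution equirectangular images
SECONDS = 65              # upper bound on processing time


def throughput(count: int, seconds: float) -> float:
    """Return the number of items processed per second."""
    return count / seconds


points_per_s = throughput(NUM_POINTS, SECONDS)
images_per_s = throughput(NUM_IMAGES, SECONDS)
print(f"{points_per_s:,.0f} points/s, {images_per_s:.1f} images/s")
```

This prints `184,615 points/s, 21.7 images/s`, i.e. at least that many points and images per second for the mapping computation alone, excluding the file reading and KNN search mentioned above.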

Training a single fold (e.g., fold 5, the default) with a 3D resolution of 2cm and a 2D resolution of 1024x512 takes 53 hours on my machine, using a single V100 GPU. For the full 6-fold cross-validation, you will need to repeat this for each fold. Reducing the resolution shortens training time but also lowers performance.
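A rough time budget for the full cross-validation can be derived from the numbers above. This sketch assumes that every fold trains about as long as fold 5 (53 h on one V100), which is only an approximation since the S3DIS areas differ in size, and that the ~20 min preprocessing is run once and shared by all folds. The helper name is hypothetical.

```python
# Rough sequential time budget for 6-fold S3DIS cross-validation.
# ASSUMPTIONS: every fold takes about as long as fold 5 (53 h on a
# single V100) and preprocessing (~20 min) runs once for all folds.
HOURS_PER_FOLD = 53
NUM_FOLDS = 6
PREPROCESSING_HOURS = 20 / 60


def cross_validation_hours(hours_per_fold: float, num_folds: int,
                           preprocessing_hours: float = 0.0) -> float:
    """Hypothetical helper: one-off preprocessing plus all folds run back to back."""
    return preprocessing_hours + hours_per_fold * num_folds


total_h = cross_validation_hours(HOURS_PER_FOLD, NUM_FOLDS, PREPROCESSING_HOURS)
print(f"{total_h:.1f} h (~{total_h / 24:.1f} days) on a single GPU")
```

Under these assumptions it prints `318.3 h (~13.3 days) on a single GPU`. Folds are independent training runs, so with several GPUs they could be run concurrently rather than back to back.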

TengfeiZeng commented 1 year ago

Thank you.