
[CVPR'19] 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans

The batch_size and total running time problem #37

Closed weiXiaxvv closed 3 years ago

weiXiaxvv commented 3 years ago

Hi @Sekunde, thanks for your work and for posting the code on GitHub, what a wonderful work! I have a question about the batch size you used when you ran the project. Based on the paper you only needed about 24h for 10 epochs, so what batch size did you set for the dataloader? And the mask batch size is 16, but I cannot find that parameter in the config.py file. The project runs incredibly slowly for me with batch size 1 (GPU is a 2080 Ti), about 20h for 1 epoch, and if I increase the batch size it reports an error.

Thank you!

Sekunde commented 3 years ago

https://github.com/Sekunde/3D-SIS/blob/master/experiments/cfgs/ScanNet/example.yml#L88 You can first turn this off, to train on detection only and then train the mask backbone afterwards; this should be faster. Not sure if some library versions have changed and are causing the slow training; normally we train the detection pipeline for a day and the mask pipeline for another day. The batch size we use is 1. (I assume the error you got is an OOM.) If you want to train even faster you can also turn the color part off, by setting this to False: https://github.com/Sekunde/3D-SIS/blob/master/experiments/cfgs/ScanNet/example.yml#L88
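
Something like this, as a rough sketch (the flag names USE_MASK and USE_COLOR below are placeholders; the real keys are the ones at the linked lines in example.yml):

```python
import yaml  # PyYAML

# Placeholder key names -- the actual flags live at
# experiments/cfgs/ScanNet/example.yml#L88 (mask) and the color flag nearby.
CFG_PATH = "experiments/cfgs/ScanNet/example.yml"

with open(CFG_PATH) as f:
    cfg = yaml.safe_load(f)

cfg["USE_MASK"] = False   # train detection only first, add the mask backbone later
cfg["USE_COLOR"] = False  # skip the 2D color/projection pipeline, also saves memory

with open(CFG_PATH, "w") as f:
    yaml.safe_dump(cfg, f)
```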

weiXiaxvv commented 3 years ago

Hi @Sekunde

Thanks for your reply, that's really helpful. My GPU memory is 10GB, so the max batch size I can set is 12 before it OOMs. I've turned off the mask backbone and that did not help. To be honest, I've been studying your paper and code for a couple of weeks and have tried many ways to speed up the project (DataParallel, different num_workers values); the only thing that helped was increasing the dataloader batch size. I had to amend some code to make the larger batch size work, otherwise it reports errors. In the end I got the detection backbone only, but I have no way to verify that my detection backbone is as good as the one reported in your paper (because I changed some code, I cannot guarantee it still works correctly). There is no tool in your code for testing the mAP of the detection part alone, so I want to go back and try batch size 1. Could the CUDA version affect this (mine is 10.1)? Or is there some way I could measure the mAP of my detection backbone? Finally, I am confused about turning the color part off: is that the same line you quoted before? https://github.com/Sekunde/3D-SIS/blob/master/experiments/cfgs/ScanNet/example.yml#L88

Thanks a lot again!

Sekunde commented 3 years ago

Sorry, the line was wrong, it should be this flag: https://github.com/Sekunde/3D-SIS/blob/master/experiments/cfgs/ScanNet/example.yml#L93. It will also save some memory if you turn off the color; maybe you can fit more batches. We used a very old CUDA and PyTorch version. Maybe you can try the old ones to see if that speeds things up.

Another idea to speed up is to use a sparse conv backbone; we have a project that replaces the backbone of 3D-SIS with a sparse conv backbone, i.e. http://kaldir.vc.in.tum.de/scannet_benchmark/result_details?id=369

And we do have the mAP eval for detection only: https://github.com/Sekunde/3D-SIS/blob/master/lib/model/trainval.py#L553-L558

@weiXiaxvv maybe another thing for debugging whether it is the CUDA and PyTorch version: you could profile which part takes most of the time, is it the forward pass, the backward pass, or data loading?
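
A rough timing loop along these lines would already tell a lot (generic PyTorch code, not from the repo; `model`, `loss_fn`, and `dataloader` stand in for the actual 3D-SIS objects):

```python
import time
import torch

def profile_iters(model, loss_fn, dataloader, device="cuda", max_iters=50):
    """Roughly split per-iteration time into data loading, forward, and backward."""
    t_data = t_fwd = t_bwd = 0.0
    end = time.time()
    for i, (inputs, targets) in enumerate(dataloader):
        if i >= max_iters:
            break
        t_data += time.time() - end          # time spent waiting on the dataloader

        inputs, targets = inputs.to(device), targets.to(device)
        torch.cuda.synchronize()             # CUDA kernels are async; sync before timing
        t0 = time.time()
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
        torch.cuda.synchronize()
        t_fwd += time.time() - t0

        t0 = time.time()
        loss.backward()                      # gradients just accumulate; fine for timing
        torch.cuda.synchronize()
        t_bwd += time.time() - t0

        end = time.time()
    print(f"data {t_data:.1f}s | forward {t_fwd:.1f}s | backward {t_bwd:.1f}s")
```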

weiXiaxvv commented 3 years ago

Hi @Sekunde

Turning off the color is a really helpful way to speed things up!!! May I ask whether the mAP eval for detection only is mAP@0.25 or mAP@0.5? https://github.com/Sekunde/3D-SIS/blob/master/lib/model/trainval.py#L553-L558 I can find the detection performance in the paper (geo+5views): 40.2 for mAP@0.25 and 22.5 for mAP@0.5. The value I got is approximately 0.1945; if the value is a fraction, do I need to multiply by 100? And which one should I compare with, mAP@0.25 or mAP@0.5?

Sincerely thanks for your reply again.

Sekunde commented 3 years ago

https://github.com/Sekunde/3D-SIS/blob/master/experiments/cfgs/ScanNet/example.yml#L47 whether it is 0.5 or 0.25 is set there in your config file; and yes, you should multiply by 100.
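
For example, 0.1945 corresponds to 19.45 mAP; compare it against the paper's 22.5 if the threshold in your config is 0.5, or against 40.2 if it is 0.25.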

weiXiaxvv commented 3 years ago

Hi @Sekunde

I did profile the time spent in each part. In one iteration, dataloader_train takes approximately 0.20s (batch size = 1), about 40-45% of the time goes to the forward pass (within which the proposal_layer function costs half the time), and another 40-45% goes to the projection; the rest costs little. If I run the project on my local PC (CUDA version 8.0, GPU 1660S), the projection costs much less than on the server, about 1/10 of the server's time, while the forward pass costs a little more than on the server, maybe 40%. Does that give you any clue? Sorry, I am not familiar with these details.

And one more question about the validation part. I can see there are two different validation processes; what's the difference between val and train_val? I can see they use different validation datasets, but what is the difference?

And about turning off the color: will it affect the final model performance?

And based on the paper the training set size is 108241 chunks; however, there are 161561 chunks in train.txt.

Thanks for pointing out how to profile the time cost, and for your patient replies.

Sekunde commented 3 years ago

It seems the CUDA version influences the projection time, and that part does cost time. But if you turn off the color pipeline, it does not use the projection part anyway. I assume there is some PyTorch + CUDA adaptation problem with some function used in the projection code, but I am not sure about that either. If you turn off the color, it will have worse performance.

val means using the 312 validation scenes; train_val is for the ScanNet benchmark (there are 100 hidden test scenes), where we split 100 scenes off from the 312 val scenes for validation and the remaining 212 scenes are also used for training.
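
Roughly like this, as a sketch only (the file name and the way the 100/212 split is chosen are placeholders, not the repo's actual lists):

```python
# Sketch of the train_val split described above:
# 312 val scenes = 100 held out for validation + 212 folded back into training.
with open("scannetv2_val.txt") as f:            # placeholder file name
    val_scenes = [line.strip() for line in f if line.strip()]

assert len(val_scenes) == 312
held_out_val = val_scenes[:100]   # used for validation during benchmark training
extra_train = val_scenes[100:]    # 212 scenes appended to the training set
```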

chaolongy commented 3 years ago

I also encountered the same problem, and finally found that changing num_workers in trainval.py to 8 makes it faster, together with setting batch_size=8; otherwise it is still very slow.

weiXiaxvv commented 3 years ago

Exactly, batch_size is the main factor affecting the running time. Whatever num_workers is, it does not change the speed on my machines (I've tried a 1080 Ti and a 2080 Ti, no improvement). However, if you want to change the batch_size you have to update some code, because the whole project assumes batch_size 1, and with an increased batch_size the final result loses some accuracy compared with batch_size 1. With a large enough batch_size (12 or 16), it could finish in about one hour in my trials. I still haven't solved the problem: for me it cannot be done within 3 hours at batch_size 1.
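
For reference, the kind of DataLoader change discussed above (a generic PyTorch sketch; the dataset class here is a dummy stand-in for the chunk dataset that trainval.py actually builds):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DummyChunkDataset(Dataset):
    """Placeholder for the real ScanNet chunk dataset built in trainval.py."""
    def __init__(self, num_chunks=1024):
        self.num_chunks = num_chunks
    def __len__(self):
        return self.num_chunks
    def __getitem__(self, idx):
        # A real chunk would carry a TSDF grid plus boxes/masks; here just a dummy tensor.
        return torch.zeros(2, 32, 32, 32), idx

train_loader = DataLoader(
    DummyChunkDataset(),
    batch_size=8,      # matches the suggestion above; batch_size > 1 needs code changes elsewhere
    shuffle=True,
    num_workers=8,     # parallel worker processes for loading chunks
    pin_memory=True,
)
```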