mrqrs closed this issue 2 years ago
@mrqrs We cannot release the checkpoint, but I reproduced the results before I released this repo. What results did you get, and did you change any of the configs?
I trained the model on four 3090 GPUs with batch size 32 using centerpoints.yaml, and I did not change the config file. The results for all three classes are lower than those in the paper. The result is:
@mrqrs We trained on 8 GPUs with a total batch size of 32. With four 3090 GPUs and an unchanged yaml file, shouldn't your batch size be 16? If the batch size is different, you may need to try a different learning rate and number of epochs.
You trained the model on 8 GPUs with batch size 32. Is that batch size 4 per GPU, or 32 per GPU? In OpenPCDet, the "batch_size" argument is the total across all GPUs in multi-GPU training, so each GPU gets batch_size / gpu_number.
I also used PyTorch 1.8.0, CUDA 11.1, pcdet 0.3.0, and spconv 1.2.1. Could the environment affect the result?
@mrqrs We specify bs_per_gpu=4 in the yaml file. If we both use a total batch size of 32, I don't think the batch size is the problem. Did you incorporate our code into your own pcdet repo? Can you re-compile this repo and train the models with it? I'm not sure whether there are any differences from pcdet 0.3.0. I don't think the results are sensitive to the other environment versions (PyTorch, CUDA, spconv).
Yes, I cloned and compiled this repo. I did not use the original pcdet repo.
@mrqrs That's very strange. Did you try training multiple times, or evaluating checkpoints from different epochs? If you still cannot reproduce the results, I can try to get a checkpoint to you, but it's complicated and takes time.
If you use 4 GPUs, please set bs_per_gpu=8, use this command, and do not add any other args. Try running it multiple times.
Change this to 8 in centerpoints.yaml.
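The batch-size arithmetic discussed above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical and not part of this repo or OpenPCDet, and the linear learning-rate scaling is a common heuristic, not something the maintainers confirmed. It assumes the total batch size per step is the number of GPUs times bs_per_gpu.

```python
def total_batch_size(num_gpus: int, bs_per_gpu: int) -> int:
    """Total samples per optimizer step across all GPUs."""
    if num_gpus <= 0 or bs_per_gpu <= 0:
        raise ValueError("num_gpus and bs_per_gpu must be positive")
    return num_gpus * bs_per_gpu


def per_gpu_batch_size(target_total: int, num_gpus: int) -> int:
    """bs_per_gpu needed to reach a target total batch size."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if target_total % num_gpus != 0:
        raise ValueError("target_total must be divisible by num_gpus")
    return target_total // num_gpus


def linearly_scaled_lr(base_lr: float, base_total: int, new_total: int) -> float:
    """Heuristic only: scale the learning rate in proportion to the batch size."""
    return base_lr * new_total / base_total


if __name__ == "__main__":
    # Setup described in the thread: 8 GPUs with bs_per_gpu=4.
    print(total_batch_size(8, 4))    # 32
    # Same yaml on 4 GPUs halves the total batch size.
    print(total_batch_size(4, 4))    # 16
    # To keep a total of 32 on 4 GPUs, bs_per_gpu must be 8.
    print(per_gpu_batch_size(32, 4))  # 8
```

If the total batch size cannot be matched (for example, because of GPU memory), `linearly_scaled_lr` gives a starting point for adjusting the learning rate, but the result would still need to be tuned by hand.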
OK, thanks for your reply. I have only tested the epoch 80 checkpoint. I will test other checkpoints and retrain the model with your suggested settings.
@mrqrs Hi, can you reproduce the results now?
No, the result is still worse than in the paper. The result is:
Hi, how do I submit the test results? Is the submission website still available? I uploaded the result.pkl produced by the code to the website, but I cannot see the result. It seems to have been stuck in scoring for a long time.
@mrqrs I can share a checkpoint with reproduced results with you. The download link is https://drive.google.com/file/d/1lj8smqo3fc9qVz7eanjuNXyhskPy3jez/view?usp=sharing.
Please check carefully where the problem is on your side.
Please strictly follow the submission format described on the website:
Thanks for your reply. I will test it again. Thanks.
Closing this issue if there are no other problems.