Closed — MickShen7558 closed this issue 1 year ago.
Hi,
That seems strange. Did you change the number of points per batch element, i.e. the fields `npoints_decoder` and `npoints_decoder_non`?
Hi @SimonGiebenhain,
I didn't change anything. I found the GPU memory usage is around 25 GB.
Should I train the identity model with the whole chunk00? There are 25 identities inside that chunk. I manually reduced the number of identities in the chunk, and the program now fits on a 4090 GPU. However, I wonder whether it is proper to do so.
Hi,
I see. You said that you tried to reduce the batch size to 8, but "it didn't work"? Did it still not fit into memory? Because that should definitely work. Or did changing the batch size have no effect? In the latter case, I have to admit that the behaviour of the repository is a bit strange: if there is already a folder with the same experiment name, it will load the configs from the previous run with that name, ignoring your changes. So you would have to either delete the folder or change the run name.
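The resume behaviour described above can be sketched roughly as follows (a minimal illustration, not the repository's actual code; the helper name `load_or_create_config` and the `config.json` file name are assumptions):

```python
import json
import os


def load_or_create_config(exp_dir, new_config):
    """If a previous run with the same experiment name exists, its saved
    config wins and new_config (e.g. a changed batch size) is ignored.
    Otherwise the new config is saved and used."""
    cfg_path = os.path.join(exp_dir, "config.json")
    if os.path.exists(cfg_path):
        # Existing experiment folder found: reload the old settings.
        with open(cfg_path) as f:
            return json.load(f)
    os.makedirs(exp_dir, exist_ok=True)
    with open(cfg_path, "w") as f:
        json.dump(new_config, f)
    return new_config
```

Under this behaviour, editing `batch_size` in the YAML after the first run has no effect until the experiment folder is deleted or the run is renamed.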
Regarding chunk00, I am not sure what you mean. Ideally, you would train on the whole dataset, with the identities for each batch sampled randomly.
I would suggest making the batch size slightly smaller so that it fits on your GPU, or reducing the number of points slightly.
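Concretely, the fields to tweak in `nphm.yaml` might look like this (the field names `npoints_decoder` and `npoints_decoder_non` come from this thread; `batch_size` and the example values are assumptions, so check them against your local config):

```yaml
training:
  batch_size: 16            # was 32; halving roughly halves activation memory
  npoints_decoder: 1000     # example value: reduce slightly if still OOM
  npoints_decoder_non: 250  # example value: reduce together with the above
```

Memory scales roughly with batch_size × points per element, so small reductions in either usually suffice.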
Hi @SimonGiebenhain,
I found my case to be the latter one. I adjusted the batch size to 16, and it now works on my 4090 (taking up 19 GB of GPU memory).
Also, thank you for the clarification of the training process.
Hi,
Thank you for your code and dataset. In your supplementary, you mentioned that you did the training on a 3090 GPU. However, when I trained with my 4090, the memory usage exceeded 24 GB. I wonder how you fit the model on a 3090?
Besides, do you have any suggestions for reducing GPU memory usage? I tried to reduce the batch size in the nphm.yaml file from 32 to 8, but it didn't work.