Closed: JoyHuYY1412 closed 1 year ago
I think memory is not just the number of trainable parameters: you used CutMix, which effectively doubles the batch size fed into the network.
Hi, we didn't measure these numbers. But I agree with what you said: memory should refer to GPU memory usage during training, not the number of parameters.
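To make the point concrete, here is a rough back-of-the-envelope sketch in Python. It is not a measurement of CPS: the function name, the parameter count, and the per-sample activation count are all hypothetical, and the estimate ignores optimizer state, framework overhead, and memory fragmentation. It only illustrates that doubling the effective batch size (as the comment above suggests CutMix does) raises training memory while the parameter count stays the same.

```python
def estimate_training_memory_bytes(num_params, batch_size,
                                   activations_per_sample,
                                   bytes_per_value=4):
    """Very rough estimate: weights + gradients + stored activations.

    Ignores optimizer state and framework overhead (assumption).
    """
    # Weights and their gradients: two values per trainable parameter.
    param_bytes = 2 * num_params * bytes_per_value
    # Activations kept for the backward pass grow linearly with batch size.
    activation_bytes = batch_size * activations_per_sample * bytes_per_value
    return param_bytes + activation_bytes


# Hypothetical numbers, chosen only for illustration.
NUM_PARAMS = 50_000_000
ACTIVATIONS_PER_SAMPLE = 100_000_000

base = estimate_training_memory_bytes(NUM_PARAMS, 8, ACTIVATIONS_PER_SAMPLE)
doubled = estimate_training_memory_bytes(NUM_PARAMS, 16, ACTIVATIONS_PER_SAMPLE)

print(f"batch 8:  {base / 1e9:.1f} GB")
print(f"batch 16: {doubled / 1e9:.1f} GB")
```

Under these assumed numbers the parameter count is identical in both runs, yet the estimate grows with the batch size. For a real figure, the peak GPU memory would need to be recorded during an actual training run.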
Hi Xiaokang,
Thank you for your great work! Would you mind providing the memory usage and inference time for CPS? It would help a lot when citing your work.
I appreciate your help.