caffe-jacinto training memory problem

TexasInstruments / jacinto-ai-devkit

This repository has been moved. The new location is https://github.com/TexasInstruments/edgeai-tensorlab

caffe-jacinto training memory problem #9

Open KurtKoo opened 3 years ago

KurtKoo commented 3 years ago

Hello! I am trying to train SSD using caffe-jacinto on a GeForce 940MX GPU (2002 MB available).

At first, I ran the training script (voc0712, 256x256) with batch_size == 16 and it failed. The training log said the GPU could not allocate enough memory. I then trained on a much smaller dataset, and that failed as well.

However, training works with batch_size == 2 on both datasets. Is there a way to train with batch_size == 16?

Thanks!

mathmanu commented 3 years ago

2 GB of GPU memory is quite small. However, you can try using fp16 for training, which might let you double the batch size. Try adding the following to your config and then train:

fp16 = dict(loss_scale=512.)
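For context, here is a minimal sketch of why a loss scale such as 512 usually accompanies fp16 training. It is not code from caffe-jacinto; the helper names (to_fp16, grad_without_scaling, grad_with_scaling) are hypothetical and only illustrate the idea using the Python standard library. Very small gradient values round to zero in half precision, so the loss (and therefore the gradients) is multiplied by a constant before the backward pass and divided out again in full precision before the weight update.

```python
import struct

LOSS_SCALE = 512.0  # same value as the loss scale suggested above


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def grad_without_scaling(grad: float) -> float:
    """Gradient as it would be stored in fp16 with no loss scaling."""
    return to_fp16(grad)


def grad_with_scaling(grad: float, scale: float = LOSS_SCALE) -> float:
    """Gradient stored in fp16 after scaling, then unscaled in full precision."""
    scaled = to_fp16(grad * scale)  # value the fp16 backward pass would hold
    return scaled / scale           # undo the scaling before the weight update


if __name__ == "__main__":
    tiny_grad = 1e-8  # illustrative value below the smallest fp16 subnormal
    print("no scaling:  ", grad_without_scaling(tiny_grad))
    print("with scaling:", grad_with_scaling(tiny_grad))
```

Without scaling, the illustrative gradient is flushed to zero; with scaling, it survives with a small rounding error. The memory saving itself comes from fp16 values taking 2 bytes instead of 4, which is why the batch size can roughly double rather than grow eightfold.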

KurtKoo commented 3 years ago

Thanks for your advice!

However, I'm still not familiar with the caffe-jacinto project. Could you please tell me exactly which 'config' file you mean?