aws-samples / amazon-sagemaker-tensorflow-object-detection-api

Train and deploy models using TensorFlow 2 with the Object Detection API on Amazon SageMaker
MIT No Attribution

Increasing batch size #12

Closed nfbalbontin closed 3 years ago

nfbalbontin commented 3 years ago

Hi! I'm trying to increase the batch size for the training of the model, but each time I execute it, the training phase gives me the following error:

(0) Resource exhausted: OOM when allocating tensor with shape[16,112,40,40] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

This normally points to running out of memory. Because of this, I tried different approaches to solve the problem:

However, in both instances I got the same error. Is there a way to increase the batch size without hitting a ResourceExhaustedError?
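For context, the batch size is set in the `train_config` section of the pipeline.config used by the TF2 Object Detection API, and it is a global batch size, so the per-GPU share still has to fit in GPU memory. Below is a minimal sketch of editing that value programmatically with the OD API's `config_util` helpers; the file paths and the value chosen here are illustrative, not taken from this repo.

```python
# Minimal sketch: adjust train_config.batch_size in a TF2 Object Detection API
# pipeline.config before launching training. Paths and values are illustrative.
from object_detection.utils import config_util

PIPELINE_CONFIG = "source_dir/pipeline.config"  # hypothetical path

# Load all config sections from the pipeline file into a dict
configs = config_util.get_configs_from_pipeline_file(PIPELINE_CONFIG)

# batch_size is the global batch size; raising it increases the memory
# needed on each GPU and can trigger ResourceExhaustedError.
configs["train_config"].batch_size = 8

# Serialize the modified config and write pipeline.config back out
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, "source_dir/")
```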

sofianhamiti commented 3 years ago

@nfbalbontin I think this is a lack of GPU memory, and more likely a question of how the TF Object Detection API manages it across multiple GPUs. I would check the issues on the TF Object Detection API GitHub repo for help with this.
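One practical workaround, if the global batch size really must grow, is to run the SageMaker training job on an instance with more (or larger) GPUs so the batch is split across more GPU memory. The sketch below uses the SageMaker Python SDK's TensorFlow estimator; the entry point, source directory, versions, and S3 paths are assumptions for illustration and are not this repo's exact training setup.

```python
# Minimal sketch: launch training on a multi-GPU instance so the global batch
# is split across more GPU memory. Names and versions below are assumptions.
import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()

estimator = TensorFlow(
    entry_point="run_training.sh",   # hypothetical entry point
    source_dir="source_dir",         # hypothetical source directory
    role=role,
    instance_count=1,
    instance_type="ml.p3.8xlarge",   # 4x V100 GPUs, 16 GB GPU memory each
    framework_version="2.2",
    py_version="py37",
)

# Illustrative S3 inputs; replace with your own channels.
# estimator.fit({"train": "s3://my-bucket/train", "val": "s3://my-bucket/val"})
```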