apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.74k stars 6.81k forks

Gluon speed issues when input size varies across batches #9914

Open chengdazhi opened 6 years ago

chengdazhi commented 6 years ago

Hi. I have noticed that the Gluon framework has speed issues when the input spatial size varies across batches. It causes roughly a 2x slowdown on a single GPU, and using multiple GPUs yields little gain. The framework discards previously allocated GPU memory blocks when it starts processing a new batch, which leads to large fluctuations in GPU memory usage.

This problem is absent in the previous non-Gluon training systems.

marcoabreu commented 6 years ago

Would you mind providing an MVE so that we can reproduce this?

chengdazhi commented 6 years ago

@marcoabreu Thanks for your prompt reply. I worked around the problem by padding all images to a fixed size. Providing an MVE would be highly demanding for me. I hope this bug can be fixed in the future.
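
For readers hitting the same slowdown, here is a minimal sketch of the padding workaround described above. It is not the code used in the original training run. It assumes images are NumPy arrays in CHW layout and that a fixed target size at least as large as every image is known in advance. The helper names `pad_to_fixed_size` and `make_batch` are hypothetical, and zero padding on the bottom and right is just one possible choice.

```python
import numpy as np


def pad_to_fixed_size(image, target_hw, pad_value=0.0):
    """Pad a CHW image on the bottom and right up to target_hw = (H, W).

    Hypothetical helper: assumes the image is never larger than the target.
    """
    channels, height, width = image.shape
    target_h, target_w = target_hw
    if height > target_h or width > target_w:
        raise ValueError("image is larger than the target size")
    # Allocate the fixed-size canvas, then copy the image into its top-left corner.
    padded = np.full((channels, target_h, target_w), pad_value, dtype=image.dtype)
    padded[:, :height, :width] = image
    return padded


def make_batch(images, target_hw):
    """Stack variable-size CHW images into one NCHW batch of constant shape."""
    return np.stack([pad_to_fixed_size(img, target_hw) for img in images])


# Two images with different spatial sizes end up in a batch of one fixed shape.
images = [
    np.ones((3, 480, 640), dtype=np.float32),
    np.ones((3, 512, 600), dtype=np.float32),
]
batch = make_batch(images, (512, 640))
print(batch.shape)
```

Because every batch then has the same shape, the framework sees identically sized inputs on each iteration. The resulting array can be converted to an MXNet NDArray before being fed to the network. The cost is extra computation on the padded regions.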

piyushghai commented 5 years ago

@mxnet-label-bot [Gluon]