vchuravy opened this issue 8 years ago
Yes, the GC problem is generally painful here. I am wondering if there is (or is going to be) any good way in Julia to manage external resources. I like the way files are handled with `do ... end` blocks, but I don't quite know how that would pan out in terms of memory management, and normally one shouldn't create and destroy executors in a loop.
Ideally we would either reuse the current executor or free the previous one before creating a new one here: https://github.com/dmlc/MXNet.jl/blob/a2164ae43ab70d8be7708b7dc9974a5a6a360a8e/src/model.jl#L131
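For reference, here is a minimal sketch of that pattern. The file half is standard Julia; the executor half is purely hypothetical, with `create_executor` and `release!` as made-up stand-ins rather than MXNet.jl API:

```julia
# Files: the handle is closed deterministically when the block exits,
# without waiting for the GC to run a finalizer.
open("data.txt", "w") do io
    write(io, "hello")
end

# The same shape could wrap an executor. create_executor and release!
# are hypothetical names, not actual MXNet.jl functions.
function with_executor(f, args...)
    exec = create_executor(args...)
    try
        return f(exec)
    finally
        release!(exec)  # free the external resource deterministically
    end
end
```

Usage would mirror `open`, e.g. `with_executor(exec -> predict_with(exec), net)`, though as noted this only helps if creating and tearing down an executor at each call site is acceptable.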
`do ... end` is very limited and not easy to use in many cases.
For this particular case, I think it goes to the `else` branch and re-uses the executor unless `overwrite` is `true`: https://github.com/dmlc/MXNet.jl/blob/a2164ae43ab70d8be7708b7dc9974a5a6a360a8e/src/model.jl#L135
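Roughly, the branch in question looks like this (a paraphrase, not the exact source; field names such as `pred_exec` and the `simple_bind` call are my best guess at the code behind the link):

```julia
# Paraphrased sketch of the branch around model.jl#L131-L135.
if model.pred_exec == nothing || overwrite
    # bind a brand-new executor, allocating fresh device memory
    model.pred_exec = mx.simple_bind(model.arch, model.ctx[1], data=data_shape)
else
    # re-use the executor bound on a previous call
end
```

So a caller that passes `overwrite=true` always pays for a fresh allocation.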
Which it is: https://github.com/dmlc/MXNet.jl/blob/a2164ae43ab70d8be7708b7dc9974a5a6a360a8e/src/model.jl#L186
I was wondering if it would be possible to use the same executor and update the weights of the model?
Oh, I see. Yes, it would be a good idea to add a function like `sync_params` to copy the parameters over. An even better approach is the module system recently introduced on the Python side: essentially the same executors are used for both training and prediction, and data parallelism could then be supported in prediction as well. But that definitely requires a fair amount of refactoring and porting.
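Something along these lines, presumably. `sync_params!` does not exist in MXNet.jl; the `arg_dict` field and the in-place `copy!` onto `NDArray`s are assumptions about the executor type:

```julia
# Hypothetical helper: copy fresh parameter values into an already-bound
# executor instead of binding a new one. Not an existing MXNet.jl API.
function sync_params!(exec, arg_params)
    for (name, src) in arg_params
        if haskey(exec.arg_dict, name)
            copy!(exec.arg_dict[name], src)  # in-place, no new device memory
        end
    end
    return exec
end
```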
Testing the different checkpoints of a training run requires loading checkpoints and running predictions in a tight loop. Without these final two lines, this easily runs out of memory for big models/batch sizes, because the old executor has not been gc'd yet while we are already creating a new one that asks for more memory. If we could reuse the previous executor, that problem would be alleviated.
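For concreteness, a hypothetical version of that loop; the last two lines of the body stand in for the manual workaround referred to above, and the checkpoint API calls should be checked against MXNet.jl rather than taken as verified:

```julia
# Sketch of the checkpoint-evaluation loop described above.
for epoch in 1:n_epochs
    arch, arg_params, aux_params = mx.load_checkpoint(prefix, epoch)
    model = mx.FeedForward(arch, context=mx.gpu())
    model.arg_params = arg_params
    model.aux_params = aux_params
    preds = mx.predict(model, eval_data)
    report_metric(preds)       # user-defined scoring, hypothetical
    model.pred_exec = nothing  # drop the reference to the old executor...
    gc()                       # ...and collect it before the next bind
end
```

Each iteration binds a fresh executor, so without the explicit release the old one can still be alive at the moment the new allocation is requested.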