dmlc / MXNet.jl

MXNet Julia Package - flexible and efficient deep learning in Julia

Apparent Memory leak in Neural-Style example #56

Closed NOTtheMessiah closed 8 years ago

NOTtheMessiah commented 8 years ago

I ran the Julia version of the Neural-Style example with a check on GPU memory usage, and it seems to grow linearly over time, behavior not seen in the Python version despite the two being nearly identical syntactically. Perhaps this is a difference between Julia and Python semantics? Or maybe adding type signatures to some arrays/composite types could help.

Note: to run the Julia fork, you need to add the Factor subtype of AbstractLearningRateScheduler to optimizer.jl in the package directory.

Attached is a plot of epoch vs. GPU memory usage in megabytes. [plot: memusage]

NOTtheMessiah commented 8 years ago

Actually, I replaced a::NDArray /= b with mx.div_from!(a,b) and a::NDArray *= b with mx.mul_to!(a,b), and memory usage is much better, though some operations still seem to use more memory than the Python equivalent. [plot: memusage2] I didn't realize that operations such as /= can't be overloaded.

vchuravy commented 8 years ago

See the discussion in https://github.com/JuliaLang/julia/issues/249 and https://github.com/JuliaLang/julia/pull/13666

We should probably mention that, and maybe we can use a macro to turn them into in-place operations; @nd_as_jl might be a good place to do that.

NOTtheMessiah commented 8 years ago

Thanks, I was able to squash the other major leak, although it might be worth noting that the macro didn't like being fed the contents of a composite type, but that was easy enough to work around.

vchuravy commented 8 years ago

It might be helpful for others to have a FAQ with things like these. Would you mind writing up what you did?


pluskid commented 8 years ago

Yes, there is a potential memory "leak", as the underlying memory is managed by the GC. @NOTtheMessiah Could you test whether calling gc() every several iterations would help? -- it will slow down the computation, though.
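For example, something along these lines in the training loop (just a sketch; the iteration counter name is illustrative):

# force a full collection every 50 iterations (Julia 0.4-era API)
if iter % 50 == 0
    gc()
end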

The difference between Python and Julia is that Python's GC uses reference counting, which can reclaim memory as soon as it is no longer referenced anywhere. Julia, on the other hand, has to wait for a global GC pass. I'm not familiar with the underlying GC implementation, but it might be triggered automatically when free memory is low. However, the memory allocated by libmxnet (and especially GPU memory) is not visible to the Julia GC, so it has no idea when GC should be called.

Unfortunately, as far as I know there is currently no immediately better way to do native resource management in Julia.

NOTtheMessiah commented 8 years ago

@pluskid what I did was write a convenience function that calls nvidia-smi from the shell and parses the response to get a broad overview to make the plots.
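Roughly something like this (a simplified sketch, not the exact helper I used; it assumes nvidia-smi's --query-gpu interface and only looks at the first GPU):

# query current GPU memory usage in MiB by shelling out to nvidia-smi
function gpu_mem_used_mb()
    out = readall(`nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`)
    parse(Int, strip(split(out, '\n')[1]))
end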

@vchuravy I wrote some notes, feel free to remix

Porting MXNet's Neural-Style Python code to MXNet.jl

These notes document what I had to do to get a working implementation of A Neural Algorithm of Artistic Style from within Julia. They may or may not apply to other examples ported from Python to Julia, but are here for reference.


argparse to ArgParse.jl

ArgParse.jl fills the role of Python's argparse here, turning the script into a command-line program.
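A minimal sketch of the translation (the argument names below are illustrative, not the example's full argument table):

using ArgParse

s = ArgParseSettings()
@add_arg_table s begin
    "--content-image"
        help = "path to the content image"
    "--max-num-epochs"
        arg_type = Int
        default = 1000
end
args = parse_args(s)   # Dict with keys like "content_image", "max_num_epochs"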

Named tuples to Composite types

This translation is pretty easy; Julia's built-in composite types are very straightforward to work with compared to Python's named tuples, which require from collections import namedtuple. For the ConvExecutor, it means going from

Executor = namedtuple('Executor', ['executor', 'data', 'data_grad'])

to

type SGExecutor
    executor :: mx.Executor
    data :: mx.NDArray
    data_grad :: mx.NDArray
end

I also renamed it, just to avoid ambiguity. The field type annotations are optional, but I included them for type safety.
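Construction and field access then look much the same in both languages; for instance (variable names illustrative):

# the composite type gets a positional constructor for free
exec_data = SGExecutor(executor, data, data_grad)
grad = exec_data.data_grad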

Row-major order to Column-major order arrays

If interested, read the Wikipedia article here, but in summary: Julia thinks of a matrix as an array of column vectors, while Python (NumPy) natively stores it as a list of rows. The major difference when dealing with multi-dimensional arrays is that the ordering of the shape is reversed.

Example:

out.infer_shape(data=(1, 3, input_size[0], input_size[1]))

would become the following in Julia

mx.infer_shape(out, data=(input_size[1], input_size[2], 3, 1))
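For a concrete 224x224 RGB image with batch size 1, the same logical data looks like this (a sketch):

data = mx.zeros((224, 224, 3, 1))   # Julia NDArray: (width, height, channel, batch)
size(data)                          # (224, 224, 3, 1)
# the equivalent NumPy array would have shape (1, 3, 224, 224)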

Memory leaks

As of Julia v0.4, some operators cannot be overloaded, such as the updating assignment operators (a ⊗= b for a binary operation ⊗ in [-,+,/,*]). So for lines of the form a[:] ⊗= ..., you can take one of several approaches to avoid unnecessary use of graphics memory, all of which are described in the NDArray source code:

  1. b = copy(a::NDArray) to a Julia array, perform your operations on b, and then copy!(a, b) back.
  2. Some combination of mx.mul_to!, mx.div_from!, etc. to modify the NDArray in place.
  3. Use the macro mx.@nd_as_jl to work as if you were using native Julia arrays.

The macro takes arguments ro for NDArrays that are only read and rw for those that are written to; it copies everything to native Julia arrays and then writes the results back into the NDArrays. A sketch of all three approaches follows.
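For a line that was a[:] /= b in Python (variable names illustrative; see the NDArray source for the exact macro syntax):

# 1. round-trip through a Julia array
tmp = copy(a)            # NDArray -> Julia Array
tmp /= b
copy!(a, tmp)            # write back into a's existing GPU memory

# 2. in-place NDArray operations
mx.div_from!(a, b)       # a ./= b without allocating a new NDArray
mx.mul_to!(a, b)         # a .*= b

# 3. the @nd_as_jl macro: x is read-only, a is read-write
mx.@nd_as_jl ro=x rw=a begin
    a[:] = (a ./ b) .+ x   # a and x are plain Julia arrays inside the block
end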

Implement Factor-based Learning Rate

This required modifying MXNet.jl's src/optimizer.jl to add a Factor subtype of AbstractLearningRateScheduler (not implemented in the package at the time) so it can cooperate with stochastic gradient descent. There may be a better way to do this, and I'm not entirely comfortable with the robustness of my implementation as of the time of writing.
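The addition looks roughly like this (a sketch, written as if defined outside the package; the exact scheduler interface, assumed here to be a get_learning_rate method taking the optimizer state, may differ between MXNet.jl versions):

# a "factor" schedule: multiply the base learning rate by `factor` every `step` updates
type Factor <: mx.AbstractLearningRateScheduler
  learning_rate :: Float64
  factor        :: Float64
  step          :: Int
end

function get_learning_rate(self :: Factor, state :: mx.OptimizationState)
  self.learning_rate * self.factor ^ div(state.curr_iter, self.step)
end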