apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.78k stars, 6.79k forks

Overhead of MXNDArraySyncCopyFromCPU on osx #8112

Open aseyboldt opened 7 years ago

aseyboldt commented 7 years ago

While investigating a performance issue I noticed that setting the values of a mx.nd.NDArray is somewhat slow on osx (Sierra):

import mxnet as mx
import numpy as np
import ctypes

a = mx.nd.zeros(4)
b = np.zeros(4, dtype='f')
%timeit a[:] = b
28.3 µs ± 653 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

For comparison, pure numpy takes about 400ns. Some of this seems to be Python overhead: the largest contributors I found were a.shape at about 2μs and a.ctypes.data_as(ctypes.c_void_p) at about 4μs in a._sync_copyfrom. Most of it is on the C side, however:

handle = a.handle
b_addr = b.ctypes.data_as(ctypes.c_void_p)
b_size = ctypes.c_size_t(b.size)
%timeit mx.base._LIB.MXNDArraySyncCopyFromCPU(handle, b_addr, b_size)
14.3 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

On a Linux machine the same test runs in about 900ns.
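
To put the numbers above in context, it can help to measure what a trivial ctypes call costs on the same machine. The following is a rough sketch that uses only the standard library: ctypes.memmove stands in for the copy, and the helper name per_call_seconds is made up for this example. It does not call MXNet at all, so it only gives a rough baseline for ctypes call overhead, not a measurement of MXNDArraySyncCopyFromCPU.

```python
import ctypes
import timeit


def per_call_seconds(func, number=100_000, repeat=5):
    """Return the best observed time per call of func(), in seconds."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number


# Two 4-element float buffers, standing in for the NDArray and the numpy array.
src = (ctypes.c_float * 4)(1.0, 2.0, 3.0, 4.0)
dst = (ctypes.c_float * 4)()
nbytes = ctypes.sizeof(src)  # 4 floats of 4 bytes each


def copy_via_ctypes():
    # A plain memory copy through ctypes; no engine, no synchronisation.
    ctypes.memmove(dst, src, nbytes)


baseline = per_call_seconds(copy_via_ctypes)
print(f"ctypes.memmove of {nbytes} bytes: {baseline * 1e9:.0f} ns per call")
```

If this baseline is far below the 14 µs measured above, the difference is being spent inside the library call rather than in the ctypes layer.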

I am using version 0.11.1 according to mx.__version__, installed via pip install --pre mxnet-mkl.

I sampled the stack trace while MXNDArraySyncCopyFromCPU was running in a loop: [image: sampled stack trace]

sergeykolychev commented 7 years ago

@tlby something that you noticed as well

aseyboldt commented 7 years ago

Thinking a bit more about this, I am a bit confused about why there is any synchronisation at all. I'm really new to mxnet, so I might be missing something, but shouldn't the engine be able to tell if there are any outstanding operations at all? And if not, couldn't it just skip the ThreadedVar::WaitForVar call entirely? If there is nothing that might want to change any variable, then that variable in particular should be fine, right? My guess would be that this is the case most of the time when executing things synchronously.
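
To make the question concrete, here is a toy model of the fast path I have in mind. This is purely hypothetical: ToyVar and its methods are invented for illustration and are not MXNet's ThreadedVar or its engine API. The variable counts operations that were scheduled but have not completed, and wait_for_var returns immediately when that count is zero.

```python
import threading


class ToyVar:
    """Hypothetical stand-in for an engine variable; not MXNet's ThreadedVar."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0  # operations scheduled but not yet completed

    def schedule(self):
        with self._cond:
            self._pending += 1

    def complete(self):
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def wait_for_var(self):
        """Return True if nothing was pending, so no waiting was needed."""
        with self._cond:
            if self._pending == 0:
                return True  # fast path: no outstanding work on this variable
            while self._pending:
                self._cond.wait()  # slow path: sleep until complete() notifies
            return False


var = ToyVar()
fast = var.wait_for_var()  # nothing scheduled, returns without waiting

var.schedule()  # pretend an asynchronous operation was queued
worker = threading.Timer(0.2, var.complete)
worker.start()
slow = var.wait_for_var()  # blocks until the timer thread calls complete()
worker.join()
print("fast path taken:", fast, "/ second call took fast path:", slow)
```

Even this fast path still takes a lock, so it is not free. Whether the real engine can do something like this cheaply is exactly what I am unsure about.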