Open aseyboldt opened 7 years ago
@tlby something that you noticed as well
Thinking a bit more about this, I am a bit confused about why there is any synchronisation at all. I'm really new to mxnet, so I might be missing something, but shouldn't the engine be able to tell if there are any outstanding operations at all? And if not, couldn't it just skip the ThreadedVar::WaitForVar call entirely? If there is nothing that might want to change any variable, then that variable in particular should be fine, right? My guess would be that this is the case most of the time when executing things synchronously.
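To make the suggestion concrete, here is a toy Python model of the fast path I have in mind. This is not MXNet's engine code: `ToyVar`, `schedule`, `complete` and `wait_for_var` are hypothetical names, and the real ThreadedVar may track its dependencies quite differently. The sketch only shows the shape of the check: if nothing is outstanding on the variable, return right away instead of going through the blocking wait.

```python
import threading


class ToyVar:
    """Toy stand-in for an engine variable (hypothetical, not MXNet's ThreadedVar)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending_ops = 0      # operations scheduled but not yet finished
        self.slow_path_waits = 0   # how often we actually had to block

    def schedule(self):
        """Register one outstanding operation on this variable."""
        with self._cond:
            self._pending_ops += 1

    def complete(self):
        """Mark one operation as finished and wake up any waiters."""
        with self._cond:
            self._pending_ops -= 1
            if self._pending_ops == 0:
                self._cond.notify_all()

    def wait_for_var(self):
        # Fast path: nothing outstanding, so skip the lock and the wait.
        # (A C++ version would need an atomic counter for this read.)
        if self._pending_ops == 0:
            return
        # Slow path: block until every pending operation has completed.
        with self._cond:
            self.slow_path_waits += 1
            while self._pending_ops > 0:
                self._cond.wait()


var = ToyVar()
var.wait_for_var()                        # no pending ops: returns immediately

var.schedule()                            # one operation is now outstanding
timer = threading.Timer(0.05, var.complete)
timer.start()
var.wait_for_var()                        # blocks until the timer calls complete()
timer.join()
```

Whether a check like this is safe in the real engine depends on how operations get registered against a variable, which I have not verified.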
While investigating a performance issue I noticed that setting the values of a mx.nd.NDArray is somewhat slow on OS X (Sierra):

For comparison, pure numpy takes about 400ns. Some of this seems to be Python overhead (the largest contributions I found were a.shape with about 2μs and a.ctypes.data_as(ctypes.c_void_p) with 4μs, both in a._sync_copyfrom). Most of it is on the C side, however:

On a Linux machine that same test runs in 900ns.
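For anyone who wants to reproduce numbers like these, a per-call timing can be sketched roughly as follows. The array shape, dtype and loop counts are assumptions chosen for illustration, `ns_per_call` and `compare_setitem` are hypothetical helper names, and I am assuming that `a[:] = src` with a numpy source ends up in `a._sync_copyfrom`.

```python
import timeit


def ns_per_call(fn, number=100_000, repeat=5):
    """Return the best observed time per call of fn(), in nanoseconds."""
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best / number * 1e9


def compare_setitem(size=10):
    """Time assigning a small numpy array into an NDArray and into a numpy array."""
    # Imported here so ns_per_call stays usable without numpy or mxnet installed.
    import numpy as np
    import mxnet as mx

    src = np.zeros(size, dtype="float32")   # assumed shape and dtype
    a = mx.nd.zeros(size)                   # destination NDArray
    b = np.zeros(size, dtype="float32")     # destination numpy array

    def set_mx():
        a[:] = src   # assumed to go through a._sync_copyfrom

    def set_np():
        b[:] = src

    return {
        "mxnet_ns": ns_per_call(set_mx),
        "numpy_ns": ns_per_call(set_np),
    }
```

Calling `compare_setitem()` returns both timings in nanoseconds; taking the minimum over several repeats reduces noise from other processes.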
I am using version 0.11.1 according to mx.__version__, installed via pip install --pre mxnet-mkl.

I sampled the stack trace while MXNDArraySyncCopyFromCPU was running in a loop: