apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[RFC] Custom Operator Part 2 #17006

Open samskalicky opened 4 years ago

samskalicky commented 4 years ago

Description

Request for comments on the next PR for enhancing custom operator support

References

wkcn commented 4 years ago

Hi @samskalicky, thank you for the contribution! I have several suggestions.

Thanks.

rondogency commented 4 years ago

We need to include a fix for the test error: https://github.com/apache/incubator-mxnet/pull/15921#pullrequestreview-328686634

larroy commented 4 years ago

@wkcn could you explain your suggestion? Do you mean calling GEMM back into the framework, which then gets dispatched to GPU or CPU?

samskalicky commented 4 years ago

We should create a namespace for the stuff in the lib_api.h file as suggested by @larroy: https://github.com/apache/incubator-mxnet/pull/15760/files#r311756416
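To illustrate what a namespace would buy here, below is a minimal sketch. The namespace name `mx_ext` and the `Tensor` and `DType` types are hypothetical stand-ins, not the actual contents of lib_api.h. The point is only that library declarations stop colliding with identically named user types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for declarations a header like lib_api.h exposes.
// The namespace name `mx_ext` and these types are assumptions for illustration.
namespace mx_ext {

enum class DType { kFloat32, kInt32 };

struct Tensor {
  std::vector<long long> shape;
  DType dtype = DType::kFloat32;

  // Number of elements implied by the shape (1 for an empty shape).
  long long size() const {
    long long n = 1;
    for (long long d : shape) {
      assert(d >= 0);  // negative dimensions are not meaningful here
      n *= d;
    }
    return n;
  }
};

}  // namespace mx_ext

// User code can define its own Tensor without clashing with the library's.
struct Tensor {
  std::string label;
};

// Library types are referenced through the qualified name.
inline long long count_elements(const mx_ext::Tensor& t) { return t.size(); }
```

Without the namespace, the second `Tensor` definition would be a redefinition error in any translation unit that includes the header.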

wkcn commented 4 years ago

@larroy Users may need matrix operators and DNN ops (e.g. ReLU, Conv) when writing a custom op. Although they can implement these with third-party libraries, it is more convenient to use MXNet's built-in functions.
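One way to picture this is a table of callbacks that the framework hands to the custom op. The sketch below is purely illustrative: `BuiltinOps`, `cpu_gemm`, `cpu_relu` and `dense_relu` are hypothetical names, and nothing here reflects an existing MXNet interface. The reference CPU kernels stand in for whatever CPU or GPU implementation the framework would dispatch to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical table of built-in kernels a framework could pass to a custom op.
struct BuiltinOps {
  // C (m x n) = A (m x k) * B (k x n), all row-major.
  void (*gemm)(const float* a, const float* b, float* c,
               std::size_t m, std::size_t k, std::size_t n);
  // Elementwise max(x, 0), applied in place.
  void (*relu)(float* x, std::size_t len);
};

// Reference CPU kernels standing in for the framework's own implementations.
inline void cpu_gemm(const float* a, const float* b, float* c,
                     std::size_t m, std::size_t k, std::size_t n) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (std::size_t p = 0; p < k; ++p) acc += a[i * k + p] * b[p * n + j];
      c[i * n + j] = acc;
    }
  }
}

inline void cpu_relu(float* x, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    if (x[i] < 0.0f) x[i] = 0.0f;
  }
}

// Custom op: a dense layer followed by ReLU, built only from the callbacks,
// so the custom library does not need to link its own BLAS or DNN library.
inline std::vector<float> dense_relu(const BuiltinOps& ops,
                                     const std::vector<float>& x,
                                     const std::vector<float>& w,
                                     std::size_t m, std::size_t k, std::size_t n) {
  assert(x.size() == m * k && w.size() == k * n);
  std::vector<float> y(m * n);
  ops.gemm(x.data(), w.data(), y.data(), m, k, n);
  ops.relu(y.data(), y.size());
  return y;
}
```

The custom op never names a device. Which kernel runs is decided by whoever fills in the table.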

ptrendx commented 4 years ago

Custom ops should be able to set the inplace property.
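As a rough sketch of what such a property could control: an elementwise op like ReLU is safe to run with the output aliasing the input, so a scheduler that knows this can skip allocating a second buffer. `OpDesc`, `allow_inplace` and `run_relu` below are hypothetical names for illustration, not part of the custom operator API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-operator descriptor; `allow_inplace` is an assumed flag.
struct OpDesc {
  bool allow_inplace;
};

// Elementwise ReLU. Each output element depends only on the input element at
// the same index, so the function is correct even when `out == in`.
inline void relu(const float* in, float* out, std::size_t n) {
  assert(n == 0 || (in != nullptr && out != nullptr));
  for (std::size_t i = 0; i < n; ++i) out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}

// Runs ReLU and returns true if the input buffer was reused for the output.
// When in-place execution is not allowed, the result is written to `scratch`.
inline bool run_relu(const OpDesc& desc, std::vector<float>& input,
                     std::vector<float>& scratch) {
  float* out = input.data();
  if (!desc.allow_inplace) {
    scratch.resize(input.size());
    out = scratch.data();
  }
  relu(input.data(), out, input.size());
  return desc.allow_inplace;
}
```

With the flag set, the result overwrites `input` and `scratch` stays empty. Without it, `input` is preserved at the cost of a second allocation.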

kpuatamazon commented 4 years ago

Speed. All those std::string and std::unordered_map objects don't come cheaply.

I compared an integrated fork with a custom operator.

Integrated version (https://github.com/kpuatamazon/incubator-mxnet/tree/intgemm), end-to-end Sockeye performance (based on 1.6.0):

real    2m57.962s
user    7m3.986s
sys     0m6.724s

Custom operator version (based on 1.7.x, since custom operators require it):

real    3m16.879s
user    7m43.727s
sys     0m8.273s

Conditions: unset MXNET_ENGINE_TYPE; export OMP_NUM_THREADS=2; numactl -C 0-7 translate.sh

Both were compiled with the MKL backend hack for the remaining fp32 operations.
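For readers who want the gap as a percentage, here is a small sketch that converts `time`-style durations to seconds and computes the relative slowdown. The helper names `to_seconds` and `slowdown` are made up for this example, and the parser assumes the `<minutes>m<seconds>s` form shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Converts a `time`-style duration such as "2m57.962s" to seconds.
// Assumes the "<minutes>m<seconds>s" form; no hours field is handled.
inline double to_seconds(const std::string& t) {
  const std::size_t m = t.find('m');
  assert(m != std::string::npos);
  // std::stod stops at the first character it cannot parse (the trailing 's').
  return std::stod(t.substr(0, m)) * 60.0 + std::stod(t.substr(m + 1));
}

// Relative slowdown of `candidate` versus `baseline`, as a fraction
// (0.10 means the candidate took 10% longer).
inline double slowdown(const std::string& baseline, const std::string& candidate) {
  return to_seconds(candidate) / to_seconds(baseline) - 1.0;
}
```

Applied to the numbers above, `slowdown("2m57.962s", "3m16.879s")` comes out at roughly 0.106 (about 10.6% more wall-clock time) and `slowdown("7m3.986s", "7m43.727s")` at roughly 0.094 (about 9.4% more user time). Since the two builds are based on different releases (1.6.0 and 1.7.x), not all of that gap is necessarily due to the custom operator interface.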