apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[Discussion] 1.7.0 Roadmap #16864

Open · pengzhao-intel opened this issue 4 years ago

pengzhao-intel commented 4 years ago

Copying the content below from the 1.6 roadmap issue https://github.com/apache/incubator-mxnet/issues/15589 :), thanks @szha

Let's start a discussion here about the roadmap towards 1.7.0. We are looking for:

New features that are useful to your research and development.
Improvements and patches to existing features.

If you have any item that you'd like to propose to have in the roadmap, please do:

  1. Create (or locate an existing) issue/pull request for the item, and note the issue/pull request number.
  2. Comment in this issue: 1) the above issue number, 2) one sentence on what the item is about and why it's useful to you.
  3. Indicate whether you'd be willing to help out on the item.
  4. Share the ETA if you're driving the item and have a guesstimate of when it will be done.

Feel free to include items that weren't included in past roadmap discussions that you still wish to include in this release. cc @apache/mxnet-committers

pengzhao-intel commented 4 years ago

For MKLDNN backend

atiqsayyed commented 4 years ago

For Scala 2.12 release

I needed this feature, so I built the 2.12 version locally. I want to help with this issue, but I need some guidance on the current build package and how to go ahead with it.

I'm more than willing to contribute to this specific story.

leezu commented 4 years ago

@atiqsayyed this may be feasible for the MXNet 1.6 release if you are willing to work on it. You can comment in #16438 and ping yzhliu, nswamy, or pllarroy for guidance (listed as codeowners: https://github.com/apache/incubator-mxnet/blob/61c8bafdcfee129e4f7a491438a2402e6762ddd9/CODEOWNERS#L16)

cjolivier01 commented 4 years ago

XLA or MLIR graph support. Basically, generate an XLA-compiler-consumable network graph protobuf, similar to PyTorch's approach. This actually isn't a huge undertaking and would add a lot of value to MXNet imho, making it usable for other custom hardware, since supporting MXNet would be only an incremental step for vendors already supporting XLA-compatible platforms such as PyTorch and TensorFlow.
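For reference, this is roughly what PyTorch's approach looks like from the user side: torch_xla records tensor operations lazily and hands the captured graph to the XLA compiler. A minimal sketch, assuming the separate torch_xla package is installed (this is not an MXNet API):

import torch
import torch_xla.core.xla_model as xm

# Tensors on an XLA device are lazy: ops are recorded into a graph
# instead of executing eagerly.
device = xm.xla_device()
model = torch.nn.Linear(4, 2).to(device)
out = model(torch.randn(1, 4, device=device))

# mark_step() cuts the recorded graph and hands it to XLA to compile and run.
xm.mark_step()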

pengzhao-intel commented 4 years ago

> XLA or MLIR graph support. Basically, generate an XLA-compiler-consumable network graph protobuf, similar to PyTorch's approach. This actually isn't a huge undertaking and would add a lot of value to MXNet imho, making it usable for other custom hardware, since supporting MXNet would be only an incremental step for vendors already supporting XLA-compatible platforms such as PyTorch and TensorFlow.

Really good suggestion! Our team is working on XLA/TVM support for MXNet, but it's still at an early stage and I am not sure we can catch the 1.7 release.

I will update our progress in the community :)

cjolivier01 commented 4 years ago

> XLA or MLIR graph support. Basically, generate an XLA-compiler-consumable network graph protobuf, similar to PyTorch's approach. This actually isn't a huge undertaking and would add a lot of value to MXNet imho, making it usable for other custom hardware, since supporting MXNet would be only an incremental step for vendors already supporting XLA-compatible platforms such as PyTorch and TensorFlow.
>
> Really good suggestion! Our team is working on XLA/TVM support for MXNet, but it's still at an early stage and I am not sure we can catch the 1.7 release.
>
> I will update our progress in the community :)

What sort of ETA were you thinking for XLA support? 1.7.1? :D

pengzhao-intel commented 4 years ago

> What sort of ETA were you thinking for XLA support? 1.7.1? :D

Most likely, we will start from TVM first and then extend to XLA.

I am not sure about the timeline for 1.7. Maybe some experimental features will go into 1.7 and most of the features into 1.8 (or later).

ptrendx commented 4 years ago

XLA is effectively dead at this point, so I'm not sure why we would want to invest in it. MLIR is not really ready for prime time. Out of all the compiler technologies (which I agree are important), TVM seems to be the best and most mature option (with additional points for it being an Apache project and for the multiple community members already working on integrating it into MXNet).
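For context, the MXNet-to-TVM path already looks roughly like this from the user side. A minimal sketch using TVM's Relay frontend (API names as of TVM 0.6-era releases; details differ between versions):

import mxnet as mx
from tvm import relay

# Import a Gluon model into Relay; from_mxnet returns the IR module and parameters.
block = mx.gluon.model_zoo.vision.resnet18_v1(pretrained=True)
mod, params = relay.frontend.from_mxnet(block, shape={"data": (1, 3, 224, 224)})

# Compile for a CPU target; swap "llvm" for "cuda" or a vendor target as needed.
with relay.build_config(opt_level=3):
    graph, lib, built_params = relay.build(mod, target="llvm", params=params)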

cjolivier01 commented 4 years ago

> XLA is effectively dead at this point, so I'm not sure why we would want to invest in it. MLIR is not really ready for prime time. Out of all the compiler technologies (which I agree are important), TVM seems to be the best and most mature option (with additional points for it being an Apache project and for the multiple community members already working on integrating it into MXNet).

You're certainly entitled to your opinion on XLA/MLIR and TVM, but the fact of the matter is that XLA is more widely supported than TVM (by the two leading frameworks, TF and PyTorch, for instance). If adoption of MXNet is what you're looking for, then adopting technologies that let hardware vendors support the maximum number of platforms for the least amount of investment is the best route, imho. So far, the approach MXNet has been taking (opting for proprietary technologies) has arguably not been entirely successful.

szha commented 4 years ago

@cjolivier01 @pengzhao-intel @ptrendx would you mind opening a feature request issue as suggested by the initial post? The roadmap issue is usually for tracking purposes, and having other discussions inside makes it harder to track the features to add.

cjolivier01 commented 4 years ago

> @cjolivier01 @pengzhao-intel @ptrendx would you mind opening a feature request issue as suggested by the initial post? The roadmap issue is usually for tracking purposes, and having other discussions inside makes it harder to track the features to add.

> Let's start a discussion here about the roadmap towards 1.7.0. We are looking for:
>
> New features that are useful to your research and development.

This is what's in the description of this page. I don't think we have strayed from it.

szha commented 4 years ago

I was referring to the instructions just below the lines you were quoting:

> If you have any item that you'd like to propose to have in the roadmap, please do:
>
> Create (or locate an existing) issue/pull request for the item, and note the issue/pull request number. Comment in this issue:
>
>   1. the above issue number,
>   2. one sentence on what the item is about and why it's useful to you.

guoquan commented 4 years ago

Let's have it then: #16916. I would (personally) focus it on requesting support for XLA devices. It would be helpful in that it enables access to the ~evil~ TPU.

wkcn commented 4 years ago

Existing Feature:

Propose:

mikeobr commented 4 years ago

A feature that would be very useful for production deployments of MXNet models is the ability to cache or save autotune results. There are several feature requests around this topic already, such as https://github.com/apache/incubator-mxnet/issues/16173 and https://github.com/apache/incubator-mxnet/issues/10567.

Our production servers run multiple instances of networks. Autotuning currently has the following issues (a workaround sketch follows this list):

  1. Cold start times when new versions start receiving calls.
  2. Instability if autotuning is triggered simultaneously across several networks.
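There is currently no supported way to persist autotune results across processes; the usual workaround is to disable cuDNN convolution autotuning entirely, trading some convolution speed for a predictable startup. A minimal sketch (MXNET_CUDNN_AUTOTUNE_DEFAULT is an existing MXNet environment variable; set it before the engine starts):

import os

# MXNET_CUDNN_AUTOTUNE_DEFAULT: 0 disables the algorithm search,
# 1 (the default) and 2 enable it with different workspace limits.
# Set it before importing mxnet so the engine picks it up.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

import mxnet as mx
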
apeforest commented 4 years ago

A simple script to build from source using cmake: https://github.com/apache/incubator-mxnet/issues/17180

ChaiBapchya commented 4 years ago

Proposal: exposing the OpPerf utility in MXNet's pip package.

@TaoLv As discussed in one of the OpPerf PRs (https://github.com/apache/incubator-mxnet/pull/17500), let's make OpPerf available to users by adding it to the MXNet binary. This will enhance the usability of the tool.

Brief description: OpPerf

OpPerf is a tool for benchmarking MXNet operator execution. It returns performance stats for an operator (specifically memory consumption, forward time, and backward time, if applicable).

Currently, the OpPerf utility can be used by cloning the mxnet repo, setting PYTHONPATH to the cloned repo (a sketch follows), and running it in one of three ways:
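For example, instead of exporting PYTHONPATH in the shell, the path can be added from inside Python; a minimal sketch, where "/path/to/incubator-mxnet" is a placeholder for your local clone:

import sys

# Make the "benchmark" package from the cloned repo importable,
# so the examples below resolve.
sys.path.insert(0, "/path/to/incubator-mxnet")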

1. Benchmark All Ops

python incubator-mxnet/benchmark/opperf/opperf.py --output-format md --output-file mxnet_operator_benchmark_results.md

This runs OpPerf on all MXNet operators (whose inputs are defined in the OpPerf default_params file).

Sample output : https://gist.github.com/ChaiBapchya/7ec49647bb2ae8549e00d703e99371af

2. Benchmark category-specific ops

from benchmark.opperf.nd_operations.binary_operators import run_mx_binary_broadcast_operators_benchmarks

# Run all Binary Broadcast operations benchmarks with default input values
print(run_mx_binary_broadcast_operators_benchmarks())

3. Benchmark individual ops

import mxnet as mx
from mxnet import nd

from benchmark.opperf.utils.benchmark_utils import run_performance_test

add_res = run_performance_test(nd.add, run_backward=True, dtype='float32', ctx=mx.cpu(),
                               inputs=[{"lhs": (1024, 1024),
                                        "rhs": (1024, 1024)}],
                               warmup=10, runs=25)
print(add_res)

For more details: https://github.com/apache/incubator-mxnet/tree/master/benchmark/opperf

szha commented 4 years ago

We will need to address the license issues from past releases as promised. https://github.com/apache/incubator-mxnet/labels/Licenses

ciyongch commented 4 years ago

@szha, sure, I will follow up those items.

stu1130 commented 4 years ago

I would like #17872 to be in 1.7, as the DJL LSTM model depends on the fix.

stu1130 commented 4 years ago

https://github.com/apache/incubator-mxnet/pull/17177 solves the locale issue not only for JVM languages but also for Python; see https://github.com/apache/incubator-mxnet/issues/18079. So I want to include this one in 1.7.

ciyongch commented 4 years ago

A kind reminder: we've postponed the code freeze date to April 25th PST to extend the time window for the pending PRs targeting v1.7.0. Please make sure everything you need is in the v1.7.x branch now. Thanks!

ciyongch commented 4 years ago

Hi @ptrendx @roywei, may I know how to decide (or whom I should check with) whether this release should go out on the Medium blog, as you did for 1.5.0 and 1.6.0? Thanks!

deepakkumar1984 commented 4 years ago

Can this feature https://github.com/apache/incubator-mxnet/issues/17940 be considered?

Regards, Deepak

ciyongch commented 4 years ago

Hi @deepakkumar1984, it's a great feature for extending MXNet, but 1.7.0 is now code frozen (meaning no new features will be included in this release) and this feature is still under development, so I suggest letting it mature and targeting the next release (1.8.0 if there's a plan, or perhaps 2.0?). What do you think? Thanks, Ciyong

deepakkumar1984 commented 4 years ago

Hello Ciyong, thanks a lot for your suggestion. I will work on the completeness of the library and make the request again for the next release.

I was wondering if it could get some visibility to help attract some dev and test contributions. If it's possible to mention this library somewhere on the MXNet site, that would be very helpful. I can come up with the content. The current status of the API development is as follows:

MxNet Core: 90% dev completed (working on examples and documentation)
Keras-MxNet: 40% dev completed
Gluon-CV: 20% dev completed
MxNet-SciKit: 10%; a scikit-learn-style library based on MXNet NDArray, which will give CPU and GPU capability
Gluon-NLP: 5% dev
Gluon-TS: 5% dev

Regards, Deepak

ciyongch commented 4 years ago

> I was wondering if it could get some visibility to help attract some dev and test contributions. If it's possible to mention this library somewhere on the MXNet site, that would be very helpful.

@deepakkumar1984 that sounds great! An RFC or updates on dev@mxnet.apache.org could be helpful to get more visibility as well as suggestions from the community :)