Is there any plan for supporting model training parallelism?

parkerzf commented 7 years ago

Thanks for open sourcing this project. It is said that it "does not perform single-model distributed training - this is an area of active research, and here we aim to provide the most practical solutions for the majority of deep learning use cases." I wonder if you have plans to support model training parallelism in the near future, like the dist-keras project: https://github.com/cerndb/dist-keras?

Thanks!

thunterdb commented 7 years ago

@parkerzf it is certainly an area worth exploring. Achieving good performance in a distributed setting requires a fair knowledge of the optimization algorithms, and this need is already served by great projects such as https://github.com/cerndb/dist-keras (as you mentioned) or https://github.com/yahoo/TensorFlowOnSpark. For now we have been focusing on simple APIs that work well with the rest of the Spark ecosystem, and we would eventually like to see the benefits of distributed deep learning training brought to the standard Spark interfaces. If you have suggestions, or problems that would benefit from a distributed setting, feel free to mention them here.

mydpy commented 7 years ago

@thunterdb Should we close this for now?

databricks/spark-deep-learning, issue #23