L1aoXingyu opened 6 years ago
Try using hybrid layers (i.e. lenet = g.nn.HybridSequential(prefix='lenet_')) and hybridizing the network (i.e. lenet.hybridize()).
I will try this and give you feedback. But I am still confused: why is Gluon's imperative graph so much slower than PyTorch's?
@szha I changed Sequential to HybridSequential; it is a little faster, from 175 s down to 168 s, but still much slower than PyTorch. I also checked that I am running this model on the GPU. If there is nothing wrong with my code, then I think there must be some problem in Gluon. Could you please tell me the reason? As I understand it, Gluon should actually be faster than PyTorch even with Sequential rather than HybridSequential.
I notice that PyTorch's DataLoader has a parameter named num_workers, but Gluon's DataLoader has no such parameter. This parameter enables multiprocess data loading. If I set num_workers = 0, PyTorch needs about 100 s, so I think this is one of the reasons Gluon is much slower than PyTorch. But even at 100 s, PyTorch is still faster than Gluon, so there may be other problems as well.
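For reference, this is what the num_workers parameter looks like in PyTorch (a sketch with dummy MNIST-shaped data; the batch size of 128 is an assumption, not taken from the thread):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy MNIST-sized tensors stand in for the real dataset.
dataset = TensorDataset(torch.zeros(50000, 1, 28, 28),
                        torch.zeros(50000, dtype=torch.long))

# num_workers > 0 runs loading and transforms in separate worker
# processes, overlapping data preparation with training;
# num_workers=0 does everything in the main process.
loader = DataLoader(dataset, batch_size=128, shuffle=True, num_workers=4)
```

With num_workers=0 the per-sample transform cost lands on the training loop, which matches the ~100 s figure above.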
Yes, I'm guessing it's the I/O that can be improved. @piiswrong @zhreshold
@SherlockLiao How many gpus do you have? And what model is it? I could try to debug it.
@zhreshold Just one GPU. It's a simple model: 2 convolution layers, 2 max poolings, and 3 dense layers for MNIST classification. I wrote the same model using mx.sym and it's very fast, about 20 s, so I think there must be a problem in Gluon.
@piiswrong I've tested the Gluon code; it's a data-transform problem. Forward/backward/optimizer takes 70 s on a p2 instance, while pure I/O without any network inference takes 200 s. If I use dummy data:
train_dataset = g.data.ArrayDataset(mx.nd.zeros((50000, 1, 28, 28)), mx.nd.zeros((50000,1)))
test_dataset = g.data.ArrayDataset(mx.nd.zeros((10000, 1, 28, 28)), mx.nd.zeros((10000, 1)))
then 20 epochs finish in 80 s. I guess @SherlockLiao can get better results on his machine.
The transform was executed on the Python main thread; that's why it's slow.
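One workaround while the transform runs on the main thread is to take the per-sample Python transform out of the loading loop and apply it once, vectorized, to the whole array. A minimal NumPy sketch (the 0.1307/0.3081 mean/std values are the commonly used MNIST constants, an assumption, not taken from the thread):

```python
import numpy as np

# Raw MNIST-shaped uint8 images (dummy data).
images = np.random.randint(0, 256, size=(50000, 1, 28, 28), dtype=np.uint8)

# Instead of calling a Python transform once per sample inside the
# DataLoader, normalize the whole array up front in one vectorized pass;
# the per-sample Python overhead then disappears from the training loop.
images_norm = (images.astype(np.float32) / 255.0 - 0.1307) / 0.3081
```

The pre-normalized array can then be wrapped in an ArrayDataset exactly as in the dummy-data experiment above.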
@zhreshold Can you give me some suggestions about how to do data transform?
@piiswrong This issue hasn't been solved yet. Can you give me some suggestions on how to make the transform faster, or how to solve this issue? Thank you.
I found the same issue. Gluon is slower than the traditional mxnet api.
Environment info
Operating System: Ubuntu 16.04.2 LTS
Compiler:
Package used (Python/R/Scala/Julia): python
MXNet version: mxnet-cu80 0.11
Or, if installed from source:
MXNet commit hash (git rev-parse HEAD):
If you are using the Python package, please provide
Python version and distribution:
If you are using the R package, please provide
R sessionInfo():
Error Message:
I think Gluon should be faster than PyTorch, or at least the same speed. But I wrote a small network (LeNet) in both Gluon and PyTorch with the same hyperparameters. Over 20 epochs, the total time for PyTorch is 69.515576 s, but the time for Gluon is 175.097399 s. It seems Gluon is much slower than PyTorch. I don't know if I wrote the Gluon code in a wrong way.
Here is my code for the two versions:
PyTorch
Gluon