snakers4 opened 4 years ago
The paper has been updated and the table data has changed.
Do you mean these?
Since you do not report speed-ups on ImageNet, does it mean that it actually takes much longer to train such a sparse network on ImageNet?
In my understanding, the reported training speed-up is a projection based on network sparsity, not something measured in a real run with existing frameworks and hardware. To achieve those speed-ups, you need special hardware or software that can take advantage of sparsity. There should not be a big difference between training the dense and sparse networks in this repo.
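To make the distinction concrete, here is a minimal sketch of how such a projected speed-up could be computed: it simply assumes sparse FLOPs scale with the fraction of weights kept per layer, so the speed-up is dense FLOPs divided by sparse FLOPs. The function name, layer FLOP counts, and density values are all illustrative, not taken from the paper.

```python
def projected_speedup(dense_flops, densities):
    """Projected (not measured) speed-up from weight sparsity.

    dense_flops: FLOPs per layer for the dense model.
    densities:   fraction of weights kept in each layer (1.0 = fully dense).
    """
    # Assume sparse FLOPs scale linearly with the kept-weight fraction.
    sparse_flops = sum(f * d for f, d in zip(dense_flops, densities))
    return sum(dense_flops) / sparse_flops

# Illustrative example: three layers, 10% of weights kept in the two large ones.
flops = [1e9, 2e9, 0.5e9]
keep = [0.1, 0.1, 1.0]
print(f"projected speedup: {projected_speedup(flops, keep):.2f}x")
```

On real GPUs this projection does not materialize, because dense kernels ignore zeros; that is exactly why specialized sparse hardware or software is needed.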
The convergence rate is approximately the same for sparse and dense networks. What I saw is that the networks react a bit differently to certain learning rates. You can run sparse networks with slightly higher learning rates, and this is something not explored in the paper. I kept the learning rates the same so as not to give the sparse network an unfair advantage.
The speed-ups on ImageNet should be a bit larger. In general, for larger datasets and networks I see an increase in speed-ups. What @yuanyuanli85 says is correct. To utilize these speed-ups you need specialized software and probably also specialized hardware (like Graphcore or Cerebras processors).
Hi Tim!
Many thanks for this awesome repo and your paper. It is always cool when someone tries to make DL actually useful, accessible, and more efficient!
We are building an open dataset and a set of STT / TTS models for the Russian language. You can see some of our published work here.
A quick recap of our findings in this field, to provide some context for why I am asking my question (bear with me for a moment):
Obviously, your paper is very different in technical approach, but very similar in spirit to what we have done.
You also report these results (obviously, we are more interested in ImageNet results):
Now, a couple of questions (maybe I missed it in the paper):
Many thanks for your feedback!