SelimaC / large-scale-sparse-neural-networks


The Hutter Prize Needs Something Like This #3

Open jabowery opened 3 years ago

jabowery commented 3 years ago

Dear Selima and Colleagues,

The answer below, taken from the FAQ for the Hutter Prize for Lossless Compression of Human Knowledge, indicates that contestants may be interested in your approach.

Jim Bowery, Judge Hutter Prize Committee

PS: Current entries depend on RNNs (LSTMs), so an approach like that described in "Efficient and effective training of sparse recurrent neural networks" may ease the competition's transition to large-scale sparsity.
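For readers unfamiliar with that line of work, here is a minimal sketch of the prune-and-regrow idea behind such sparse training methods; the prune fraction, regrowth rule, and initialization scale are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative prune-and-regrow topology update in the spirit of sparse
# evolutionary training; hyperparameters here are assumptions, not the
# paper's exact procedure.
import numpy as np

def prune_and_regrow(W, mask, prune_frac=0.3, rng=None):
    """Drop the smallest-magnitude active weights and regrow the same number
    at random inactive positions, so overall sparsity stays constant."""
    rng = rng or np.random.default_rng(0)
    active = np.flatnonzero(mask)
    n_update = int(prune_frac * active.size)

    # Prune: remove the weakest active connections.
    weakest = active[np.argsort(np.abs(W.flat[active]))[:n_update]]
    mask.flat[weakest] = False
    W.flat[weakest] = 0.0

    # Regrow: activate an equal number of currently inactive connections.
    inactive = np.flatnonzero(~mask)
    new = rng.choice(inactive, size=n_update, replace=False)
    mask.flat[new] = True
    W.flat[new] = 0.01 * rng.standard_normal(n_update)
    return W, mask
```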

Why do you restrict to a single CPU core and exclude GPUs?

The primary intention is to limit compute and memory to some generally available amount in a transparent, easy, fair, and measurable way. 100 hours on one i7 core with 10GB RAM seems to get sufficiently close to this ideal. This roughly corresponds to 500'000/T hours, where T is the GeekBench5 score of the machine. Contestants can use the latter to estimate whether their algorithm is likely to run in under 100 hours on our test machine. If your algorithm uses C cores, the typical speedup is at most a factor of C, and often much less, so adapting the rules to allow 100/C wall-clock hours on C cores would be unhelpful. If we allowed a flat 100 wall-clock hours regardless of core count, we'd favor supercomputers with thousands of cores.

The primary reason for excluding GPUs is that they stifle creative/innovative ideas: any algorithmic idea that cannot (easily or at all) be vectorized is disadvantaged compared to vectorizable algorithms. As an example, sparse Neural Networks seem to be superior to dense ones but have a hard time reaping the benefits on GPUs and TPUs. If non-vectorizable algorithms start to consistently beat vectorizable ones on CPUs, this will drive hardware innovation to support whatever special structure (e.g. sparsity) these novel algorithms may have. Using CPUs levels the playing field.

A secondary reason is that, while most NLP algorithms based on Neural Networks run on GPUs (or even TPUs) and use much more compute, so far they are not quite competitive with SOTA compressors, which mostly run on a single CPU core. As long as large NN compression is not competitive, the restriction to CPUs is mild.

In any case, contestants are welcome to develop their own algorithms on whatever architecture they like; converting back to a single CPU core is usually easy. If the number of cores on PCs creeps up into the hundreds and GPU-based NNs become competitive, we will consider relaxing the current resource restrictions.
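As a quick illustration of the 500'000/T budget rule quoted above, a contestant could estimate their wall-clock allowance from their machine's GeekBench5 score along these lines (the function name and example score are illustrative, not part of the rules):

```python
# Rough time-budget estimate from the FAQ's 500'000/T rule, where T is the
# machine's GeekBench5 score. Function name and example value are illustrative.
def estimated_budget_hours(geekbench5_score: float) -> float:
    return 500_000 / geekbench5_score

# Example: a score of 5000 would give roughly 100 hours,
# consistent with the 100-hour reference budget.
print(estimated_budget_hours(5000))  # -> 100.0
```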
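And as a rough illustration of the sparse-versus-dense point: on a CPU, a compressed-sparse-row mat-vec only touches the stored nonzeros, whereas dense kernels (and typical GPU/TPU kernels) process every weight regardless of sparsity. The matrix size and density below are arbitrary assumptions, chosen only to make the contrast visible.

```python
# Illustrative only: sparse mat-vec skips pruned weights on a CPU, but its
# index-driven, irregular memory access is hard to vectorize on GPUs/TPUs.
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n = 4096
density = 0.01                                 # keep ~1% of weights

W_dense = rng.standard_normal((n, n)).astype(np.float32)
mask = rng.random((n, n)) < density
W_sparse = sparse.csr_matrix(W_dense * mask)   # pruned weight matrix in CSR form
x = rng.standard_normal(n).astype(np.float32)

y_dense = W_dense @ x                          # ~n*n multiply-adds
y_sparse = W_sparse @ x                        # ~density*n*n multiply-adds, irregular access
```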