diyessi closed this issue 6 years ago
Currently, lower-level primitives (kernels) are highly optimized tensor operations. We would need to generate code that was competitive with these optimized kernels.
Yup, there's the rub; maximum parsimony here would require transformers to do fat pattern-matches like `BinaryReduce<+,0>(ElementWiseBinary<*>(arg1, arg2))` instead of `Dot1D(arg1, arg2)` (or whatever). This may prove a bit brittle in practice.
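To make the brittleness concrete, here's a minimal sketch of what such a transformer pattern-match might look like. The node names (`reduce_add`, `mul`, `dot1d`) are hypothetical stand-ins, not actual nGraph IR types; the point is that the match must fit the tree shape exactly, and any intervening node (a reshape, a broadcast) would silently defeat it.

```python
# Hypothetical IR sketch: match a sum-reduction over an element-wise
# product and rewrite it to a fused dot kernel. Node/op names are
# illustrative, not the real nGraph op set.
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                       # e.g. "arg", "mul", "reduce_add", "dot1d"
    args: tuple = field(default_factory=tuple)

def rewrite_dot(node: Node) -> Node:
    # Match BinaryReduce<+,0>(ElementWiseBinary<*>(a, b)) -> Dot1D(a, b).
    if (node.op == "reduce_add"
            and len(node.args) == 1
            and node.args[0].op == "mul"
            and len(node.args[0].args) == 2):
        a, b = node.args[0].args
        return Node("dot1d", (a, b))
    return node               # no match: leave the subtree alone

a, b = Node("arg"), Node("arg")
fused = rewrite_dot(Node("reduce_add", (Node("mul", (a, b)),)))
assert fused.op == "dot1d"
```

Any pass like this has to run before (or instead of) lowering the reduction and the multiply separately, which is exactly where the ordering fragility comes from.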
Will think more about the case for it, but to reiterate I don't think this is a blocker for merge of #87.
To make matters worse, we're in a semi-JIT mode. We will need to see what the compile/compute ratio actually is. If we only end up compiling once, we can spend more time at it than if we need to compile every batch.
Would it make sense to unbox element-wise ops at some point? Dot as reductions? Convolution as dots?
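The algebraic identities behind that question can be checked directly. This is a NumPy illustration (not nGraph code) of dot as a reduction over an element-wise product, and of a 1-D convolution as a sequence of dots over a sliding window:

```python
import numpy as np

a = np.arange(4.0)
b = np.arange(4.0) + 1.0

# Dot as a reduction: Dot1D(a, b) == BinaryReduce<+>(ElementWiseBinary<*>(a, b)).
assert np.dot(a, b) == np.sum(a * b)

# 1-D "valid" convolution as dots: each output element is a dot of the
# flipped kernel against a sliding window of the input.
x = np.arange(6.0)
k = np.array([1.0, 0.0, -1.0])
conv = np.array([np.dot(x[i:i + 3], k[::-1]) for i in range(len(x) - 2)])
assert np.allclose(conv, np.convolve(x, k, mode="valid"))
```

Which is exactly why the unboxed forms are attractive for parsimony, and exactly why recognizing them again (to dispatch to the optimized kernels) requires the fat pattern-matches discussed above.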