This is a great discussion to start, especially with the folks working on our Relay front end @jroesch.
On the hardware end, I think there are a lot of opportunities to extend TVM's existing back-end to support these novel data types.
@tqchen @tmoreau89 @jroesch excellent.
I want to make certain we capture the 'larger' impact of the deferred-rounding concept: being able to get away with much smaller representations, and thus improve communication/memory bandwidth and power efficiency, requires a higher-order instruction called a fused dot product. Managing this fused dot product as an atomic operation is important, and it impacts all the BLAS operators that depend on it (matvec, matmul, etc.). The fun starts when you need to manage resource contention and have to start blocking the fused dot products. That is why it is so important to start the collaboration between the IR, its code generator, and the instruction set of the hardware. If you look at TensorFlow and PyTorch, it is clear that they did not have this collaboration, and neither environment is easily modified to introduce these new number systems.
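To make the deferred-rounding point concrete, here is a minimal Python sketch (my own illustration, not code from any of the projects mentioned): products and partial sums are accumulated exactly, in the spirit of a posit quire, and rounding to the narrow format happens exactly once. The `round_to_narrow` helper is a hypothetical stand-in for a real posit/float codec.

```python
from fractions import Fraction

def round_to_narrow(x, bits=8):
    # Hypothetical stand-in for rounding to a small posit/float format:
    # quantize the value to 'bits' fractional bits (round-to-nearest).
    scale = 1 << bits
    return Fraction(round(Fraction(x) * scale), scale)

def fused_dot(a, b, bits=8):
    # Deferred rounding: accumulate exactly, round once at the end --
    # the guarantee an exact (quire-like) accumulator provides.
    acc = Fraction(0)
    for x, y in zip(a, b):
        acc += Fraction(x) * Fraction(y)   # exact products, exact sums
    return round_to_narrow(acc, bits)      # the only rounding step

def naive_dot(a, b, bits=8):
    # Conventional narrow arithmetic: round after every multiply and add.
    acc = Fraction(0)
    for x, y in zip(a, b):
        acc = round_to_narrow(acc + round_to_narrow(Fraction(x) * Fraction(y), bits), bits)
    return acc

a = [0.1, 0.2, -0.1999, 1e-3] * 64
b = [0.3, -0.4, 0.4001, 1e-3] * 64
print(float(fused_dot(a, b)), float(naive_dot(a, b)))
# the fused version typically stays much closer to the exact result
```

The blocking problem shows up as soon as the reduction is tiled: partial sums have to stay in the wide accumulator across blocks or the single-rounding guarantee is lost, which is exactly why the IR, the code generator, and the hardware ISA need to agree on where that accumulator lives.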
How do I get plugged in and contribute to this in TVM/VTA? I need a bootstrap on what has been discussed and planned.
Fresh from the source:
Jeff is on GitHub at https://github.com/wickedfoo (@wickedfoo what's up), he's probably interested in this. All the code for the new paper is at https://github.com/facebookresearch/deepfloat/ btw if you haven't seen it.
@Ravenwater it would be great to start a coordinated effort on this. As you said, the graph IR, tensor IR, code generation, hardware ISA and hardware organization all have to be designed in concert to make this work.
I think it makes sense to push for posit support in the next TVM release cycle. I am happy to work on this coordinated effort to get it supported on the hardware end. Right now VTA is specified in Vivado HLS, which is not really the right abstraction level for swapping in customized data-type implementations (unless Vivado adds posit support, in which case we're in business). There is an effort at UW to port the VTA design to the Chisel DSL, which will give us much more flexibility to support unconventional floats. Posit support will be a top priority.
For the TVM and graph IR components, it would be good to gather @tqchen's and @jroesch's thoughts. I presume that as long as the posit representations are a power of two bits wide, support should look fairly similar to fixed-point support (at least in terms of data layout). There is an ongoing effort to support 8-bit inference in Relay, and from there we'll work on sub-8-bit graph support, such as 4-bit, 2-bit and 1-bit integers.
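To illustrate the data-layout point, a sketch under the assumption that a posit16 codec exists somewhere (e.g. generated from a C++ library such as Universal); the encode/decode helpers below are placeholders that just reuse float16 bit patterns, not a real posit codec. The point is only that a power-of-two-wide posit tensor can live in an ordinary integer buffer, so strides, packing and DMA paths are the same as for fixed-point, and only the arithmetic intrinsics need to know the bits are posits.

```python
import numpy as np

# Placeholder conversion helpers -- stand-ins for a real posit16 codec.
def posit16_encode(x: np.ndarray) -> np.ndarray:
    # Placeholder: reinterpret float16 bits as a 16-bit storage pattern.
    # A real implementation would emit genuine posit16 encodings here.
    return x.astype(np.float16).view(np.uint16)

def posit16_decode(bits: np.ndarray) -> np.ndarray:
    return bits.view(np.float16).astype(np.float32)

# From the layout point of view, a posit16 tensor is just a uint16 tensor.
weights = np.random.randn(64, 64).astype(np.float32)
stored = posit16_encode(weights)          # dtype=uint16, shape (64, 64)
assert stored.dtype == np.uint16
roundtrip = posit16_decode(stored)
print(np.abs(roundtrip - weights).max())  # only conversion error; layout unchanged
```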
move to #3060
Wanted to record two papers that report significant benefits from alternative number systems for DNN training accuracy and computational efficiency (a small sketch of the log-domain idea follows the links):
https://arxiv.org/pdf/1603.01025.pdf reports on the benefits of logarithmic number systems
https://arxiv.org/pdf/1811.01721.pdf expands the research to include posits and non-IEEE floats
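On the logarithmic-number-system point in the first paper, here is a minimal Python sketch of the core trick (the function names are my own, not from the paper): store the sign and log2 of the magnitude, so multiplication becomes an addition of exponents, while addition needs a correction term that real hardware approximates with a small table.

```python
import math

def to_lns(x):
    # LNS representation: (sign, log2 of magnitude).
    return (math.copysign(1.0, x), math.log2(abs(x)))

def from_lns(v):
    sign, e = v
    return sign * (2.0 ** e)

def lns_mul(a, b):
    # Multiply = add the exponents, multiply the signs -- cheap in hardware.
    return (a[0] * b[0], a[1] + b[1])

def lns_add(a, b):
    # Addition is the expensive one: log2(2^x + 2^y) = max + log2(1 + 2^-(|x-y|)).
    # Real LNS hardware approximates the correction term with a lookup table.
    (sa, ea), (sb, eb) = a, b
    assert sa == sb, "same-sign add only, for brevity"
    hi, lo = max(ea, eb), min(ea, eb)
    return (sa, hi + math.log2(1.0 + 2.0 ** (lo - hi)))

x, y = to_lns(3.0), to_lns(0.5)
print(from_lns(lns_mul(x, y)))  # 1.5
print(from_lns(lns_add(x, y)))  # 3.5
```

The appeal for DNNs is that convolutions and matmuls are dominated by multiplies, which is exactly the operation LNS turns into a cheap fixed-point add.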
One of the key benefits of a stack like TVM that can talk to custom FPGA execution engines is the ability to take advantage of these more computationally and power-efficient number systems. However, these number systems need to be supported as intrinsic types to be easily used in actual DNN model design. This is easily accomplished in C++ code with ready-to-go arithmetic libraries such as Universal, but the Python DSL is not as flexible.
I would like to start a discussion on how best to bring the Python DSL along so that TVM can target the plethora of new number systems specifically designed for DNN training and inference.
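To make the question concrete, here is a minimal, runnable sketch in plain Python of the kind of hook the DSL would need; this is not an existing TVM API, and the names (`register_dtype`, `posit16_add`, `fdp`, ...) are hypothetical. Each custom type maps to an integer storage type plus lowering callbacks the code generator can invoke for every arithmetic op.

```python
# Registry of custom number systems: name -> storage type + lowering hooks.
CUSTOM_DTYPES = {}

def register_dtype(name, bits, storage, lower_ops):
    """lower_ops maps an op name to a callable that emits the target call."""
    CUSTOM_DTYPES[name] = {"bits": bits, "storage": storage, "lower": lower_ops}

def lower(op, dtype, *args):
    # What a code generator would do when it meets an op on a custom dtype.
    return CUSTOM_DTYPES[dtype]["lower"][op](*args)

# Hypothetical registration for a 16-bit posit backed by uint16 storage;
# the lowering here just emits strings standing in for external calls
# (e.g. into a C++ library such as Universal, or a VTA instruction).
register_dtype(
    "posit16", bits=16, storage="uint16",
    lower_ops={
        "add": lambda a, b: f"call posit16_add({a}, {b})",
        "mul": lambda a, b: f"call posit16_mul({a}, {b})",
        "fdp": lambda a, b: f"call posit16_fused_dot({a}, {b})",  # deferred-rounding dot product
    },
)

print(lower("mul", "posit16", "%x", "%y"))    # call posit16_mul(%x, %y)
print(lower("fdp", "posit16", "%va", "%vb"))  # call posit16_fused_dot(%va, %vb)
```

The key design point is that the fused dot product appears as a single lowering hook, so a code generator can map it onto a hardware-side wide accumulator (a quire) rather than a chain of individually rounded adds.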