apache / tvm

Open deep learning compiler stack for CPU, GPU, and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[RFC] Robust support for non-standard logarithmic and posit number systems #2080

Closed. Ravenwater closed this issue 5 years ago.

Ravenwater commented 5 years ago

I want to record two papers that report significant benefits from alternative number systems for DNN training accuracy and computational efficiency:

https://arxiv.org/pdf/1603.01025.pdf reports on the benefits of logarithmic number systems

https://arxiv.org/pdf/1811.01721.pdf expands the research to include posits and non-IEEE floats

One of the key benefits of a stack like TVM that can target custom FPGA execution engines is the ability to take advantage of these more computationally and power-efficient number systems. However, these number systems need to be supported as intrinsic types before they can be used conveniently in actual DNN model design. This is easily accomplished on the C++ side with ready-to-go arithmetic libraries such as Universal, but the Python DSL is not as flexible.

I would like to start a discussion on how best to evolve the Python DSL so that TVM can target the plethora of new number systems designed specifically for DNN training and inference.
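To make the appeal concrete: in a logarithmic number system, multiplication reduces to adding exponents, so the expensive multipliers disappear from the datapath. Below is a minimal NumPy sketch of that idea (a plain-Python emulation, not TVM code; the bit widths and encoding here are illustrative assumptions, not a real LNS format):

```python
import numpy as np

def to_lns(x, frac_bits=4):
    """Encode x as (sign, fixed-point log2|x|); parameters are illustrative."""
    sign = np.sign(x)
    mag = np.maximum(np.abs(x), 2.0 ** -8)         # clamp to avoid log2(0)
    exp = np.round(np.log2(mag) * 2 ** frac_bits)  # quantized exponent
    return sign, exp.astype(np.int32)

def lns_mul(a, b):
    """LNS multiply: signs multiply, quantized exponents add -- no multiplier needed."""
    (sa, ea), (sb, eb) = a, b
    return sa * sb, ea + eb

def from_lns(v, frac_bits=4):
    sign, exp = v
    return sign * 2.0 ** (exp / 2 ** frac_bits)

x, y = 3.0, -1.5
print(from_lns(lns_mul(to_lns(x), to_lns(y))))  # about -4.36 vs exact -4.5
```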

tmoreau89 commented 5 years ago

This is a great discussion to start, especially with the folks working on our Relay front end, such as @jroesch.

On the hardware end, I think there are a lot of opportunities to extend TVM's existing back-end to support these novel data types.

Ravenwater commented 5 years ago

@tqchen @tmoreau89 @jroesch excellent.

I want to make certain the 'larger' impact of the deferred rounding concept is understood: the ability to get away with much smaller representations, and thus improve communication/memory bandwidth and power efficiency, requires a higher-order instruction called a fused dot product. Managing this fused dot product as an atomic operation is important, but it impacts all the BLAS operators that depend on it (matvec, matmul, etc.). The fun starts when you need to manage resource contention and have to start blocking the fused dot products. That is why it is so important to start the collaboration between the IR, its code generator, and the instruction set of the hardware early. If you look at TensorFlow and PyTorch, it is clear that they did not have this collaboration, and neither environment is easily modified to introduce these new number systems.
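To illustrate the deferred-rounding point, here is a small sketch assuming a toy round-to-nearest quantizer in place of a real posit/quire implementation: the fused version accumulates exactly and rounds once, while the conventional version rounds after every operation and accumulates error.

```python
import numpy as np

def quantize(x, mant_bits=3):
    """Toy round-to-nearest at a fixed mantissa width (stand-in for any narrow format)."""
    if x == 0.0:
        return 0.0
    ulp = 2.0 ** (np.floor(np.log2(abs(x))) - mant_bits)
    return np.round(x / ulp) * ulp

def dot_round_each_step(a, b):
    """Conventional narrow arithmetic: round after every multiply and add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = quantize(acc + quantize(x * y))
    return acc

def fused_dot(a, b):
    """Deferred rounding (the quire idea): exact wide accumulation, one final round."""
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return quantize(acc)

rng = np.random.default_rng(0)
a, b = rng.normal(size=64), rng.normal(size=64)
exact = float(a @ b)
print(abs(dot_round_each_step(a, b) - exact))  # typically much larger error
print(abs(fused_dot(a, b) - exact))            # error from a single rounding
```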

How do I get plugged in and contribute to this in TVM/VTA? I need a bootstrap on what has already been discussed and planned.

Ravenwater commented 5 years ago

Fresh from the source:

https://code.fb.com/ai-research/floating-point-math/

ajtulloch commented 5 years ago

Jeff is on GitHub at https://github.com/wickedfoo (@wickedfoo, what's up); he's probably interested in this. All the code for the new paper is at https://github.com/facebookresearch/deepfloat/ by the way, if you haven't seen it.

tmoreau89 commented 5 years ago

@Ravenwater it would be great to start a coordinated effort on this. As you said, the graph IR, tensor IR, code generation, hardware ISA, and hardware organization all have to be designed in concert to make this work.

I think it makes sense to push for posit support in the next TVM release cycle, and I am happy to work the hardware end of this coordinated effort. Right now VTA is specified in Vivado HLS, which is not really the right abstraction level for swapping in customized datatypes (unless Vivado provides support for posits, in which case we're in business). There is an effort at UW to port the VTA design to the Chisel DSL, which will give us the flexibility to support unconventional floats. Posit support will be a top priority.

For the TVM and graph IR components, it would be good to gather @tqchen's and @jroesch's thoughts. I presume that as long as the posit representations are a power of two wide, supporting them should be fairly similar to supporting fixed-point representations (at least in terms of data layout). There is an ongoing effort to support 8-bit inference in Relay, and from there we will work on sub-8-bit graph support, such as 4-bit, 2-bit, and 1-bit integers.
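As a small illustration of why power-of-two widths keep the layout story simple, here is a NumPy sketch that packs two 4-bit values into each byte (pure illustration, not TVM layout code); posit8 bit patterns could reuse uint8 storage the same way:

```python
import numpy as np

def pack_int4(values):
    """Pack an even-length array of 4-bit values (0..15) into bytes, two per uint8."""
    v = np.asarray(values, dtype=np.uint8) & 0xF
    return (v[0::2] << 4) | v[1::2]

def unpack_int4(packed):
    """Recover the original 4-bit values from the packed byte array."""
    p = np.asarray(packed, dtype=np.uint8)
    out = np.empty(p.size * 2, dtype=np.uint8)
    out[0::2] = p >> 4
    out[1::2] = p & 0xF
    return out

vals = np.array([1, 15, 7, 2], dtype=np.uint8)
assert np.array_equal(unpack_int4(pack_int4(vals)), vals)
```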

tqchen commented 5 years ago

Moving the discussion to #3060.