Do you know how NVDLA implements the non-linear activation functions?

JunningWu / Learning-NVDLA-Notes

NVDLA is an open-source DL/ML accelerator that is well suited to individuals and college students. These are the notes I took while learning and experimenting with it. I hope this page helps you a bit. Contact me: junning.wu@ia.ac.cn

Do you know how NVDLA implements the non-linear activation functions? #2

Open bandjshen opened 6 years ago

bandjshen commented 6 years ago

From the RTL source code, it uses 9 bits as the LUT address and can handle 4*16-bit data in parallel, but it divides its table into LO and LE. What does this mean? How does it decide whether to use LO or LE? Why does it fetch both LE[addr] and LE[addr+1]?

There are also scale/shift/offset/bias stages after the LUT. What do these mean?

JunningWu commented 6 years ago

I will check that code in the next few days.

JunningWu commented 6 years ago

https://github.com/JunningWu/AIChip/issues/11

JunningWu commented 6 years ago

The Single Data Point Processor (SDP) allows for the application of both linear and non-linear functions onto individual data points. This is commonly used immediately after convolution in CNN systems. The SDP provides native support for linear functions (e.g., simple bias and scaling) and uses lookup tables (LUTs) to implement non-linear functions. This combination supports most common activation functions as well as other element-wise operations including: ReLU, PReLU, precision scaling, batch normalization, bias addition, or other complex non-linear functions, such as a sigmoid or a hyperbolic tangent.

I guess you should understand the SDP's functions first; then the RTL code will be much easier to follow.

bandjshen commented 6 years ago

I know which functions are inside the SDP, but I do not know the detailed solution, i.e. how to design the LUT to implement ELU/Sigmoid/Tanh. I'm a hardware engineer; the code that calculates the table may exist in the driver code, for example where a layer adds support for Sigmoid. Another question: the SDP has 16 pipelines in parallel, but EW (the non-linear activation) seems to have only 4, which will hurt performance.
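For illustration only, here is a minimal sketch of how software might precompute such a table for Sigmoid. It is not taken from the NVDLA driver: the function name `build_sigmoid_table`, the table size, the input range, and the Q-format are all hypothetical choices made for this example.

```python
import math

def build_sigmoid_table(num_entries=257, x_min=-8.0, x_max=8.0, frac_bits=15):
    """Sample sigmoid at evenly spaced inputs and quantize to signed 16-bit.

    All parameters are illustrative assumptions, not NVDLA defaults.
    """
    scale = 1 << frac_bits                      # fixed-point scale factor
    step = (x_max - x_min) / (num_entries - 1)  # spacing between table inputs
    table = []
    for i in range(num_entries):
        x = x_min + i * step
        y = 1.0 / (1.0 + math.exp(-x))          # sigmoid(x)
        q = int(round(y * scale))               # quantize to fixed point
        q = max(-32768, min(32767, q))          # clamp to the int16 range
        table.append(q)
    return table

table = build_sigmoid_table()
```

With these assumed parameters the middle entry corresponds to x = 0, so it holds the fixed-point encoding of 0.5. A table for Tanh or ELU could be built the same way by swapping the function being sampled.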

bandjshen commented 6 years ago

Hi Junning, in the BS/BN instances there is a register config port cfg_alu_algo_rsc_z[1:0]. Do you know what these bits mean? And what about cfg_alu_shift_value_rsc_z[5:0]?

Thanks

weikun750 commented 4 years ago

The index consists of a 9-bit integer part (INT) and a 35-bit fractional part (FRAC). If the index falls between two entries, the output is Y0*FRAC + Y1*((1<<35) - FRAC); that is why the hardware always implements both the addr and addr+1 lookups.
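A minimal sketch of this kind of fixed-point interpolation, written as a software model rather than as a description of the actual RTL. The names `lut_interpolate`, `y_lo` and `y_hi` are hypothetical. The sketch assumes the usual linear-interpolation convention, in which the entry at addr is weighted by (2^35 - FRAC) and the entry at addr+1 by FRAC; the comment above does not say which entry it calls Y0 or Y1. The final right shift by 35 bits and the clamping of addr+1 at the last entry are also assumptions.

```python
FRAC_BITS = 35
ONE = 1 << FRAC_BITS  # fixed-point representation of 1.0

def lut_interpolate(table, index):
    """Interpolate between two adjacent table entries.

    `index` is a fixed-point value: the bits above FRAC_BITS are the
    table address, the low FRAC_BITS bits are the fraction.
    """
    addr = index >> FRAC_BITS                    # integer part selects the entry
    frac = index & (ONE - 1)                     # fractional part sets the weights
    y_lo = table[addr]
    y_hi = table[min(addr + 1, len(table) - 1)]  # assumed clamp at the table end
    acc = y_lo * (ONE - frac) + y_hi * frac      # weighted sum of both entries
    return acc >> FRAC_BITS                      # assumed rescale to entry units

demo = [0, 100, 200]
halfway = (1 << FRAC_BITS) + (1 << (FRAC_BITS - 1))  # address 1, fraction 0.5
result = lut_interpolate(demo, halfway)
```

Because both neighbours are needed for every lookup, reading addr and addr+1 together lets the interpolation be computed in a single pass.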