JunningWu / AIChip

Aiming at an AI Chip based on RISC-V and NVDLA.
Apache License 2.0

Learning NVDLA SDP HW code #11

Open JunningWu opened 6 years ago

JunningWu commented 6 years ago

NV_NVDLA_SDP_CMUX_pipe_p1

The module runs in the same clock domain as nvdla_core_clk. Input signals: cacc2sdp_pd (a 514-bit bus carrying 512 bits of data plus the pd_batch_end and pd_layer_end flags), cacc2sdp_valid, and cacc_rdy. Output signals: cacc2sdp_ready, cacc_vld, and cacc_pd.
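As a rough illustration of the 514-bit bundle described above, the following Python sketch splits a payload into its data and flag fields. The bit positions (data in [511:0], pd_batch_end at bit 512, pd_layer_end at bit 513) are an assumption that the thread does not confirm, and the function name `unpack_cacc2sdp_pd` is hypothetical.

```python
DATA_BITS = 512   # width of the data field, as stated in the thread
PD_BITS = 514     # total width of cacc2sdp_pd

def unpack_cacc2sdp_pd(pd: int):
    """Split a 514-bit payload into (data, batch_end, layer_end).

    Assumption: data occupies bits [511:0], pd_batch_end is bit 512
    and pd_layer_end is bit 513. Check the RTL before relying on this.
    """
    if not 0 <= pd < (1 << PD_BITS):
        raise ValueError("payload does not fit in 514 bits")
    data = pd & ((1 << DATA_BITS) - 1)
    batch_end = (pd >> DATA_BITS) & 1
    layer_end = (pd >> (DATA_BITS + 1)) & 1
    return data, batch_end, layer_end
```

For example, a payload with only bit 513 set on top of some data would unpack to that data with `batch_end == 0` and `layer_end == 1`.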

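Normally (when the SYNTHESIS macro is defined), the signals coming out of pipe_p1, including valid and pd, are assigned directly to the outputs; otherwise they are delayed by one cycle. Inside pipe_p1 there is a randomizer module, shown in the figure below, that generates a random number of delay cycles and stalls the signals. During a stall, valid is 0 and pd is in the X state.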

image
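The stall behavior described above can be sketched as a small cycle-level model in Python. This is only an illustration under simplifying assumptions: the downstream side is always ready, `None` stands in for the Verilog X state, and the names `pipe_model`, `max_stall`, and `seed` are hypothetical rather than taken from the NVDLA sources.

```python
import random

X = None  # stands in for the Verilog X (unknown) state

def pipe_model(beats, synthesis=True, max_stall=3, seed=0):
    """Return a list of (valid, pd) pairs, one per simulated cycle.

    synthesis=True  : every beat passes straight through.
    synthesis=False : a randomizer inserts 0..max_stall stall cycles
                      before each beat; during a stall valid is 0
                      and pd is X.
    Assumption: the receiver is always ready.
    """
    rng = random.Random(seed)
    trace = []
    for pd in beats:
        if not synthesis:
            for _ in range(rng.randint(0, max_stall)):
                trace.append((0, X))   # stalled cycle
        trace.append((1, pd))          # beat is presented
    return trace
```

Whatever the random stalls are, the valid beats still come out in their original order, which is the property such a randomizer is usually meant to exercise in simulation.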

A comment at the top of the module suggests it was generated; I have not yet found the generator file or its documentation.

Generated by ::pipe -m -bc -os cacc_pd (cacc_vld, cacc_rdy) <= cacc2sdp_pd[513:0] (cacc2sdp_valid, cacc2sdp_ready)

JunningWu commented 6 years ago

NV_NVDLA_SDP_CMUX_pipe_p2

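Similar to pipe_p1.

The module runs in the same clock domain as nvdla_core_clk. Input signals: cmux2dp_pd (512 bits), cmux2dp_pvld, and sdp_cmux2dp_ready. Output signals: cmux2dp_prdy, sdp_cmux2dp_valid, and sdp_cmux2dp_pd.

Normally (when the SYNTHESIS macro is defined), the signals coming out of pipe_p2, including valid and pd, are assigned directly to the outputs; otherwise they are delayed by one cycle.

Like pipe_p1, pipe_p2 also contains a randomizer, which works in a similar way.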

JunningWu commented 6 years ago

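The SDP RDMA configuration registers mirror the functional units and cover four categories of data: the data coming in through the MUX, plus the bias data for BS/BN/EW.

Each group of configuration registers holds the width, height, and channel count, together with the address and the line/surface strides.

Data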

Register | Address | Description
--- | --- | ---
D_DATA_CUBE_WIDTH | 0xa00c | Input cube’s width
D_DATA_CUBE_HEIGHT | 0xa010 | Input cube’s height
D_DATA_CUBE_CHANNEL | 0xa014 | Input cube’s channel
D_SRC_BASE_ADDR_LOW | 0xa018 | Lower 32bits of input data address
D_SRC_BASE_ADDR_HIGH | 0xa01c | Higher 32bits of input data address when axi araddr is 64bits
D_SRC_LINE_STRIDE | 0xa020 | Line stride of input cube
D_SRC_SURFACE_STRIDE | 0xa024 | Surface stride of input cube
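To make the role of the base address and stride registers concrete, here is a minimal Python sketch of how a read address could be derived from them. It assumes the strides are byte counts and that the high/low registers simply concatenate into a 64-bit base; the thread does not state the units, and the function name `src_address` and its parameters are hypothetical.

```python
def src_address(base_low, base_high, line_stride, surface_stride,
                line, surface, offset_in_line=0):
    """Compute a byte address inside the input cube.

    Assumptions (not confirmed by the thread):
      * base_high:base_low form one 64-bit base address,
      * both strides are expressed in bytes.
    """
    base = (base_high << 32) | (base_low & 0xFFFFFFFF)
    return (base
            + surface * surface_stride   # jump to the requested surface
            + line * line_stride         # then to the requested line
            + offset_in_line)            # then to the position within the line

# Example: surface 1, line 2 of a cube based at 0x1_0000_1000
addr = src_address(0x1000, 0x1, line_stride=0x100,
                   surface_stride=0x4000, line=2, surface=1)
assert addr == 0x100005200
```

The same pattern would apply to the BS/BN/EW groups, which add a batch stride on top of the line and surface strides.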

BS

Register | Address | Description
--- | --- | ---
D_BRDMA_CFG | 0xa028 | Configuration of BRDMA: enable/disable, data size, Ram type, etc.
D_BS_BASE_ADDR_LOW | 0xa02c | Lower 32bits address of the bias data cube
D_BS_BASE_ADDR_HIGH | 0xa030 | Higher 32bits address of the bias data cube when axi araddr is 64bits
D_BS_LINE_STRIDE | 0xa034 | Line stride of bias data cube
D_BS_SURFACE_STRIDE | 0xa038 | Surface stride of bias data cube
D_BS_BATCH_STRIDE | 0xa03c | Stride of bias data cube in batch mode

BN

Register | Address | Description
--- | --- | ---
D_NRDMA_CFG | 0xa040 | Configuration of NRDMA: enable/disable, data size, Ram type, etc.
D_BN_BASE_ADDR_LOW | 0xa044 | Lower 32bits address of the bias data cube
D_BN_BASE_ADDR_HIGH | 0xa048 | Higher 32bits address of the bias data cube when axi araddr is 64bits
D_BN_LINE_STRIDE | 0xa04c | Line stride of bias data cube
D_BN_SURFACE_STRIDE | 0xa050 | Surface stride of bias data cube
D_BN_BATCH_STRIDE | 0xa054 | Stride of bias data cube in batch mode

EW

Register | Address | Description
--- | --- | ---
D_ERDMA_CFG | 0xa058 | Configuration of ERDMA: enable/disable, data size, Ram type, etc.
D_EW_BASE_ADDR_LOW | 0xa05c | Lower 32bits address of the bias data cube
D_EW_BASE_ADDR_HIGH | 0xa060 | Higher 32bits address of the bias data cube when axi araddr is 64bits
D_EW_LINE_STRIDE | 0xa064 | Line stride of bias data cube
D_EW_SURFACE_STRIDE | 0xa068 | Surface stride of bias data cube
D_EW_BATCH_STRIDE | 0xa06c | Stride of bias data cube in batch mode

Other information

These registers configure the SDP operating mode, whether NaN is flushed to zero, and so on.

Register | Address | Description
--- | --- | ---
D_FEATURE_MODE_CFG | 0xa070 | Operation configuration: flying mode, output destination, Direct or Winograd mode, flush NaN to zero, batch number.
D_SRC_DMA_CFG | 0xa074 | RAM type of input data cube
D_STATUS_NAN_INPUT_NUM | 0xa078 | Input NaN element number
D_STATUS_INF_INPUT_NUM | 0xa07c | Input Infinity element number
D_PERF_ENABLE | 0xa080 | Enable/Disable performance counting

JunningWu commented 6 years ago

SDP_D_DP_BS_CFG 0xb058 -------> 0x2c16 image
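The thread does not explain the arrow above. One possible reading, offered only as an assumption, is that 0x2c16 is the 32-bit word index of byte offset 0xb058, since 0xb058 / 4 = 0x2c16. The helper name below is hypothetical.

```python
def byte_offset_to_word_index(byte_offset: int) -> int:
    """Convert a register byte offset to a 32-bit word index.

    Assumption: registers are 4 bytes wide and word-aligned.
    """
    if byte_offset % 4 != 0:
        raise ValueError("offset is not 4-byte aligned")
    return byte_offset >> 2

assert byte_offset_to_word_index(0xb058) == 0x2c16
```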

JunningWu commented 6 years ago

image

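The figure summarizes the BS/BN configuration registers and the meaning of each bit. I do not yet understand the ALU_ALGO field well and hope to add more later. For the example test cc_alexnet_conv5_relu5_int16_dtest_cvsram, BS_CFG=0x12 and BS_ALU_CFG=0x1501. The simulation results show that the BS stage only shifts the input data right by 0x15 bits, i.e. 21 bits (for example, 0x686c1308 becomes 0x00000343), and then applies ReLU.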
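The arithmetic in this observation can be checked with a short Python sketch. It reproduces only what the simulation is reported to show (a right shift followed by ReLU); reading the shift amount from the upper byte of BS_ALU_CFG is an assumption based on the value 0x1501, and the names `to_int32` and `bs_stage` are hypothetical.

```python
def to_int32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def bs_stage(raw: int, bs_alu_cfg: int = 0x1501) -> int:
    """Right-shift the input, then apply ReLU, as seen in simulation.

    Assumption: the shift amount is the upper byte of BS_ALU_CFG
    (0x15 = 21 for the value 0x1501).
    """
    shift = (bs_alu_cfg >> 8) & 0xFF
    shifted = to_int32(raw) >> shift   # arithmetic shift on a signed value
    return max(shifted, 0)             # ReLU clamps negatives to zero

assert bs_stage(0x686c1308) == 0x00000343
```

A negative input such as 0xF0000000 would come out as 0 under this model because of the ReLU step.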

JunningWu commented 6 years ago

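image

Update on the EW configuration registers: although the example test cc_alexnet_conv5_relu5_int16_dtest_cvsram writes values to the EW configuration registers, it chooses to bypass the EW function, so the BN output goes straight to the C module. See #10.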

JunningWu commented 6 years ago

SDP_C performs data compression, reducing 32-bit data to 16-bit output.
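The thread only says that 32-bit data is reduced to 16 bits and does not describe how. As a hedged illustration, one common way to narrow a signed value is saturation, sketched below in Python; the real SDP_C may apply an offset, scale, or shift first, and `saturate_to_int16` is a hypothetical name.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def saturate_to_int16(value: int) -> int:
    """Clamp a signed 32-bit value into the int16 range.

    Assumption: narrowing is done by saturation; the thread does not
    confirm that this is what SDP_C does.
    """
    return max(INT16_MIN, min(INT16_MAX, value))

assert saturate_to_int16(0x343) == 0x343      # small values pass through
assert saturate_to_int16(100000) == 32767     # large values clamp high
assert saturate_to_int16(-100000) == -32768   # and clamp low
```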

JunningWu commented 6 years ago

image

CrazyBingo commented 6 years ago

32 likes! You have dug into this quite deeply! Come on

JunningWu commented 6 years ago

@CrazyBingo we can do it together

CrazyBingo commented 6 years ago

@JunningWu Happy new year, and happy to research NVDLA with you. I have not met anybody else who researches NVDLA, so thanks, nice to meet you. I am now working in Shenzhen and have been researching AI architecture lately. Here is my personal email: crazyfpga@qq.com, and my WeChat is: hanbinhdu. Looking forward to your reply.

jiaobuzuji commented 6 years ago

@CrazyBingo haha

CrazyBingo commented 6 years ago

Ha what

redpanda3 commented 6 years ago

SDP is rather hard; I did not fully understand it either. Thanks for the material.

redpanda3 commented 6 years ago

“A comment at the top of the module suggests it was generated; I have not yet found the generator file or its documentation.

Generated by ::pipe -m -bc -os cacc_pd (cacc_vld, cacc_rdy) <= cacc2sdp_pd[513:0] (cacc2sdp_valid, cacc2sdp_ready)”

For this, there is a Perl file in the plugin folder; pipe is one of the scripts in it. Alongside it there are also flop, retime, and others.