QmppmQ / riscv


Illegal instruction (core 0) at PC 0x00000000 #1

Open Stronger-Huang opened 2 years ago

Stronger-Huang commented 2 years ago

make ./testbench

When I finished executing the above command, this problem occurred:

Cycling clock to run for a few instructions
                 6000: Illegal instruction (core 0) at PC 0x00000000:
Halting


It seems to be an instruction problem, but I haven't modified any files (including example.s and the testbench).

I would appreciate it a lot if anyone can help me. Thanks.

QmppmQ commented 2 years ago

The readme had not been updated for a while; I just updated it, so you can try again by following the new readme. The procedure is more complicated than before, but now you can see the characters printed by the processor as it executes the program directly in the terminal, instead of having to verify that the processor core or the C code is working through the waveform. Of course, if you want to see the waveform, I will also add those steps to the readme later.

Stronger-Huang commented 2 years ago

Thank you! The program now has a different problem: it prints "BSP: illegal instruction exception handler entered". Is this because the added instruction conflicts with the base RV32I/F instruction encodings?

How can we verify the ten convolution outputs below, and more importantly, how can we prove that the added instructions really accelerate the CNN?

image

And can we see the waveform diagram and hardware resource overhead using Vivado?

Thanks again for your help!

QmppmQ commented 2 years ago

The added instruction does not violate the F extension: the opcode of the added instruction is 0x77, while the opcode of the F extension is 0x53. You can check the opcode map in the riscv-spec.

image

The reason for the illegal instruction is most likely that the original RTL of cv32e40p has not been replaced with my modified version, so the original decoder cannot recognize the added instructions.

The weights in the LeNet are trained on the Fashion-MNIST dataset; you can replace the weights or the input in lenet.c with whatever you want. For reference, with the weights and input in the current code, the resulting output should be:

image

We time the classification task in LeNet once with traditional convolution and once with the custom convolution instructions, then divide the two to get the speedup. The principle of the custom-instruction acceleration and the Winograd algorithm is described in this paper: Customized Instruction on RISC-V for Winograd-Based Convolution Acceleration, https://ieeexplore.ieee.org/document/9516614. The paper gives the speedup of the custom convolution instruction, as well as the waveform, hardware resource overhead, and power consumption obtained with Vivado.

image

image
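
In case it helps to see what "using the 0x77 opcode from C" can look like, here is a minimal, hypothetical sketch, not the actual conv23 encoding or the helpers used in lenet.c: it emits one R-type instruction in that opcode space with the GNU assembler's .insn directive and times a region with the machine cycle counter, which is the kind of measurement the speedup comparison is based on. The funct fields and register usage are made up for illustration.

```c
#include <stdint.h>

/* Read the machine cycle counter.  On cv32e40p the counter may be
 * disabled out of reset, so depending on your configuration you may
 * need to clear the corresponding bit in mcountinhibit first. */
static inline uint32_t rdmcycle(void)
{
    uint32_t c;
    __asm__ volatile ("csrr %0, mcycle" : "=r"(c));
    return c;
}

/* Emit one R-type instruction with major opcode 0x77 using the GNU
 * assembler's .insn directive.  The funct3/funct7 values below are
 * placeholders, NOT the real conv23 encoding defined in the modified
 * cv32e40p decoder. */
static inline uint32_t custom_op(uint32_t rs1, uint32_t rs2)
{
    uint32_t rd;
    __asm__ volatile (".insn r 0x77, 0x0, 0x00, %0, %1, %2"
                      : "=r"(rd)
                      : "r"(rs1), "r"(rs2));
    return rd;
}

/* Speedup measurement pattern: wrap the kernel (custom-instruction
 * version or plain-C version) in two cycle reads and compare counts. */
uint32_t time_once(void)
{
    uint32_t start = rdmcycle();
    volatile uint32_t r = custom_op(0x12345678u, 0x9abcdef0u);
    (void)r;
    return rdmcycle() - start;
}
```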

This modified cv32e40p soft core also runs successfully on Vivado: just convert the .hex or .elf file compiled from lenet.c into the corresponding .coe file and import it into the RAM on the FPGA. If you run it with a 100 MHz clock, it takes more than 200 ms to complete one classification task.
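
For the .hex/.elf-to-.coe step, here is a small, hypothetical helper, not the tooling used in this repository: it assumes you first produce a flat little-endian binary image (for example with objcopy -O binary) and then writes one 32-bit word per line in the Xilinx .coe format. The word width and radix must match how the block memory is configured, so treat it only as a starting point.

```c
#include <stdint.h>
#include <stdio.h>

/* bin2coe: read a raw little-endian binary image and emit a Xilinx .coe
 * memory initialization file with one 32-bit hex word per line.
 * Usage: ./bin2coe lenet.bin > lenet.coe
 */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <image.bin>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }

    printf("memory_initialization_radix=16;\n");
    printf("memory_initialization_vector=\n");

    uint8_t b[4];
    size_t n;
    int first = 1;
    while ((n = fread(b, 1, 4, f)) > 0) {
        /* Pad a trailing partial word with zeros. */
        for (size_t i = n; i < 4; i++)
            b[i] = 0;
        /* Assemble a little-endian 32-bit word. */
        uint32_t w = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                     ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
        if (!first)
            printf(",\n");
        printf("%08x", w);
        first = 0;
    }
    printf(";\n");

    fclose(f);
    return 0;
}
```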

Stronger-Huang commented 2 years ago

Thank you so much, man. So the 'reserved' opcode space can also be used for extensions. It's working now. But could you please provide the source files of the Vivado project? I am not good at this software, and I seem to have problems converting the .hex file to a .coe file.

QmppmQ commented 2 years ago

I uploaded the files of the Vivado project. If you open the project file *.xpr directly in Vivado, you may encounter the following problems:

  1. The version of Vivado I used when I built the project is 2020.1. Later versions of Vivado should be able to open it successfully, but earlier versions may not be able to open it directly.
  2. Most of the RTL files in the Vivado project need to be imported again; the path is /FPGA/cv32e40p_rtl_bk_21.11.21.
  3. I used block memory and clock wizard IPs generated for the PYNQ-Z2 board, so you need to regenerate these IPs for the FPGA board you are using.

Stronger-Huang commented 2 years ago

Thank you, I upgraded the two IPs (blk_mem_gen_0 and clk_wiz_0) and loaded lenet.coe through "Re-customize IP" for blk_mem_gen_0.

image

It comes with an error: "Validation failed for parameter 'Coe File(Coe_File)' with value '../../../../cv32e40p_21.11.21.srcs/sources_1/lenet.coe' for IP 'blk_mem_gen_0'. The Memory Initialization vector can contain between 1 to Write Depth A number of entries."

image

Is it because the number of lines in lenet.coe (33907) differs from the write depth (16)? After I changed the "Write Depth" in the Port A Options, the simulation results remained the same, and y4 to y0 are still "z". Do I need to change anything else in the testbench?

image

How can I solve this and run the simulation?

QmppmQ commented 2 years ago

You can refer to my memory settings in the attached screenshots. The CPU needs enough RAM space to run. And if your clock frequency is 100 MHz, the simulation takes more than 200 ms of simulated time to finish running before you can see the result. When you are running a behavioral simulation, timing violations do not need to be considered, but if you are running a post-synthesis or post-implementation timing simulation, you need to make sure there are no timing violations first. You can avoid timing violations by reducing the core clock frequency.
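
As a rough sanity check of those numbers (assuming the 100 MHz clock mentioned above): one clock period is 10 ns, so 200 ms of simulated time is about 2 x 10^7 clock cycles, which is why the behavioral simulation has to run that long before the outputs appear. Likewise, the earlier Vivado error means the block memory's write depth must be at least the 33907 entries in lenet.coe, so the default depth of 16 has to be raised well past that (for example to the next power of two).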

Stronger-Huang commented 2 years ago

How can we compare the power consumption with and without the conv23 instruction (to prove that conv23 really saves power)? Do we need to modify lenet.c (replace the conv23 instruction with basic instructions), convert it to a .coe file, and then load it into the RAM? By the way, what is the purpose of the file lenet_r.coe?

image

QmppmQ commented 2 years ago

I guess what you really want to know is how to compare the power consumption with and without the convolution module. Just instantiate the convolution module mac_ops in ex_stage or not, run implementation, and look at the power report in Vivado. Of course, the RI5CY with the convolution module will consume more power than the original core. The answer to the second question is yes; what a coincidence, that is why lenet_r.coe is there.
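
For the software side of the baseline, the "basic instruction" version is just a plain C convolution. The sketch below is only illustrative, assuming a 3x3 kernel, float data, and a function name that do not come from lenet.c, but it shows the kind of inner loop the conv23 instruction replaces; timing this and the conv23 version with the cycle counter (as in the earlier sketch) and dividing the counts gives the speedup.

```c
#include <stddef.h>

/* Hypothetical scalar baseline: a direct 3x3 convolution over one input
 * channel, no padding, stride 1.  Data type and layout are assumptions;
 * adapt them to whatever lenet.c actually uses. */
void conv3x3_scalar(const float *in, int in_h, int in_w,
                    const float k[3][3], float *out)
{
    int out_h = in_h - 2;
    int out_w = in_w - 2;

    for (int y = 0; y < out_h; y++) {
        for (int x = 0; x < out_w; x++) {
            float acc = 0.0f;
            /* Accumulate the 3x3 neighbourhood weighted by the kernel. */
            for (int ky = 0; ky < 3; ky++)
                for (int kx = 0; kx < 3; kx++)
                    acc += in[(y + ky) * in_w + (x + kx)] * k[ky][kx];
            out[y * out_w + x] = acc;
        }
    }
}
```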

Stronger-Huang commented 2 years ago

Thank you, I converted the input matrix in LeNet into an image, as shown below. It is a pair of sneakers, right?

colorimage

How can we test the output of LeNet to get the accuracy on the training data?

image

QmppmQ commented 2 years ago

We train the LeNet with PyTorch and put the trained weights into lenet.c, so the accuracy is obtained from PyTorch. The weights in lenet.c here have an accuracy of 87.8% on the Fashion-MNIST dataset.