CSL-KU / firesim-nvdla

FireSim-NVDLA: NVIDIA Deep Learning Accelerator (NVDLA) Integrated with RISC-V Rocket Chip SoC Running on the Amazon FPGA Cloud

NVDLA driver error undetected in `solo.sh` #17

Open tymcauley opened 4 years ago

tymcauley commented 4 years ago

Hello! First, I'd like to say thank you for putting together a well-documented project that is straightforward to reproduce. I have run the basic example (YOLOv3 inference using solo.sh) several times without issue.

I was trying to run some other networks on the NVDLA using other software (the nvdla_runtime binary from NVIDIA's nvdla/sw repo), and ran into some issues that I'm still trying to debug. After getting those errors, I wanted to check if the NVDLA was still in a known-good state, so I ran the darknet-nvdla/solo.sh workload, and got this result:

# ./solo.sh
[    9.508000] random: crng init done
learning_rate: Using default '0.001000'
momentum: Using default '0.900000'
decay: Using default '0.000100'
policy: Using default 'constant'
max_batches: Using default '0'
layer     filters    size              input                output
    0 offset: Using default '0.000000'
shifter: Using default '0'
post_offset: Using default '0.000000'
post_scale: Using default '1.000000'
outputs 692224 num_out 5537792
    1 odla          tensor 0  416 x 416 x   4   ->    52 x  52 x 256
odla          tensor 1  416 x 416 x   4   ->    26 x  26 x 512
odla          tensor 2  416 x 416 x   4   ->    13 x  13 x 255
odla          tensor 3  416 x 416 x   4   ->    13 x  13 x 256
    2 input layer 1 tensor 3
make_split_layer input layer index 1 tensor 3
split          tensor 3   13 x  13 x 256   ->    13 x  13 x 256
    3 out layer 5 tensor 0
    4 input layer 1 tensor 2
make_split_layer input layer index 1 tensor 2
split          tensor 2   13 x  13 x 255   ->    13 x  13 x 255
    5 post_offset: Using default '0.000000'
outputs 43095 num_out 43264
    6 yolo
    7 input layer 1 tensor 1
make_split_layer input layer index 1 tensor 1
split          tensor 1   26 x  26 x 512   ->    26 x  26 x 512
    8 odla          tensor 0   26 x  26 x 512   ->    26 x  26 x 255
odla          tensor 1   26 x  26 x 512   ->    26 x  26 x 128
    9 input layer 8 tensor 0
make_split_layer input layer index 8 tensor 0
split          tensor 0   26 x  26 x 255   ->    26 x  26 x 255
   10 post_offset: Using default '0.000000'
outputs 172380 num_out 173056
   11 yolo
   12 input layer 8 tensor 1
make_split_layer input layer index 8 tensor 1
split          tensor 1   26 x  26 x 128   ->    26 x  26 x 128
   13 out layer 2 tensor 0
   14 input layer 1 tensor 0
make_split_layer input layer index 1 tensor 0
split          tensor 0   52 x  52 x 256   ->    52 x  52 x 256
   15 odla          tensor 0   52 x  52 x 256   ->    52 x  52 x 255
   16 input layer 15 tensor 0
make_split_layer input layer index 15 tensor 0
split          tensor 0   52 x  52 x 255   ->    52 x  52 x 255
   17 post_offset: Using default '0.000000'
outputs 689520 num_out 692224
   18 yolo
Loading weights from yolov3-odla.cfg...Done!
#### input image size c=4 h=416 w=416
[   10.316000] Task execution failed
NvDlaSubmit: Error IOCTL failed (Cannot allocate memory)
(DLA_RUNTIME) Error 0x0003000f: (propagating from Runtime.cpp, function submitInternal(), line 669)
NVDLA time: 0.000231 seconds
[   10.320000] Task execution failed
NvDlaSubmit: Error IOCTL failed (Cannot allocate memory)
(DLA_RUNTIME) Error 0x0003000f: (propagating from Runtime.cpp, function submitInternal(), line 669)
NVDLA time: 0.000097 seconds
[   10.328000] Task execution failed
NvDlaSubmit: Error IOCTL failed (Cannot allocate memory)
(DLA_RUNTIME) Error 0x0003000f: (propagating from Runtime.cpp, function submitInternal(), line 669)
NVDLA time: 0.000097 seconds
data/person.jpg: Predicted in 0.058764 seconds.
# echo $?
0

You can see that there are several errors from the NVDLA driver, but they aren't caught by the darknet-nvdla software. As a result, an unrealistically fast inference time is reported, and we don't get the detections for the horse, dog, and person in the test image. I wasn't sure whether I should file this issue at the darknet-nvdla repo, but it looks like issues are disabled there.
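To illustrate what I mean: if each accelerator-backed layer reported a status to its caller, the run could fail fast instead of printing a tiny "NVDLA time" and exiting 0. This is just a rough sketch with made-up names (run_odla_layer, ODLA_OK), not the actual darknet-nvdla functions:

#include <cstdio>
#include <cstdlib>

// Hypothetical status values; the real odla layer code would define its own.
enum OdlaStatus { ODLA_OK = 0, ODLA_SUBMIT_FAILED = 1 };

// Stand-in for an accelerator-backed layer; here it just simulates the
// failing submission seen in the log above.
static OdlaStatus run_odla_layer(int layer_index)
{
    (void)layer_index;
    return ODLA_SUBMIT_FAILED;
}

int main()
{
    const int n_odla_layers = 4;
    for (int i = 0; i < n_odla_layers; ++i) {
        if (run_odla_layer(i) != ODLA_OK) {
            std::fprintf(stderr, "NVDLA layer %d failed; aborting inference\n", i);
            return EXIT_FAILURE;   // so `echo $?` would report the failure
        }
    }
    std::printf("all NVDLA layers completed\n");
    return EXIT_SUCCESS;
}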

To be clear, I didn't make any modifications to the FireSim NVDLA hardware; I only added files to the workload overlay that's built into the Linux image.

I'm not entirely sure whether this is the source of the error, but it looks like the functions in src/odla_layer_impl.cpp (added in this fork) don't do any error checking.
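If that's the case, the fix would presumably be to check and propagate whatever status the UMD runtime already reports (the "(DLA_RUNTIME) Error 0x0003000f ... propagating from Runtime.cpp" lines above suggest the submit path returns an error code, though I haven't checked the exact signatures the fork uses). A sketch of that pattern with stand-in types, not the fork's actual code:

#include <cstdio>

// Stand-ins for the NVDLA UMD types (e.g. NvDlaError and the runtime's
// submit call); the real code would use the headers from nvdla/sw instead.
typedef unsigned int NvDlaError;
static const NvDlaError NvDlaSuccess = 0;

struct FakeRuntime {
    // Simulates the failing submission from the log (error 0x0003000f).
    NvDlaError submit() { return 0x0003000fu; }
};

// The point: return the submit status to the caller instead of dropping it,
// so the darknet side can abort the forward pass when a task fails.
NvDlaError odla_execute(FakeRuntime *rt)
{
    NvDlaError e = rt->submit();
    if (e != NvDlaSuccess)
        std::fprintf(stderr, "odla_execute: NVDLA submit failed (0x%08x)\n", e);
    return e;
}

int main()
{
    FakeRuntime rt;
    return odla_execute(&rt) == NvDlaSuccess ? 0 : 1;
}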

ku-researcher commented 4 years ago

@tymcauley I'm sorry to bother you, but I've run into the same error and am having trouble debugging it. Did you manage to fix it?