cornell-zhang / heterocl

HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
https://cornell-zhang.github.io/heterocl/
Apache License 2.0

Inconsistent results in HCL UltraNet implementation with customizations #440

Open sjz38 opened 2 years ago

sjz38 commented 2 years ago

Description and Example

In the HCL UltraNet implementation (https://github.com/sjz38/my_ultranet), the bounding box output for an image is not consistent across runs when certain HCL code is written in a particular way. The code in question is in the _pad and relu functions in ultranet_functions.py, each of which has two alternative sections labeled "CPU Backend" and "HLS Backend", one commented out and one active (the code is shown at the end of this post). Niansong added the "HLS Backend" sections to make it easier to apply the customizations, but with them enabled the bounding box output (Output BBox) varies from run to run. With the "CPU Backend" lines, I get the expected result every time.

(unet_env) sjz38@zhang-x1:~/test/my_ultranet$ python3 main_single_input.py 
Weights loaded from ultranet_4w4a.pt
Output BBox:  [[195, 269, 175, 316]]

(unet_env) sjz38@zhang-x1:~/test/my_ultranet$ python3 main_single_input.py 
Weights loaded from ultranet_4w4a.pt
Output BBox:  [[195, 269, 175, 316]]

However, when I switch to the "HLS Backend" lines, I get unexpected results that also differ from one run to the next.

(unet_env) sjz38@zhang-x1:~/test/my_ultranet$ python3 main_single_input.py 
Weights loaded from ultranet_4w4a.pt
Output BBox:  [[341, 403, 229, 372]]

(unet_env) sjz38@zhang-x1:~/test/my_ultranet$ python3 main_single_input.py 
Weights loaded from ultranet_4w4a.pt
Output BBox:  [[613, 653, 278, 388]]
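
For comparing several runs without reading the logs by eye, here is a minimal sketch that extracts the Output BBox line from captured stdout and checks whether the runs agree. It is not part of the my_ultranet repo; the helper names parse_bbox and runs_agree are hypothetical, and the log strings are copied from the runs shown above.

```python
import ast
import re


def parse_bbox(log_text):
    """Return the nested list printed after 'Output BBox:' in one run's stdout."""
    match = re.search(r"Output BBox:\s*(\[\[.*\]\])", log_text)
    if match is None:
        raise ValueError("no 'Output BBox:' line found")
    return ast.literal_eval(match.group(1))


def runs_agree(logs):
    """True if every captured log reports the same bounding boxes."""
    boxes = [parse_bbox(log) for log in logs]
    return all(box == boxes[0] for box in boxes)


HEADER = "Weights loaded from ultranet_4w4a.pt\n"

# Logs copied from the runs in this issue.
cpu_logs = [
    HEADER + "Output BBox:  [[195, 269, 175, 316]]\n",
    HEADER + "Output BBox:  [[195, 269, 175, 316]]\n",
]
hls_logs = [
    HEADER + "Output BBox:  [[341, 403, 229, 372]]\n",
    HEADER + "Output BBox:  [[613, 653, 278, 388]]\n",
]

print("CPU Backend runs agree:", runs_agree(cpu_logs))
print("HLS Backend runs agree:", runs_agree(hls_logs))
```

In practice the log strings would come from capturing the stdout of python3 main_single_input.py, for example with subprocess.run(..., capture_output=True, text=True).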

To recreate the problem

  1. Clone the ultranet repo https://github.com/sjz38/my_ultranet
  2. git checkout f1faf87 if not already on this commit
  3. Run python3 main_single_input.py twice and compare the Output BBox results. The two results should be identical, but at this commit the HLS Backend code is enabled, so they differ.
  4. To change to the CPU Backend code, swap which sections are commented out in ultranet_functions.py at lines 57 and 108 (line 57 instead of line 59 in _pad, and lines 108-110 instead of lines 112-116 in relu):
56   # Use this for CPU backend
57   # return hcl.compute(out_shape, _pad, name=name)
58   # Use this for HLS backend
59   return hcl.compute(out_shape, _pad, dtype=data.dtype, name=name)
106   def relu(data, name='relu'):
107       # CPU Backend
108       # x1 = hcl.compute(data.shape, lambda *y: hcl.select(data[y] < 0, hcl.cast(data.dtype, 0), data[y]), name=name+'_x1')
109       # x2 = hcl.compute(x1.shape, lambda *y: hcl.select(x1[y] > 1, hcl.cast(data.dtype, 1), x1[y]), name=name)
110       # return x2
111       # HLS Backend
112       return hcl.compute(data.shape, lambda *y: 
113           hcl.select(data[y] < 0, 
114                 hcl.cast(data.dtype, 0), 
115                 hcl.select(data[y] > 1, hcl.cast(data.dtype, 1), data[y])),
116                 name=name)
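
As a reference for what the two relu variants are meant to compute, here is a minimal NumPy sketch. NumPy stands in for HeteroCL here, so this does not reproduce the bug; the function names relu_two_stage and relu_nested and the sample values are assumptions made for illustration. It only shows that the two-stage "CPU Backend" form and the nested-select "HLS Backend" form describe the same clamp to [0, 1], so on paper they should give identical results.

```python
import numpy as np


def relu_two_stage(data):
    """Mirrors the CPU Backend form: clamp below at 0, then above at 1."""
    x1 = np.where(data < 0, 0, data).astype(data.dtype)
    x2 = np.where(x1 > 1, 1, x1).astype(data.dtype)
    return x2


def relu_nested(data):
    """Mirrors the HLS Backend form: a single nested select."""
    return np.where(data < 0, 0, np.where(data > 1, 1, data)).astype(data.dtype)


# Sample inputs chosen for illustration; they cover values below 0,
# inside [0, 1], and above 1.
int_samples = np.array([-3, -1, 0, 1, 2, 7], dtype=np.int32)
float_samples = np.array([-2.5, -0.1, 0.0, 0.4, 1.0, 1.6], dtype=np.float32)

for samples in (int_samples, float_samples):
    two_stage = relu_two_stage(samples)
    nested = relu_nested(samples)
    # Both forms agree with each other and with a plain clip to [0, 1].
    assert np.array_equal(two_stage, nested)
    assert np.array_equal(nested, np.clip(samples, 0, 1))
    # The element type of the input is preserved, as with hcl.cast(data.dtype, ...).
    assert nested.dtype == samples.dtype
```

The _pad change is different in kind: the HLS Backend line passes dtype=data.dtype to hcl.compute explicitly, while the CPU Backend line leaves the output type to the default.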

It seems that something is broken in the LLVM backend when these customizations are applied.