fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Overflow error: unsigned conversion from ‘int’ to ‘short unsigned int’ #746

Closed vandenBergArthur closed 1 year ago

vandenBergArthur commented 1 year ago

Hi all,

Before I got stuck with the issue listed in #745, I was having two major problems. The models used for testing are only the first parts of a bigger, more complex model: instead of compiling and building the large model in one go, I am trying to work up to it step by step. For the first problem, I use the following model:

from tensorflow.keras.layers import Input, Permute, Conv2D
from tensorflow.keras.models import Model
import hls4ml

a = Input(shape=(10,9,25))

b = Permute((2,3,1))(a)

c = Conv2D(filters=10, kernel_size=1,data_format='channels_last')(b)

model = Model(inputs=a, outputs=c, name='input_permute_conv2d_model')

config = hls4ml.utils.config_from_keras_model(model, granularity='model')

config['Model']['Precision'] = 'ap_fixed<16,6>'
config['Model']['ReuseFactor'] = 5
config['Model']['Strategy'] = 'Resource'

cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType']     = 'io_parallel'
cfg['HLSConfig']  = config
cfg['KerasModel'] = model
cfg['XilinxPart'] = 'xc7z020clg400-1'

hls_model = hls4ml.converters.keras_to_hls(cfg)
hls_model.compile()

The output of hls_model.build(csim=False) looks like this:

vivado_hls.log

For the second problem, the model is defined as follows:

from tensorflow.keras.layers import Concatenate   # in addition to the imports above

input_shape_x = (64,9,25)
input_x = Input(shape=input_shape_x, name='input_x')    
a = Permute((2,3,1))(input_x)

self_conv1 = Conv2D(filters=128, kernel_size=1,data_format='channels_last')(a)
self_conv2 = Conv2D(filters=128, kernel_size=1,data_format='channels_last')(a)
self_conv3 = Conv2D(filters=128, kernel_size=1,data_format='channels_last')(a)

b = Concatenate(axis=-2)([self_conv1, self_conv2])
c = Concatenate(axis=-2)([b, self_conv3])

model = Model(inputs=input_x, outputs=c, name='part1_model')

config = hls4ml.utils.config_from_keras_model(model, granularity='model')

config['Model']['Precision'] = 'ap_fixed<16,6>'
config['Model']['ReuseFactor'] = 2
config['Model']['Strategy'] = 'Resource'

cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType']     = 'io_parallel'
cfg['HLSConfig']  = config
cfg['KerasModel'] = model
cfg['XilinxPart'] = 'xc7z020clg400-1'

hls_model = hls4ml.converters.keras_to_hls(cfg)
hls_model.compile()

The code above results in the following error:

firmware/myproject.cpp: In function ‘void myproject(input_t*, result_t*, short unsigned int&, short unsigned int&)’:
firmware/myproject.cpp:38:55: warning: unsigned conversion from ‘int’ to ‘short unsigned int’ changes value from ‘86400’ to ‘20864’ [-Woverflow]
   38 |     const_size_out_1 = OUT_CONCAT_0_10*OUT_CONCAT_1_10*OUT_CONCAT_2_10;
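The wrapped value looks consistent with a 16-bit overflow of the concatenated output size; a quick sanity check (assuming the concatenated output shape is (9, 75, 128), as the model above implies):

# Total number of elements in the concatenated output
size = 9 * 75 * 128       # 86400
print(size % (1 << 16))   # 20864, the value reported in the warning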

I am running Ubuntu 20.04 with the conda environment listed on the tutorials page, and Vivado 2019.2.

Any help would be greatly appreciated!

Thanks in advance!

vloncar commented 1 year ago

The log of problem 1 says the layer is too big to be unrolled. Use the main branch or wait for the upcoming 0.7.0 release; convolutional layers with io_parallel don't work in 0.6.0. For problem 2, even with the latest branch you won't have much success: 128 filters is too many. Generally, keep all activation tensors and weights at the lower end of O(1000) (preferably O(100)) to have any chance of success with io_parallel.
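As a rough illustration of those sizes, a scaled-down sketch along these lines (the filter count of 8 is arbitrary, chosen only to keep the layer in the O(100) range) is a more realistic io_parallel test:

from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

# Hypothetical scaled-down test: with kernel_size=1, 25 input channels and
# 8 filters give 25*8 + 8 = 208 weights, well within the suggested range.
a = Input(shape=(10, 9, 25))
c = Conv2D(filters=8, kernel_size=1, data_format='channels_last')(a)
small_model = Model(inputs=a, outputs=c, name='small_test_model')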

vandenBergArthur commented 1 year ago

Hi @vloncar, first and foremost, thanks for your input!

The number of filters will be scaled down, thanks for pointing that out. I am aware that a Conv2D layer requires io_stream, but compiling the HLS model still fails, and I believe it's because of the Permute layer. To be more specific, I think it might be related to issue #712.

To demonstrate, I created a very simple model:

a = Input(shape=(10,9,25))
b = Permute((2,3,1))(a)
c = Conv2D(filters=10, kernel_size=1,data_format='channels_last')(b)

model = Model(inputs=a, outputs=c, name='input_permute_conv2d_model')
model.summary()

config = hls4ml.utils.config_from_keras_model(model, granularity='model')

config['Model']['Precision'] = 'ap_fixed<16,6>'
config['Model']['ReuseFactor'] = 10
config['Model']['Strategy'] = 'Resource'
cfg = hls4ml.converters.create_config(backend='Vivado')
cfg['IOType']     = 'io_stream'
cfg['HLSConfig']  = config
cfg['KerasModel'] = model
cfg['XilinxPart'] = 'xc7z020clg400-1'

hls_model = hls4ml.converters.keras_to_hls(cfg)

When compiling with hls_model.compile(), the following output is generated:

firmware/myproject.cpp: In function ‘void myproject(hls::stream<nnet::array<ap_fixed<16, 6>, 25> >&, hls::stream<nnet::array<ap_fixed<16, 6>, 10> >&)’:
firmware/myproject.cpp:51:52: error: cannot convert ‘hls::stream<nnet::array<ap_fixed<16, 6>, 25> >’ to ‘nnet::array<ap_fixed<16, 6>, 25>*’
   51 |     nnet::transpose_3d<input_t, layer2_t, config2>(input_1, layer2_out); // permute
      |                                                    ^~~~~~~
      |                                                    |
      |                                                    hls::stream<nnet::array<ap_fixed<16, 6>, 25> >
In file included from firmware/parameters.h:10,
                 from firmware/myproject.cpp:22:
firmware/nnet_utils/nnet_array.h:27:26: note:   initializing argument 1 of ‘void nnet::transpose_3d(data_T*, res_T*) [with data_T = nnet::array<ap_fixed<16, 6>, 25>; res_T = nnet::array<ap_fixed<16, 6>, 10>; CONFIG_T = config2]’
   27 | void transpose_3d(data_T data[CONFIG_T::depth * CONFIG_T::height * CONFIG_T::width],
      |                   ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
firmware/nnet_utils/nnet_array.h: In instantiation of ‘void nnet::transpose_3d(data_T*, res_T*) [with data_T = nnet::array<ap_fixed<16, 6>, 25>; res_T = nnet::array<ap_fixed<16, 6>, 10>; CONFIG_T = config2]’:
firmware/myproject.cpp:51:71:   required from here
firmware/nnet_utils/nnet_array.h:43:92: error: no match for ‘operator=’ (operand types are ‘nnet::array<ap_fixed<16, 6>, 10>’ and ‘nnet::array<ap_fixed<16, 6>, 25>’)
   43 |                 data_t[idx_t[0] * dims_t[1] * dims_t[2] + idx_t[1] * dims_t[2] + idx_t[2]] =
      |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
   44 |                     data[idx[0] * dims[1] * dims[2] + idx[1] * dims[2] + idx[2]];
      |                     ~~~~~                                                                   
In file included from firmware/defines.h:6,
                 from firmware/myproject.h:8,
                 from firmware/myproject.cpp:21:
firmware/nnet_utils/nnet_types.h:21:12: note: candidate: ‘nnet::array<T, N>& nnet::array<T, N>::operator=(const nnet::array<T, N>&) [with T = ap_fixed<16, 6>; unsigned int N = 10]’
   21 |     array &operator=(const array &other) {
      |            ^~~~~~~~
firmware/nnet_utils/nnet_types.h:21:35: note:   no known conversion for argument 1 from ‘nnet::array<ap_fixed<16, 6>, 25>’ to ‘const nnet::array<ap_fixed<16, 6>, 10>&’
   21 |     array &operator=(const array &other) {
      |                      ~~~~~~~~~~~~~^~~~~
g++: error: myproject.o: No such file or directory

This is exactly the same error as in #712.

I look forward to your reply.

vloncar commented 1 year ago

Transposing a 3D tensor is not supported. It is an expensive operation and you wouldn't want it anyway. Since you're transposing the input, think about doing that outside of your model.
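For example, something along these lines (a sketch only; x_train stands in for whatever input data is actually used):

import numpy as np
from tensorflow.keras.layers import Input, Conv2D
from tensorflow.keras.models import Model

# Apply the permutation on the host instead of inside the model,
# mirroring Permute((2,3,1)): (batch, 10, 9, 25) -> (batch, 9, 25, 10).
x_transposed = np.transpose(x_train, (0, 2, 3, 1))

# The model then takes the already-permuted shape and needs no Permute layer.
a = Input(shape=(9, 25, 10))
c = Conv2D(filters=10, kernel_size=1, data_format='channels_last')(a)
model = Model(inputs=a, outputs=c, name='input_conv2d_model')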