espressif / esp-dl

Espressif deep-learning library for AIoT applications
MIT License

NaN error #17

Closed caleb221 closed 4 years ago

caleb221 commented 4 years ago

Hello! I would like to know how numbers are handled in the deep-learning softmax function. After performing 3 convolutions I have output numbers for the first 1000 spaces (printed out). The filter shape for this convolution is (32, 3, 3, 16), followed by PReLU and max-pool. The following operation is another convolution with a (2, 1, 1, 32) filter to obtain the scores. The output of this matrix is mostly NaN, plus some numbers that make general sense for a score (0-1). So my questions are: where are these NaNs coming from? Are they expected? Should I write a small function to obtain the scores at each stride instead of all scores for all convolutions? Could it be that my data is not normalized?

As a side note, I have a hunch that it could be something to do with weight ordering: the ESP lib uses (N, H, W, C), while Caffe's output is (N, C, H, W).

in -> (N, C, H, W)

I have translated these formats using numpy:
first step: np.moveaxis(in, 1, 3), out -> (N, W, H, C)
second step: np.moveaxis(in, 1, 2), out -> (N, H, W, C)
Then I flatten the array with np.reshape(in, -1) and load it into a dl_matrix3d_t using a for loop. (Using out = in.transpose(0, 2, 3, 1) also provides the same result.)

As an example (this is the input to the bounding-box layer): original shape from Caffe (4, 32, 1, 1), new shape for the ESP lib (4, 1, 1, 32), then flatten and load.
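For what it's worth, the Caffe-to-ESP reordering can be sanity-checked with a short numpy snippet. Note that a single np.moveaxis(x, 1, 3) already yields NHWC and is exactly the same permutation as x.transpose(0, 2, 3, 1) (this is an illustrative sketch, not part of the library):

```python
import numpy as np

# Dummy Caffe-style weights in (N, C, H, W) order
w_nchw = np.arange(4 * 32).reshape(4, 32, 1, 1)

# Move the channel axis to the end: (N, C, H, W) -> (N, H, W, C)
w_nhwc = np.moveaxis(w_nchw, 1, 3)

# transpose(0, 2, 3, 1) is the identical permutation
assert np.array_equal(w_nhwc, w_nchw.transpose(0, 2, 3, 1))

# Flatten in C order before copying into dl_matrix3d_t->item
flat = w_nhwc.reshape(-1)
print(w_nhwc.shape)  # (4, 1, 1, 32)
```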

OUTPUT FROM MODEL: Screenshot from 2019-12-03 18-31-05
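NaNs at a softmax output are typically a symptom of exp() overflowing on large, un-normalized activations: exp(x) becomes inf, and inf/inf is NaN. A minimal numpy illustration of the failure mode and the usual max-subtraction fix (a sketch only, not the esp-dl implementation):

```python
import numpy as np

logits = np.array([1000.0, 980.0])  # large, un-normalized scores

# Naive softmax: exp overflows to inf, and inf/inf produces NaN
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(logits) / np.exp(logits).sum()
print(naive)  # [nan nan]

# Numerically stable softmax: subtract the max before exponentiating
shifted = np.exp(logits - logits.max())
stable = shifted / shifted.sum()
print(stable)  # close to [1.0, 2e-9]
```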

Here is the code for the neural layers of the network (I have erased the free and some printout methods to save space, but will attach the whole code if context is needed):

dl_matrix3d_t *filt_c1 = dl_matrix3d_alloc(10, 3, 3, 3);
dl_matrix3d_t *bias1   = dl_matrix3d_alloc(1, 1, 1, 10);
dl_matrix3d_t *prelu1  = dl_matrix3d_alloc(1, 1, 1, 10);
dl_matrix3d_t *out1    = dl_matrix3d_alloc(10, 1, 1, 10);

getpnet_conv1_0(filt_c1);  // pnetVals.h
getpnet_conv1_1(bias1);    // pnetVals.h
getpnet_prelu1_0(prelu1);  // pnetVals.h

// 3x3 convolution 1
out1 = dl_matrix3dff_conv_3x3(in, filt_c1, bias1, 1, 1, PADDING_SAME);

dl_matrix3d_p_relu(out1, prelu1);
dl_matrix3d_global_pool(out1);

dl_matrix3d_t *filt_c2 = dl_matrix3d_alloc(16,3,3,10);
dl_matrix3d_t *bias2 = dl_matrix3d_alloc(1,1,1,16);
dl_matrix3d_t *out2= dl_matrix3d_alloc(1,1,1,16);
//this might need to be moved until after conv_2
dl_matrix3d_t *prelu2 = dl_matrix3d_alloc(1,1,1,16);

getpnet_conv2_0(filt_c2);
getpnet_conv2_1(bias2);
getpnet_prelu2_0(prelu2);

out2 = dl_matrix3dff_conv_3x3(out1,filt_c2,bias2,1,1,PADDING_SAME);

dl_matrix3d_p_relu(out2,prelu2);

dl_matrix3d_global_pool(out2);

//=====================================

dl_matrix3d_t *filt_c3 = dl_matrix3d_alloc(32, 3, 3, 16);
dl_matrix3d_t *bias3 = dl_matrix3d_alloc(1, 1, 1, 32);
dl_matrix3d_t *out3 = dl_matrix3d_alloc(1, 1, 1, 32);

// get weights
getpnet_conv3_0(filt_c3);
getpnet_conv3_1(bias3);

out3 = dl_matrix3dff_conv_3x3(out2, filt_c3, bias3, 1, 1, PADDING_SAME);
dl_matrix3d_free(out2);
dl_matrix3d_free(bias3);

dl_matrix3d_t *prelu3 = dl_matrix3d_alloc(1, 1, 1, 32);
getpnet_prelu3_0(prelu3);

dl_matrix3d_p_relu(out3, prelu3);

dl_matrix3d_free(prelu3);
dl_matrix3d_global_pool(out3);
//=====================================

dl_matrix3d_t *score_filter=dl_matrix3d_alloc(2,1,1,32);
dl_matrix3d_t *score_bias = dl_matrix3d_alloc(1,1,1,2);
dl_matrix3d_t *score_out = dl_matrix3d_alloc(1,1,1,2); 

getpnet_score_0(score_filter);
getpnet_score_1(score_bias);

score_out = dl_matrix3dff_conv_3x3(out3,score_filter,score_bias,1,1,PADDING_SAME);

dl_matrix3d_free(score_filter);
dl_matrix3d_free(score_bias);
dl_matrix3d_softmax(score_out); 

//====================================
dl_matrix3d_t *bbox_filter=dl_matrix3d_alloc(4,1,1,32);
dl_matrix3d_t *bbox_bias=dl_matrix3d_alloc(1,1,1,4);
dl_matrix3d_t *bbox_out=dl_matrix3d_alloc(1,1,1,4);

getpnet_bbox_pred_0(bbox_filter);
getpnet_bbox_pred_1(bbox_bias);

bbox_out = dl_matrix3dff_conv_3x3(out3, bbox_filter, bbox_bias, 1, 1, PADDING_SAME);
dl_matrix3d_free(out3);
dl_matrix3d_free(bbox_filter);
dl_matrix3d_free(bbox_bias);
printf("\n\nALLOCATING OUTPUT...\n\n\n");

//=========================================
// SET MEMORY
//=========================================

//output is a global variable
setOutput(bbox_out,1);// bounding boxes
setOutput(score_out,0);//score output

dl_matrix3d_free(bbox_out);

// dl_matrix3d_free(score_out);

printf("\n\n");
int i;
printf("\n\nFROM PNET LOOP:\nscore: ");
for (i = 0; i < 1500; i++) {
    printf("---:%f :---", score_out->item[i]);
}
printf("\n\n");

xSemaphoreGive(skeletonKey);
vTaskDelay(1000 / portTICK_PERIOD_MS);
vTaskDelete(NULL);
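For anyone following along, the per-channel PReLU applied after each convolution above can be sketched in numpy as follows (illustrative only; dl_matrix3d_p_relu itself operates in place on the C matrix):

```python
import numpy as np

def prelu(x, alpha):
    """Per-channel PReLU: x where x > 0, else alpha[c] * x.

    x has shape (n, h, w, c); alpha has shape (c,) and broadcasts
    across the spatial dimensions."""
    return np.where(x > 0, x, alpha * x)

x = np.array([[[[-2.0, 3.0]]]])   # shape (1, 1, 1, 2)
alpha = np.array([0.25, 0.25])
print(prelu(x, alpha))            # -2.0 -> -0.5, 3.0 unchanged
```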

caleb221 commented 4 years ago

I think my data wasn't being normalized. I tried applying this function before sending the data into the convolutions, and I think it fixed the problem:

dl_matrix3d_t *normalizeData(dl_matrix3du_t *in, int w, int h)
{
    dl_matrix3d_t *out = dl_matrix3d_alloc(1, w, h, in->c);
    int n = w * h * in->c;  // total number of items across all channels
    for (int i = 0; i < n; i++) {
        // float literals so the uint8 input is not integer-divided
        out->item[i] = (in->item[i] - 128.0f) / 128.0f;
    }
    return out;
}
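The scaling above maps the uint8 range [0, 255] onto roughly [-1, 1), which keeps the later softmax inputs small. A quick check of the endpoints (illustrative, assuming 8-bit input pixels):

```python
def normalize(pixel):
    # Map a uint8 pixel value into roughly [-1, 1)
    return (pixel - 128) / 128.0

print(normalize(0))    # -1.0
print(normalize(128))  # 0.0
print(normalize(255))  # 0.9921875
```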

XiaochaoGONG commented 4 years ago

Ahhh, I notice that you are using the global pooling function, which squeezes the feature map to a 1x1xC dimension. Unfortunately, we have not yet provided a general pooling function, since stride in convolution can be used to reduce the feature map instead. We will add it, but if you need it urgently, I'm afraid you will have to write the function yourself.

XiaochaoGONG commented 4 years ago

BTW, in the code you don't need to allocate the output for each layer: dl_matrix3dff_conv_3x3 returns the output itself, so just declaring the pointer is fine.

caleb221 commented 4 years ago

Oh, that's good to know. I am in a very time-constrained situation, so I think I will have to implement it myself. If I understand correctly, is global pooling a summation of all values in each channel, outputting a single value per channel? Also, thanks, that will definitely be much easier on the eyes in the future. As a side question, is it valid to split a convolution into two parts and join them back together at the end in order to avoid a watchdog timeout?

caleb221 commented 4 years ago

using something similar to this method https://en.wikipedia.org/wiki/Overlap%E2%80%93add_method

XiaochaoGONG commented 4 years ago

A very simple understanding of global pooling is to perform (1, w, h, c) --> (1, 1, 1, c).
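In numpy terms, that squeeze is just a reduction over the two spatial axes. Whether the library pools by max or average I'm not certain, so both are shown here (a sketch, not the esp-dl code):

```python
import numpy as np

x = np.arange(2 * 3 * 4, dtype=float).reshape(1, 2, 3, 4)  # (1, h, w, c)

gap = x.mean(axis=(1, 2), keepdims=True)  # global average pool
gmp = x.max(axis=(1, 2), keepdims=True)   # global max pool

print(gap.shape, gmp.shape)  # (1, 1, 1, 4) (1, 1, 1, 4)
```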

XiaochaoGONG commented 4 years ago

As a side question, is it valid to split a convolution into two parts and join them back together at the end in order to avoid a watchdog timeout?

To avoid long execution times, you can split your kernel into several parts, i.e. origin kernel: (n, w, h, c), split into e.g. 4: (1st n/4, w, h, c), (2nd n/4, w, h, c), etc. You will get outputs like (1, w', h', 1st n/4), (1, w', h', 2nd n/4), etc. Then concat the outputs, or just write each result into the corresponding position of the final output, which is (1, w', h', n).
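The split-and-concat idea can be sanity-checked with a 1x1 convolution written as a matrix product: splitting the n output filters into chunks and concatenating the partial outputs along the channel axis reproduces the full result (a numpy sketch under that 1x1 assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5, 32))  # (h, w, c_in) feature map
w = rng.standard_normal((8, 32))     # (n, c_in) 1x1 kernels

full = x @ w.T                       # (h, w, n): all 8 filters at once

# Run the same conv in 4 chunks of n/4 filters, then concat the outputs
parts = [x @ w[i:i + 2].T for i in range(0, 8, 2)]
split = np.concatenate(parts, axis=-1)

assert np.allclose(full, split)
```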

caleb221 commented 4 years ago

As a sanity check, a global pool should take something like the following form, correct? A max-pool implemented as nested for loops over (n, w, h, c).

caleb221 commented 4 years ago

Sorry, GitHub messed up the format. Screenshot from 2019-12-05 17-55-54

caleb221 commented 4 years ago

If it's unclear what I mean, here is the code for this initial guess, and one adapted from Caffe's pool.h. (Actually, both use the same equations; I am mostly asking about the order in which the data should be arranged.) in is the input matrix (dl_matrix3d_t) and out is defined as dl_matrix3d_alloc(1, 1, 1, in->c);

Initial guess: Screenshot from 2019-12-05 22-06-39

Caffe adaptation:

Screenshot from 2019-12-05 22-02-19

caleb221 commented 4 years ago

This seems to be permuting the whole filter as desired, but I'm unsure if it's completely correct. (For some reason I also had to add abs() around in->stride, w, and h; they were negative in the input matrix. Not sure if that's a bug or if it means I'm traversing the matrix upside down.)

caffePooling_maybe

XiaochaoGONG commented 4 years ago

Hi @caleb221, please update this repo; the updated function may help you. https://github.com/espressif/esp-face/blob/b94b0ab005c253937001327cceb9d83d41c49a88/lib/include/dl_lib_matrix3d.h#L315-L333

caleb221 commented 4 years ago

Oh great thank you! By the way, does this library have a reshape function?

I have a matrix of shape (1, 64, 192, 4) being returned by the dl_matrix3d_concat_4() function (I need to load the weights 1/4 at a time to avoid stack overflow / watchdog timers), but I need it in the shape (1, 64, 768, 1) for a fully connected layer.

caleb221 commented 4 years ago

An example of working code (compiles and runs within an O-Net implementation); getonet_cutX_fc_0() is a function that fills the input matrix->items with the desired weights:

    dl_matrix3d_t *fc1 = dl_matrix3d_alloc(1,64,192,1);
    dl_matrix3d_t *fc2 = dl_matrix3d_alloc(1,64,192,1);
    dl_matrix3d_t *fc3 = dl_matrix3d_alloc(1,64,192,1);
    dl_matrix3d_t *fc4 = dl_matrix3d_alloc(1,64,192,1);

    getonet_cut0_fc_0(fc1);
    getonet_cut1_fc_0(fc2);
    getonet_cut2_fc_0(fc3);
    getonet_cut3_fc_0(fc4);

    dl_matrix3d_t *fc_in;//=dl_matrix3d_alloc(1,64,768,1);
    fc_in = dl_matrix3d_concat_4(fc1,fc2,fc3,fc4);

    dl_matrix3d_free(fc1);
    dl_matrix3d_free(fc2);
    dl_matrix3d_free(fc3);
    dl_matrix3d_free(fc4);

Output: FC IN -> N: 1, C: 4, W: 64, H: 192, i.e. (1, 64, 192, 4). I am going to take a guess and say I can reassign the dimensions to (1, 64, 768, 1), but I am unsure if the order will stay correct.

XiaochaoGONG commented 4 years ago

Yes, if your data for the fc layer is 'nhwc', there is no difference between the shapes (1, 64, 192, 4) and (1, 64, 768, 1).
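This can be checked directly: for contiguous NHWC data, reshaping (1, 64, 192, 4) to (1, 64, 768, 1) leaves the flat element order untouched, since only the dimension bookkeeping changes (numpy sketch):

```python
import numpy as np

a = np.arange(1 * 64 * 192 * 4).reshape(1, 64, 192, 4)
b = a.reshape(1, 64, 768, 1)

# Same underlying buffer order, different shape metadata
assert np.array_equal(a.reshape(-1), b.reshape(-1))
print(b.shape)  # (1, 64, 768, 1)
```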