Different behavior between CPU and GPU

giobus75 commented 3 years ago

CC: @simleo

Hi, while trying to update the network parameters with the layer update_weights method, I ran into some problems, so I wrote a check that sets all weights (and/or biases) to zero or one and feeds the network a tensor filled with ones (shape 3, 256, 256). With both weights and biases set to zero and running on the CPU, I got what I expected: every layer output was 0.0 and the softmax output was 0.5. The weird behavior appeared as soon as I switched to the GPU. I got something like this:

layer name        min_out_value   max_out_value
input1             1.000000e+00    1.000000e+00
conv1              0.000000e+00    0.000000e+00
relu1              0.000000e+00    0.000000e+00
conv2              0.000000e+00    0.000000e+00
relu2              0.000000e+00    0.000000e+00
maxpool2           0.000000e+00    0.000000e+00
conv3              0.000000e+00    0.000000e+00
relu3              0.000000e+00    0.000000e+00
conv4              0.000000e+00    0.000000e+00
relu4              0.000000e+00    0.000000e+00
maxpool4           0.000000e+00    0.000000e+00
conv5              0.000000e+00    0.000000e+00
relu5              0.000000e+00    0.000000e+00
conv6              0.000000e+00    0.000000e+00
relu6              0.000000e+00    0.000000e+00
conv7              0.000000e+00    0.000000e+00
relu7              0.000000e+00    0.000000e+00
maxpool6           0.000000e+00    0.000000e+00
conv8              0.000000e+00    0.000000e+00
relu8              0.000000e+00    0.000000e+00
conv9              0.000000e+00    0.000000e+00
relu9              0.000000e+00    0.000000e+00
conv10             0.000000e+00    0.000000e+00
relu10             0.000000e+00    0.000000e+00
maxpool8           0.000000e+00    7.932756e+34
conv11             0.000000e+00    7.932756e+34
relu11             0.000000e+00    7.932756e+34
conv12             0.000000e+00    7.932756e+34
relu12             0.000000e+00    7.932756e+34
conv13             0.000000e+00    7.932756e+34
relu13             0.000000e+00    7.932756e+34
maxpool10          0.000000e+00    7.932756e+34
reshape1           0.000000e+00    7.932756e+34
dense1             9.360908e-33    7.932756e+34
relu14             9.360908e-33    7.932756e+34
dense2             4.078427e+12    1.090233e+27
softmax15          1.134006e-01    8.865993e-01

On the GPU, I get the same output when only the biases are set to one. On the CPU, by contrast, I still get what I expected (intermediate layer outputs equal to 1.0 and 0.5 for the softmax output).
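The expected CPU values follow from the arithmetic alone. When every weight is zero, each Conv/Dense unit outputs just its bias. ReLU and MaxPool then pass that constant through unchanged, so both logits of the last Dense layer are equal and the softmax gives 0.5 per class. Padding does not matter here because every weight is zero. Below is a minimal pure-Python sketch of that reasoning. It reduces each layer to one representative unit, and the helper names are hypothetical (this is not EDDL code):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def unit(inputs, weight, bias):
    """One Conv/Dense unit whose weights all equal `weight`: weight * sum(inputs) + bias."""
    return weight * sum(inputs) + bias

def expected_softmax(bias, num_classes=2):
    """Trace the all-ones input through zero-weight layers that all share the given bias."""
    x = [1.0] * 27                             # a 3x3 window over the 3 channels of the input
    h = max(0.0, unit(x, 0.0, bias))           # Conv + ReLU: equals max(0, bias)
    # Every later Conv/ReLU/MaxPool layer only sees copies of h, so it yields the same value.
    h = max(0.0, unit([h] * 256, 0.0, bias))   # Dense(256) + ReLU
    logits = [unit([h] * 256, 0.0, bias) for _ in range(num_classes)]
    return softmax(logits)

print(expected_softmax(0.0))  # weights and biases zero: [0.5, 0.5]
print(expected_softmax(1.0))  # weights zero, biases one: [0.5, 0.5]
```

Any value far from 0.5 in the last row of the GPU table therefore means the GPU was not running with the parameters that were set.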

Is that something I have to worry about? Thank you Giovanni

This is the code I used:

import pyecvl.ecvl as ecvl
import pyeddl.eddl as eddl
from pyeddl.tensor import Tensor
import numpy as np

def VGG16_promort(in_layer, num_classes, seed=1234, init=eddl.HeNormal):
    x = in_layer
    x = eddl.ReLu(init(eddl.Conv(x, 64, [3, 3]), seed))
    x = eddl.MaxPool(eddl.ReLu(init(eddl.Conv(x, 64, [3, 3]), seed)), [2, 2], [2, 2])
    x = eddl.ReLu(init(eddl.Conv(x, 128, [3, 3]), seed))
    x = eddl.MaxPool(eddl.ReLu(init(eddl.Conv(x, 128, [3, 3]), seed)), [2, 2], [2, 2])
    x = eddl.ReLu(init(eddl.Conv(x, 256, [3, 3]), seed))
    x = eddl.ReLu(init(eddl.Conv(x, 256, [3, 3]), seed))
    x = eddl.MaxPool(eddl.ReLu(init(eddl.Conv(x, 256, [3, 3]), seed)), [2, 2], [2, 2])
    x = eddl.ReLu(init(eddl.Conv(x, 512, [3, 3]), seed))
    x = eddl.ReLu(init(eddl.Conv(x, 512, [3, 3]), seed))
    x = eddl.MaxPool(eddl.ReLu(init(eddl.Conv(x, 512, [3, 3]), seed)), [2, 2], [2, 2])
    x = eddl.ReLu(init(eddl.Conv(x, 512, [3, 3]), seed))
    x = eddl.ReLu(init(eddl.Conv(x, 512, [3, 3]), seed))
    x = eddl.MaxPool(eddl.ReLu(init(eddl.Conv(x, 512, [3, 3]), seed)), [2, 2], [2, 2])
    x = eddl.Reshape(x, [-1])
    x = eddl.ReLu(init(eddl.Dense(x, 256), seed))
    x = eddl.Softmax(eddl.Dense(x, num_classes))
    return x

def get_net(in_size=[256,256], num_classes=2, lr=1e-5, gpu=True):  
    ## Network definition
    in_ = eddl.Input([3, in_size[0], in_size[1]])
    out = VGG16_promort(in_, num_classes)
    net = eddl.Model([in_], [out])
    eddl.build(
        net,
        eddl.rmsprop(lr),
        ["soft_cross_entropy"],
        ["categorical_accuracy"],
        eddl.CS_GPU([1], mem="low_mem") if gpu else eddl.CS_CPU()
        )

    eddl.summary(net)
    eddl.setlogfile(net, "promort_VGG16_classification")
    return net

def reset_eddl_net_params(net, weight='zeros', bias='zeros'):
    """Set the weights (param 0) and biases (param 1) of every layer to all zeros or all ones."""
    for l in net.layers:
        params = l.params
        if not params:
            # Input, ReLu, MaxPool, Reshape and Softmax layers have no parameters: skip them
            continue

        w_np = None
        b_np = None

        for index, p in enumerate(params):
            if index == 0:
                w_np = np.zeros_like(p.getdata()) if weight == 'zeros' else np.ones_like(p.getdata())
            elif index == 1:
                b_np = np.zeros_like(p.getdata()) if bias == 'zeros' else np.ones_like(p.getdata())

        w_np_t = Tensor.fromarray(w_np)
        b_np_t = Tensor.fromarray(b_np)

        # Update of the parameters
        l.update_weights(w_np_t, b_np_t)

### Get Network
net = get_net(in_size=[256, 256], num_classes=2, gpu=False)   #### SET gpu=True to use GPU, False to use CPU

## Reset weights
reset_eddl_net_params(net, weight='zeros', bias='zeros')

## Predict a fake image (all ones image)
in_np = np.ones((1,3,256,256)) # Input. A single white image. Channel first
in_T = Tensor.fromarray(in_np)

result = eddl.predict(net, [in_T])

## Print layer outputs
print('{0: <15} {1: >15} {2:>15}'.format('layer name', 'min_out_value', 'max_out_value'))
for l in net.layers:
    print('{0: <15} {1:15e} {2:15e}'.format(l.name, np.min(l.output.getdata()), np.max(l.output.getdata())))

RParedesPalacios commented 3 years ago

When you use CS_GPU, you actually have two networks: one on the CPU and a clone on the GPU. EDDL keeps this clone internally, copying the weights from the GPU (where training is performed) back to the CPU and so on. When you set some weights to zero on the CPU, you only change them on the CPU, not on the GPU, because update_weights only updates the weights on the CPU side. This is part of EDDL's internals, and I see that it could lead to misunderstandings.
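The effect described above can be reproduced with a toy model that keeps two copies of its parameters. This is a minimal sketch, assuming (as the comment above states) that update_weights touches only the CPU copy while inference runs on the GPU clone. The class and its methods are hypothetical stand-ins, not the EDDL or pyeddl API:

```python
class HostDeviceModel:
    """Hypothetical model holding a CPU copy and a GPU clone of its parameters."""

    def __init__(self, weight, bias):
        self.host = {"weight": weight, "bias": bias}  # CPU copy, touched by update_weights
        self.device = dict(self.host)                 # GPU clone, used by predict

    def update_weights(self, weight, bias):
        # Only the CPU copy changes; the GPU clone keeps the old values.
        self.host["weight"] = weight
        self.host["bias"] = bias

    def predict(self, x):
        # Inference runs on the GPU clone.
        return self.device["weight"] * x + self.device["bias"]

m = HostDeviceModel(weight=2.0, bias=1.0)
m.update_weights(0.0, 0.0)
print(m.predict(1.0))  # 3.0: the GPU clone still holds the old parameters
```

This matches the report: the CPU copy holds zeros, but predictions still come from the stale GPU parameters.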

In any case, I am not sure there is a reason to do this. If there is such a need, please open an issue describing the particular functionality you require and we will provide an implementation.

If you want to use these low-level methods, that is perfectly fine. The clone network on the GPU is stored in the following net member:

net.snets[0]  // the GPU network; snets[1] is the second one if you use more than one GPU, and so on

layers = net.snets[0].layers

etc...

giobus75 commented 3 years ago

Thank you for the quick reply, @RParedesPalacios. I have to admit that I'm a bit confused, sorry :-). The problem is this: if I'm working with the GPU and I want to initialize only a subset of the network (for example, the convolutional part) from some pretrained tensors (e.g. the ImageNet ones), is it mandatory to use the low-level methods? In other words, is the high-level API update_weights method useless when CS_GPU is set? Thank you

RParedesPalacios commented 3 years ago

You can use update_weights as you are doing, but then we would need an extra function to sync the weights with the device (GPU/FPGA). I am going to provide such functionality ASAP.

RParedesPalacios commented 3 years ago

Right now you can do:

l.update_weights(w_np_t, b_np_t)
distributeTensor(l, "param", 0); // weight
distributeTensor(l, "param", 1); // bias

Try it and let me know.

Edit: not sure whether this functionality is available in Python.

RParedesPalacios commented 3 years ago

Also as a shortcut you can use:

distributeParams(l);

but it is only available in the develop branch.
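The two workarounds differ only in scope. distributeTensor(l, "param", i) pushes a single CPU-side parameter (0 = weight, 1 = bias) to the device clones, while distributeParams(l) pushes all of them at once. The sketch below models that difference in pure Python. The names distribute_tensor and distribute_params are hypothetical stand-ins for the EDDL calls, and the assumption that each device holds its own clone of the parameters follows the snets description above:

```python
class HostDeviceLayer:
    """Hypothetical layer with a CPU copy of its params and one clone per device."""

    def __init__(self, params, n_devices=1):
        self.params = list(params)                                     # CPU copy: [weight, bias]
        self.device_params = [list(params) for _ in range(n_devices)]  # one clone per GPU

    def update_weights(self, weight, bias):
        self.params = [weight, bias]  # CPU side only

def distribute_tensor(layer, index):
    """Copy CPU param `index` (0 = weight, 1 = bias) to every device clone."""
    for clone in layer.device_params:
        clone[index] = layer.params[index]

def distribute_params(layer):
    """Shortcut: copy every CPU param to every device clone."""
    for index in range(len(layer.params)):
        distribute_tensor(layer, index)

layer = HostDeviceLayer([2.0, 1.0], n_devices=2)
layer.update_weights(0.0, 0.0)
distribute_tensor(layer, 0)   # weight only
print(layer.device_params)    # [[0.0, 1.0], [0.0, 1.0]]
distribute_params(layer)
print(layer.device_params)    # [[0.0, 0.0], [0.0, 0.0]]
```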

giobus75 commented 3 years ago

Hi @RParedesPalacios, I tried the workaround using distributeParams(l), and it works fine in both C++ and Python. However, to check the layer outputs I had to access net->snets[0]->layers; maybe there is an easier way, I don't know. By the way, during my C++ tests I found that calling the API function getParams(layer, p) on a layer without parameters (like the Input or MaxPool layers) causes a segfault. It would be nice to get something like a null value instead, so the error can be handled rather than crashing the program. What do you think? Thank you again. Giovanni

RParedesPalacios commented 3 years ago

> Hi @RParedesPalacios, I tried the workaround by using distributeParams(l) and it works fine both in C++ and Python. However, to check the outputs of the layers I had to access the net->snets[0]->layers, maybe there is an easier way, I don't know.

Tensor* getOutput(layer l1);

This function collects the tensor from the devices. See, for instance, this example:

https://github.com/deephealthproject/eddl/blob/d97875b3161d3e7a9199d35079d57a7d4ce3c6fa/examples/nn/3_drive/1_drive_seg.cpp#L185
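A rough model of what "collecting" means when the batch is spread over several device clones is sketched below. The contiguous batch split and the helper names are assumptions made for illustration, not a description of EDDL's actual implementation. The point is that a layer's host-side output only becomes meaningful once the per-device pieces are gathered back in batch order:

```python
def split_batch(batch, n_devices):
    """Split a batch into contiguous chunks, one per device (hypothetical scheme)."""
    size = -(-len(batch) // n_devices)  # ceiling division
    return [batch[i * size:(i + 1) * size] for i in range(n_devices)]

def forward_on_device(chunk):
    """Stand-in for one layer's forward pass on a single device clone."""
    return [2 * x for x in chunk]

def collect_output(device_outputs):
    """Gather the per-device outputs into one host-side list, in batch order."""
    collected = []
    for out in device_outputs:
        collected.extend(out)
    return collected

batch = [1, 2, 3, 4, 5]
device_outputs = [forward_on_device(c) for c in split_batch(batch, 2)]
print(device_outputs)                  # [[2, 4, 6], [8, 10]]
print(collect_output(device_outputs))  # [2, 4, 6, 8, 10]
```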

> By the way, during my C++ tests, I found out that if I call the API function getParams(layer, p) using a layer without parameters (like the Input or maxpool ones), I get a segfault. I think it would be nice to get something like a null value to manage the exception and prevent the program exit in that way. What do you think about it?

Well, since this function takes a parameter index, it is assumed that the user knows the valid range of that index.

However, with vector<Tensor*> getParams(layer l1); (develop branch), the user only has to provide a layer, and the function returns a vector (possibly empty) with all of its params.
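The difference between the two getParams variants can be sketched in pure Python. The Layer class and the helpers below are hypothetical stand-ins, not pyeddl calls. get_params mirrors the develop-branch behavior described above by returning an empty list for parameter-free layers. get_param shows the alternative Giovanni suggested, which returns None for an out-of-range index instead of crashing:

```python
class Layer:
    """Hypothetical stand-in for a network layer; not the EDDL class."""

    def __init__(self, name, params=()):
        self.name = name
        self.params = list(params)

def get_params(layer):
    """Return all parameters of a layer; an empty list for parameter-free layers."""
    return list(layer.params)

def get_param(layer, p):
    """Return parameter p, or None if the layer has no such parameter."""
    if 0 <= p < len(layer.params):
        return layer.params[p]
    return None

layers = [Layer("input1"), Layer("conv1", ["W", "b"]), Layer("maxpool2")]
for l in layers:
    print(l.name, get_params(l), get_param(l, 0))
# input1 [] None
# conv1 ['W', 'b'] W
# maxpool2 [] None
```

Iterating over the returned list avoids the need to know each layer's parameter count in advance.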


deephealthproject / eddl

Different behavior between CPU and GPU #240