autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License

Using multi_gpu on auto-encoder model #300

Closed aaronfderybel closed 5 years ago

aaronfderybel commented 5 years ago


I'm currently facing a weird/critical bug. It occurs when running the script below with hyperpar set to:

    hyperpar = {'opt': [adam, SGD],
                'lr': (10**(-3), 1, 100),
                'n_layer': [2],
                'loss_func': ['mse', 'binary_crossentropy'],
                'scan_number': [number]
                }

I pass the dictionary to the model-building function ...

    def build(self, hyperpar):
        # assumes the usual imports, e.g.
        #   from keras.layers import Input, LSTM, RepeatVector
        #   from keras.models import Model
        #   from talos.utils.gpu_utils import multi_gpu
        amt = self.col_amt
        window_size = self.generator.window_size
        n_layer = hyperpar["n_layer"]

        # input and first encoding layer
        inputs = Input(batch_shape=(None, window_size, amt))
        encoded = LSTM(amt, return_sequences=True, activation='relu')(inputs)

        # stacking encoding layers
        for i in range(1, n_layer):
            encoded = LSTM(amt - i, return_sequences=True, activation='relu')(encoded)
        # bottleneck layer (no return_sequences)
        encoded = LSTM(amt - n_layer, activation='relu')(encoded)

        decoded = RepeatVector(window_size)(encoded)
        # stacking decoding layers
        for i in range(1, n_layer):
            decoded = LSTM(amt - n_layer + i, return_sequences=True, activation='relu')(decoded)
        decoded = LSTM(amt, return_sequences=True, activation='softmax')(decoded)

        model = Model(inputs, decoded)
        # for talos multi-gpu usage
        model = multi_gpu(model, gpus=[0, 1])

        model.compile(optimizer=hyperpar['opt'](lr=hyperpar['lr']),
                      loss=hyperpar['loss_func'])
        model.summary()

        return model

When I use multi_gpu(..) I receive the following output for model.summary(): [image: model.summary() showing Lambda layers]

However, my model uses LSTM layers, and when I remove multi_gpu(..) I receive the following expected output: [image: model.summary() showing the LSTM layers]

The model trains and gives output; I didn't check whether it gives exactly the same results. In short: should I be worried? Are these Lambda layers custom layers created by Talos for multi_gpu processing, or is some nasty bug going on?
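A minimal sketch of how the wrapper's contents could be inspected (assuming model here is the object returned by multi_gpu(..) in build() above):

    # list the layers the multi-GPU wrapper actually contains;
    # any Lambda layers will show up here by class name
    for layer in model.layers:
        print(type(layer).__name__, layer.name)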

Thanks in advance, Aaron De Rybel

mikkokotila commented 5 years ago

Can you repeat the same without Talos, i.e. run the model both with multi_gpu and without, as a stand-alone Keras model, and see what happens?

My guess is that whatever Keras does to handle the multi_gpu situation results in the kind of output you are seeing.
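If it helps, here is a minimal stand-alone sketch of that test. It assumes Talos's multi_gpu is a thin wrapper around keras.utils.multi_gpu_model and that two GPUs are visible; the toy dimensions are made up:

    # stand-alone Keras reproduction, no Talos involved
    from keras.layers import Input, LSTM, RepeatVector
    from keras.models import Model
    from keras.utils import multi_gpu_model

    window_size, amt = 10, 8  # toy dimensions for the test
    inputs = Input(batch_shape=(None, window_size, amt))
    encoded = LSTM(amt, activation='relu')(inputs)
    decoded = RepeatVector(window_size)(encoded)
    decoded = LSTM(amt, return_sequences=True, activation='softmax')(decoded)
    model = Model(inputs, decoded)

    model.summary()                                 # plain summary: LSTM layers visible
    parallel = multi_gpu_model(model, gpus=[0, 1])  # same wrapping, done by Keras directly
    parallel.summary()                              # wrapper summary: Lambda slices + the model as one nested layer

If the Lambda layers appear in the second summary as well, the behaviour is coming from Keras rather than Talos.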

aaronfderybel commented 5 years ago

Hello @mikkokotila ,

thank you for your swift response. I tried NOT using the talos.Scan(..) function and calling build(hyperpar) directly with the hyperparameters:

    params = {'opt': adam,
              'lr': 10 ** (-3),
              'n_layer': 2,
              'loss_func': 'mse'
              }

When using multi_gpu(..) the output remains the same: [image: model.summary() showing Lambda layers]

When not using multi_gpu I see the expected output. I could also change build(self, hyperpar) into build(self), but this should not make a difference since I'm passing a single set of hyperparameters to the function rather than using a Talos scan.

I think this confirms that multi_gpu(..) is responsible for the output. Update: I also looked at the loss curves with multi_gpu on and off; keep in mind that they will not produce exactly the same results: [image: loss curves for both runs]

They look as similar as multiple runs without multi_gpu.

@mikkokotila Did you have time to look into this? It's not a pressing issue if multi_gpu() only produces a misleading model.summary() printout. If that is the case, I suggest closing the issue.
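For what it's worth, the original summary can probably still be recovered from the wrapped model. A sketch, assuming multi_gpu behaves like keras.utils.multi_gpu_model and nests the template model as a single layer:

    # pull the original (template) model back out of the multi-GPU wrapper
    from keras.models import Model

    inner = [l for l in model.layers if isinstance(l, Model)]
    if inner:
        inner[0].summary()  # prints the familiar LSTM-layer summary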

mikkokotila commented 5 years ago

Thanks for the clarification, and sorry for the delay in replying. As you suggested, it does not seem that anything is wrong with Talos, so closing here.

Thanks again.