autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License

Getting errors when handling custom object #580

Closed cedarsnow closed 2 years ago

cedarsnow commented 2 years ago

Thank you very much for reporting a bug on Talos. Before you do, please go through the below checklist carefully and make sure to prepare your bug report in a way that facilitates effective handling of the matter.

1) Confirm the below

2) Include the output of:

talos.__version__ 1.0.2

3) Explain clearly what you expect to happen

I'm trying to use the model from https://github.com/GregMurray30/MultiTouchAttribution with Talos.

I expect talos.Scan() to work with my model and talos.Predict() to return predictions. I'm running this in Colab.

The model I'm using needs a custom object, which is defined as follows:

      class MyAttention(tf.keras.Model):
            def __init__(self, units,**kwargs):
                super(MyAttention, self).__init__(**kwargs)
                self.units=units
                self.w=self.add_weight(name='w',shape=[lstm_size, self.units], initializer='normal')
                self.b=self.add_weight(name='b',shape=[3, self.units], initializer='zeros')

            def call(self, x, return_sequences=True):

                e1 = K.relu(K.dot(x,self.w)+self.b)

                #print(e1)
                #e2 = K.relu(K.dot(e1,self.w2)+self.b2)
                #print(e2)
                a = K.softmax(e1, axis=1)

                #a = tf.keras.activations.softmax(e1, axis=1)

                output = a*x
                return e1, K.sum(output, axis=1)

            def get_config(self):

                return {
                    'units':self.units,
                    'w':self.w.numpy(),
                    'b':self.b.numpy(),
                }

            @classmethod
            def from_config(cls, config):
                return cls(**config)

Then I re-built the model following Talos' guide:

    def dnamta(x_train,y_train, x_val, y_val, params):

        inp_cat_data = Input(shape=[num_time_steps, 1]) #input for categorical data
        inp_num_data = Input(shape=[num_time_steps, num_feats]) #input for numerical data

        emb = Embedding(cardinality, params['embedding_size'], input_length=(num_time_steps))(inp_cat_data) #Embedding layer for categorical data (channels)

        resh1 = Reshape((num_time_steps, params['embedding_size']))(emb) #reshape to squash the 4D tensor back into 3D, not sure why emb layer does this

        conc = Concatenate()([resh1, inp_num_data]) #combine numerical data with channel embedding tensor before inputting into LSTM

        resh2 = Reshape((num_time_steps, params['embedding_size']+num_feats))(conc) #reshape to fit LSTM needs just in case
        # LSTM with channel embeddings and control features, returns hidden state sequences for the attention layer to use
        lstm_layer1 = LSTM(params['lstm_size'], dropout=params['dropout'], recurrent_dropout=params['recurrent_dropout'],return_sequences=True)(resh2)
        lstm_layer2 = LSTM(params['lstm_size'], dropout=params['dropout'], recurrent_dropout=params['recurrent_dropout'],return_sequences=True)(lstm_layer1)
        lstm_layer3 = LSTM(params['lstm_size'], dropout=params['dropout'], recurrent_dropout=params['recurrent_dropout'],return_sequences=True)(lstm_layer2)
        lstm_layer4 = LSTM(params['lstm_size'], dropout=params['dropout'], recurrent_dropout=params['recurrent_dropout'],return_sequences=True)(lstm_layer3)
        lstm_layer5 = LSTM(params['lstm_size'], dropout=params['dropout'], recurrent_dropout=params['recurrent_dropout'],return_sequences=True)(lstm_layer4)
        lstm_layer6 = LSTM(params['lstm_size'], dropout=params['dropout'], recurrent_dropout=params['recurrent_dropout'],return_sequences=True)(lstm_layer5)
        #lstm_layer7 = LSTM(lstm_size, dropout=.01, recurrent_dropout=.01,return_sequences=True)(lstm_layer6)

        #lstm_layer = Bidirectional(LSTM(128, dropout=.01, recurrent_dropout=.01,return_sequences=True))(resh2)

        #Attention layer: multi-layer perceptron with 1 neuron, returns unscaled attention weight vector and attention weighted sum
        a_unsc, attention_layer = MyAttention(1)(lstm_layer6)
        #a_unsc, attention_layer = Attention(lstm_size=params['lstm_size'],units=1,initializer=params['initializer'])(lstm_layer6)

        #Output layer dense, binary outcome
        out = Dense(1, activation='sigmoid')(attention_layer)

        #output the prediction and the unscaled attention to retrieve after training 
        model = tf.keras.Model(inputs=[inp_cat_data, inp_num_data], outputs=out )

        #output the prediction only - attention to retrieve after training with intermediate layer extraction
        #model = keras.Model(inputs=[inp_cat_data, inp_num_data], outputs=out )

        #model = keras.Model(inputs=[inp_cat_data, inp_num_data], outputs=out )

        model.compile(optimizer=params['optimizer'],
                      loss=params['loss'],
                      metrics=['acc'])

        output=model.fit(x=x_train, y=y_train, validation_data=[x_val, y_val], epochs=params['epochs'], batch_size=params['batch_size'],callbacks=[talos.utils.ExperimentLogCallback('dnamta', params)])

        return output,model

After this, I defined the param dictionary and called Scan() and Predict().

4) Explain what actually happened

With the code above, talos.Predict() raised an error:

    ValueError                                Traceback (most recent call last)

    [<ipython-input-35-53d0e1518611>](https://localhost:8080/#) in <module>()
          1 p = talos.Predict(scan_object)
    ----> 2 p.predict_classes(x=[X_test_cat2, X_test_num2],metric='acc',asc=False)

    10 frames

    [/usr/local/lib/python3.7/dist-packages/keras/utils/generic_utils.py](https://localhost:8080/#) in class_and_config_for_serialized_keras_object(config, module_objects, custom_objects, printable_module_name)
        561   if cls is None:
        562     raise ValueError(
    --> 563         f'Unknown {printable_module_name}: {class_name}. Please ensure this '
        564         'object is passed to the `custom_objects` argument. See '
        565         'https://www.tensorflow.org/guide/keras/save_and_serialize'

    ValueError: Unknown layer: MyAttention. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details.

I also tried registering the custom object and using

'with tf.keras.utils.custom_object_scope(custom_objects)'

but got another error:

        TypeError                                 Traceback (most recent call last)

        [<ipython-input-34-30812c504abf>](https://localhost:8080/#) in <module>()
              3 with tf.keras.utils.custom_object_scope(custom_objects):
              4     p = talos.Predict(scan_object)
        ----> 5     p.predict_classes(x=[X_test_cat2, X_test_num2],metric='acc',asc=False)
              6     print(p)

        14 frames

        [/usr/local/lib/python3.7/dist-packages/keras/utils/generic_utils.py](https://localhost:8080/#) in validate_kwargs(kwargs, allowed_kwargs, error_message)
           1172   for kwarg in kwargs:
           1173     if kwarg not in allowed_kwargs:
        -> 1174       raise TypeError(error_message, kwarg)
           1175 
           1176 

        TypeError: ('Keyword argument not understood:', 'w')
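
The traceback above is consistent with the `get_config()` / `from_config()` pair shown earlier: `from_config` calls `cls(**config)`, so the `'w'` and `'b'` entries end up in `**kwargs` and are forwarded to the base-class constructor, which rejects keywords it does not know. Below is a minimal sketch of that mechanism in plain Python, with no TensorFlow dependency. `BaseModel`, `ALLOWED_KWARGS`, and `get_config_with_weights` are hypothetical stand-ins for illustration and are not the real Keras implementation.

```python
class BaseModel:
    """Hypothetical stand-in for a base class that validates its kwargs."""

    ALLOWED_KWARGS = {"name", "trainable", "dtype"}

    def __init__(self, **kwargs):
        # Reject any keyword the base class does not recognise.
        for kwarg in kwargs:
            if kwarg not in self.ALLOWED_KWARGS:
                raise TypeError("Keyword argument not understood:", kwarg)
        self.name = kwargs.get("name", "model")


class Attention(BaseModel):
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.w = [[0.0] * units]  # placeholder for a weight tensor

    def get_config(self):
        # Only constructor arguments: safe to pass back to __init__.
        return {"units": self.units, "name": self.name}

    def get_config_with_weights(self):
        # Mirrors the reported config: weight values mixed into the config.
        return {"units": self.units, "w": self.w}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


layer = Attention(units=1)

# Round trip with constructor arguments only: works.
restored = Attention.from_config(layer.get_config())

# Round trip with a weight entry in the config: 'w' reaches the base class.
try:
    Attention.from_config(layer.get_config_with_weights())
    error = None
except TypeError as exc:
    error = exc
```

If this is the cause, returning only constructor arguments such as `units` from `get_config()` would avoid this particular error, since weight values are normally restored from the saved weights and not from the config. Whether that alone is enough for talos.Predict() is not confirmed in this thread.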

A related error: if I define the get_config() method in MyAttention differently:

              def get_config(self):
                    config=super().get_config()
                    config.update({
                        'units':self.units,
                        'w':self.w.numpy(),
                        'b':self.b.numpy(),
                    })

                    return config

talos.Scan() cannot complete the scanning process and shows:

        NotImplementedError                       Traceback (most recent call last)

        [<ipython-input-76-e32780f89adc>](https://localhost:8080/#) in <module>()
              2                          params=params,
              3                          model=dnamta,
        ----> 4                          experiment_name='dnamta')

        10 frames

        [/usr/local/lib/python3.7/dist-packages/talos/scan/Scan.py](https://localhost:8080/#) in __init__(self, x, y, params, model, experiment_name, x_val, y_val, val_split, random_method, seed, performance_target, fraction_limit, round_limit, time_limit, boolean_limit, reduction_method, reduction_interval, reduction_window, reduction_threshold, reduction_metric, minimize_loss, disable_progress_bar, print_params, clear_session, save_weights)
            194         # start runtime
            195         from .scan_run import scan_run
        --> 196         scan_run(self)

        [/usr/local/lib/python3.7/dist-packages/talos/scan/scan_run.py](https://localhost:8080/#) in scan_run(self)
             24         # otherwise proceed with next permutation
             25         from .scan_round import scan_round
        ---> 26         self = scan_round(self)
             27         self.pbar.update(1)
             28 

        [/usr/local/lib/python3.7/dist-packages/talos/scan/scan_round.py](https://localhost:8080/#) in scan_round(self)
             30     try:
             31         # save model and weights
        ---> 32         self.saved_models.append(self.round_model.to_json())
             33 
             34         if self.save_weights:

        [/usr/local/lib/python3.7/dist-packages/keras/engine/training.py](https://localhost:8080/#) in to_json(self, **kwargs)
           2660         A JSON string.
           2661     """
        -> 2662     model_config = self._updated_config()
           2663     return json.dumps(
           2664         model_config, default=json_utils.get_json_type, **kwargs)

        [/usr/local/lib/python3.7/dist-packages/keras/engine/training.py](https://localhost:8080/#) in _updated_config(self)
           2618     from keras import __version__ as keras_version  # pylint: disable=g-import-not-at-top
           2619 
        -> 2620     config = self.get_config()
           2621     model_config = {
           2622         'class_name': self.__class__.__name__,

        [/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py](https://localhost:8080/#) in get_config(self)
            683 
            684   def get_config(self):
        --> 685     return copy.deepcopy(get_network_config(self))
            686 
            687   @classmethod

        [/usr/local/lib/python3.7/dist-packages/keras/engine/functional.py](https://localhost:8080/#) in get_network_config(network, serialize_layer_fn)
           1408           filtered_inbound_nodes.append(node_data)
           1409 
        -> 1410       layer_config = serialize_layer_fn(layer)
           1411       layer_config['name'] = layer.name
           1412       layer_config['inbound_nodes'] = filtered_inbound_nodes

        [/usr/local/lib/python3.7/dist-packages/keras/utils/generic_utils.py](https://localhost:8080/#) in serialize_keras_object(instance)
            509         return serialize_keras_class_and_config(
            510             name, {_LAYER_UNDEFINED_CONFIG_KEY: True})
        --> 511       raise e
            512     serialization_config = {}
            513     for key, item in config.items():

        [/usr/local/lib/python3.7/dist-packages/keras/utils/generic_utils.py](https://localhost:8080/#) in serialize_keras_object(instance)
            504     name = get_registered_name(instance.__class__)
            505     try:
        --> 506       config = instance.get_config()
            507     except NotImplementedError as e:
            508       if _SKIP_FAILED_SERIALIZATION:

        [<ipython-input-74-91cfecb2d906>](https://localhost:8080/#) in get_config(self)
             24 
             25     def get_config(self):
        ---> 26         config=super().get_config()
             27         config.update({
             28             'units':self.units,

        [/usr/local/lib/python3.7/dist-packages/keras/engine/training.py](https://localhost:8080/#) in get_config(self)
           2628 
           2629   def get_config(self):
        -> 2630     raise NotImplementedError
           2631 
           2632   @classmethod

        NotImplementedError:
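
The last frames of this traceback show `super().get_config()` landing in the `get_config` of `keras/engine/training.py`, which raises `NotImplementedError` in this Keras version. In other words, because `MyAttention` subclasses `tf.keras.Model`, the base class offers no config to extend. `tf.keras.layers.Layer`, by contrast, does provide a base `get_config()`. Below is a minimal plain-Python sketch of that difference. `Layer`, `Model`, and `make_attention` are hypothetical stand-ins, not the real Keras classes.

```python
import json


class Layer:
    """Hypothetical stand-in: a base class with a usable get_config()."""

    def __init__(self, name="layer"):
        self.name = name

    def get_config(self):
        return {"name": self.name}


class Model(Layer):
    """Hypothetical stand-in: a base class whose get_config() is not implemented."""

    def get_config(self):
        raise NotImplementedError


def make_attention(base):
    """Build the same subclass body on top of either base class."""

    class Attention(base):
        def __init__(self, units, **kwargs):
            super().__init__(**kwargs)
            self.units = units

        def get_config(self):
            # Extend whatever the base class reports with our own arguments.
            config = super().get_config()
            config.update({"units": self.units})
            return config

    return Attention


# On top of the Layer stand-in, the config serialises to JSON.
as_layer = make_attention(Layer)(units=1, name="attention")
layer_json = json.dumps(as_layer.get_config())

# On top of the Model stand-in, super().get_config() raises.
as_model = make_attention(Model)(units=1, name="attention")
try:
    as_model.get_config()
    model_error = None
except NotImplementedError as exc:
    model_error = exc
```

Under that reading, defining the attention block as a `tf.keras.layers.Layer` subclass is one possible way to make `to_json()` succeed during the scan. This is a suggestion based on the traceback, not a fix confirmed by the maintainers here.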

If you want to run the code, the notebook is here: https://colab.research.google.com/drive/13J0aLZwQ6cbT9ObwqMNa5Q4cb8OBGifP?usp=sharing


mikkokotila commented 2 years ago

Are you able to run the input model by itself outside of Talos? Basically, just replace all the references to params with the actual parameter values, and then try running it directly in Tensorflow and see what happens. Do you get the same error?

mikkokotila commented 2 years ago

Here are some related issues that might be useful:

Here is also an example notebook for recovering best model when other methods fail.

cedarsnow commented 2 years ago

> Are you able to run the input model by itself outside of Talos? Basically, just replace all the references to params with the actual parameter values, and then try running it directly in Tensorflow and see what happens. Do you get the same error?

Running, saving, and loading the model directly with tf.keras all work.

mikkokotila commented 2 years ago

This is now handled in #581 for the main part, and the Evaluate() part will be handled in #582. Closing here.