autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License

Error when using variable loss weights which are adjusted using a callback #375

Closed patebel closed 5 years ago

patebel commented 5 years ago

I am running a model where I initialize the weight parameters for my custom loss function in a static_params dict as follows:

    static_params = {'label_coordinates': label_coordinates,
                     'max_lat': 41.285,
                     'max_lon': -8.5,
                     'min_lat': 41.0,
                     'min_lon': -8.73,
                     'lat_range': 0.285,
                     'lon_range': 0.23,
                     'alpha': K.variable(1),
                     'beta': K.variable(0)
                     }

To pass this dict to the model when using Talos, I wrote a wrapper that looks like this:


    def dest_pred_model_talos(static_params):
        def dest_pred_model(x_train: np.ndarray, y_train1: np.ndarray, x_val: np.ndarray,
                            y_val1: np.ndarray, params: dict):
            model = build_model(params, static_params)
            model = compile_model(model, params, static_params)
            history = fit_model(model, params, x_train, y_train1, x_val, y_val1)

            return history, model

        return dest_pred_model

The weights are changed by a callback after the first epoch of training. My parameter dictionary looks like this:

    hyperparameters = {'max_sequence_length': [50],
                       'embed_length_trip': [2, 4, 8],
                       'depth_lstm': [2, 4, 8],
                       'num_regions': [2048],
                       'num_epochs': [5],
                       'batch_size': [1024],  # only choose multiples of each other
                       'lr': [0.001, 0.0001, 0.1],
                       'optimizer': [optimizers.Adam],
                       'clipvalue': [0.5]
                       }

I call the Scan method as follows:

    h = ta.Scan(np.vstack(train['INPUT_POLYLINE'].tolist()), np.vstack(train['DESTINATION'].tolist()),
                params=hyperparameters,
                x_val=np.vstack(validation['INPUT_POLYLINE'].tolist()),
                y_val=np.vstack(validation['DESTINATION'].tolist()),
                # model=multi_gpu(dest_pred_model, gpus=2),
                model=dest_pred_model_talos(static_params),
                experiment_name='SingleInputLSTM2L',
                reduction_interval=5,
                reduction_method='correlation',
                reduction_metric='val_loss',
                minimize_loss=True,
                round_limit=11)
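For orientation, here is a rough sketch of how many rounds this setup implies. It assumes Talos expands the dictionary into a full grid of combinations and that round_limit simply caps the number of rounds; random sampling and the correlation-based reduction are not modelled. The helper expand_grid is hypothetical and not part of Talos, and the optimizer is written as a string so the snippet runs without Keras.

```python
from itertools import product

# Hyperparameter lists copied from the issue. The optimizer is a plain string
# here (an assumption for illustration) so no Keras import is needed.
hyperparameters = {
    'max_sequence_length': [50],
    'embed_length_trip': [2, 4, 8],
    'depth_lstm': [2, 4, 8],
    'num_regions': [2048],
    'num_epochs': [5],
    'batch_size': [1024],
    'lr': [0.001, 0.0001, 0.1],
    'optimizer': ['Adam'],
    'clipvalue': [0.5],
}


def expand_grid(params):
    """Return one dict per combination of the listed values (hypothetical helper)."""
    keys = list(params)
    return [dict(zip(keys, values))
            for values in product(*(params[key] for key in keys))]


grid = expand_grid(hyperparameters)
round_limit = 11
rounds_to_run = min(len(grid), round_limit)

print(len(grid), rounds_to_run)  # 27 11
```

Only three of the lists have more than one value, so the grid has 3 * 3 * 3 = 27 entries, of which at most 11 would be trained. The model function is therefore called repeatedly within one Python process, which matters for anything created once outside of it.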

When I start Talos, the first model is trained as expected, but after the second model is built an error is raised (see the trace below). Any ideas where the bug might be?

Trace:

    Traceback (most recent call last):
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3296, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "<ipython-input-2-fdcb63e6b932>", line 1, in <module>
        runfile('C:/Projects/destination-prediction-thesis/src/models/LSTM_reg_min_2_losses.py', wdir='C:/Projects/destination-prediction-thesis/src/models')
      File "C:\Program Files\JetBrains\PyCharm 2019.1.1\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
        pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
      File "C:\Program Files\JetBrains\PyCharm 2019.1.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "C:/Projects/destination-prediction-thesis/src/models/LSTM_reg_min_2_losses.py", line 254, in <module>
        round_limit=11)
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\talos\scan\Scan.py", line 196, in __init__
        self._runtime()
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\talos\scan\Scan.py", line 201, in _runtime
        self = scan_run(self)
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\talos\scan\scan_run.py", line 26, in scan_run
        self = scan_round(self)
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\talos\scan\scan_round.py", line 19, in scan_round
        self.model_history, self.keras_model = ingest_model(self)
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\talos\model\ingest_model.py", line 10, in ingest_model
        self.round_params)
      File "C:/Projects/destination-prediction-thesis/src/models/LSTM_reg_min_2_losses.py", line 195, in dest_pred_model
        model = compile_model(model, params, static_params)
      File "C:/Projects/destination-prediction-thesis/src/models/LSTM_reg_min_2_losses.py", line 102, in compile_model
        loss_weights={'pred_weighted': static_params['alpha'], 'auxiliary_output': static_params['beta']})
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\keras\engine\training.py", line 347, in compile
        total_loss = loss_weight * output_loss
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\tensorflow\python\ops\variables.py", line 935, in _run_op
        return tensor_oper(a.value(), *args, **kwargs)
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\tensorflow\python\ops\math_ops.py", line 810, in binary_op_wrapper
        with ops.name_scope(None, op_name, [x, y]) as name:
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 6083, in __enter__
        g = _get_graph_from_inputs(self._values)
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 5713, in _get_graph_from_inputs
        _assert_same_graph(original_graph_element, graph_element)
      File "C:\Projects\destination-prediction-thesis\venv\lib\site-packages\tensorflow\python\framework\ops.py", line 5649, in _assert_same_graph
        original_item))
    ValueError: Tensor("loss/pred_weighted_loss/Mean_2:0", shape=(), dtype=float32) must be from the same graph as Tensor("Variable/read:0", shape=(), dtype=float32).

mikkokotila commented 5 years ago

Can you try Scan(...clear_session=False...) and see what happens? If you are on an older version of Talos, use Scan(...clear_tf_session=False...) instead.

patebel commented 5 years ago

Thanks for the fast reply! With that change I run into the following error:

    Traceback (most recent call last):
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3296, in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)
      File "", line 1, in
        runfile('/home/patebel/projects/DeepDest/src/models/LSTM_reg_min_2_losses.py', wdir='/home/patebel/projects/DeepDest/src/models')
      File "/home/patebel/.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
        pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
      File "/home/patebel/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
        exec(compile(contents+"\n", file, 'exec'), glob, loc)
      File "/home/patebel/projects/DeepDest/src/models/LSTM_reg_min_2_losses.py", line 254, in
        clear_session=False)
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/talos/scan/Scan.py", line 196, in __init__
        self._runtime()
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/talos/scan/Scan.py", line 201, in _runtime
        self = scan_run(self)
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/talos/scan/scan_run.py", line 26, in scan_run
        self = scan_round(self)
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/talos/scan/scan_round.py", line 19, in scan_round
        self.model_history, self.keras_model = ingest_model(self)
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/talos/model/ingest_model.py", line 10, in ingest_model
        self.round_params)
      File "/home/patebel/projects/DeepDest/src/models/LSTM_reg_min_2_losses.py", line 195, in dest_pred_model
        history = fit_model(model, params, x_train, y_train, x_val, y_val)
      File "/home/patebel/projects/DeepDest/src/models/LSTM_reg_min_2_losses.py", line 151, in fit_model
        callbacks=[MyCallback(static_params['alpha'], static_params['beta']), early_stopping])
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
        validation_steps=validation_steps)
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
        outs = f(ins_batch)
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
        return self._call(inputs)
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
        fetched = self._callable_fn(*array_vals)
      File "/home/patebel/venvs/DeepDest/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
        run_metadata_ptr)
    tensorflow.python.framework.errors_impl.FailedPreconditionError: 2 root error(s) found.
      (0) Failed precondition: Attempting to use uninitialized value Variable
         [[{{node Variable/read}}]]
         [[loss_1/auxiliary_output_loss/Mean_2/_87]]
      (1) Failed precondition: Attempting to use uninitialized value Variable
         [[{{node Variable/read}}]]
    0 successful operations.
    0 derived errors ignored.

patebel commented 5 years ago

Using Scan(...clear_session=False...) and

    init = tf.global_variables_initializer()
    session.run(init)

in combination with the workaround for the serialization problem mentioned in https://github.com/keras-team/keras/issues/9444 "solved" the problem for me. Perhaps this helps you track down the problem.

mikkokotila commented 5 years ago

Thanks a lot. Where did you include this code snippet?

patebel commented 5 years ago

Right at the beginning, before building the model, I run this snippet:

        config = tf.ConfigProto()
        config.gpu_options.allow_growth = True
        session = tf.Session(config=config)
        set_session(session)
        init = tf.global_variables_initializer()
        session.run(init)

I only included the first part, where the allow_growth option is set, because of some erroneous CUDA handling I experienced. So I think it should be sufficient to use the standard config.

mikkokotila commented 5 years ago

Great. For the sake of completeness, I guess set_session is tf.keras.backend.set_session.

patebel commented 5 years ago

Yes it is!

mikkokotila commented 5 years ago

Because this is an issue with the way TensorFlow handles its graph, and not with Talos, there is not much we can do about it. Thanks a lot for providing the complete workaround :)

Closing here.
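A different approach, not discussed in the thread and untested against Talos, would be to create the loss-weight variables inside the model function, so that every round gets fresh variables in whatever graph is current at that point. The sketch below shows only the structure. make_static_params, make_variable and train_one_round are hypothetical names, label_coordinates is left out for brevity, and FakeVariable stands in for a Keras backend variable so the snippet runs without TensorFlow.

```python
from typing import Any, Callable, Dict


def make_static_params(make_variable: Callable[[float], Any]) -> Dict[str, Any]:
    """Build a new static_params dict, creating alpha and beta from scratch."""
    return {
        'max_lat': 41.285,
        'max_lon': -8.5,
        'min_lat': 41.0,
        'min_lon': -8.73,
        'lat_range': 0.285,
        'lon_range': 0.23,
        'alpha': make_variable(1.0),
        'beta': make_variable(0.0),
    }


def dest_pred_model_talos(make_variable, train_one_round):
    """Return a model function that rebuilds static_params on every call."""
    def dest_pred_model(x_train, y_train, x_val, y_val, params):
        # Variables are created here, once per round, not once at import time.
        static_params = make_static_params(make_variable)
        return train_one_round(x_train, y_train, x_val, y_val,
                               params, static_params)
    return dest_pred_model


class FakeVariable:
    """Stand-in for a backend variable (assumption for illustration only)."""
    def __init__(self, value):
        self.value = value


def fake_round(x_train, y_train, x_val, y_val, params, static_params):
    """Stand-in for build_model / compile_model / fit_model."""
    return static_params['alpha'], static_params['beta']


model_fn = dest_pred_model_talos(FakeVariable, fake_round)
first = model_fn(None, None, None, None, {})
second = model_fn(None, None, None, None, {})

print(first[0] is second[0])  # False: each round has its own alpha
```

In a real run, make_variable would presumably be K.variable and train_one_round would wrap the build_model, compile_model and fit_model calls from the original post. Whether this removes the need for clear_session=False and the manual initializer depends on the Keras and TensorFlow versions in use.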