autonomio / talos

Hyperparameter Experiments with TensorFlow and Keras
https://autonom.io
MIT License

Support for recurrent neural network training data #282

Closed adamconkey closed 5 years ago

adamconkey commented 5 years ago

It seems that the Scan class assumes the input is two-dimensional, (m, n) for number of samples by number of features, and that targets are (m, c) for number of samples by number of labels. I can't figure out how to use this for recurrent data, where Keras layers like LSTM expect 3D input: (m, t, c) for number of samples by number of timesteps by number of channels. Is there a way to accommodate this kind of data in the existing framework?
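
For reference, this is the kind of shape mismatch I mean (a minimal sketch with made-up layer sizes, not my actual model):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

m, t, c = 100, 20, 8                         # samples, timesteps, channels
x = np.random.random((m, t, c))              # 3D input that an LSTM expects
y = np.random.random((m, 1))                 # 2D targets, which Scan handles fine

model = Sequential()
model.add(LSTM(16, input_shape=(t, c)))      # recurrent layer takes (timesteps, channels)
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')
# Scan(x, y, ...) seems to expect x to be 2D (m, n), so this 3D x is the problem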

mikkokotila commented 5 years ago

Can you share the Keras model you want to use and a little bit of the data?

PiaCuk commented 5 years ago

I am also using Talos with an LSTM. Following the suggestions from #11, I pass dummy data to Scan() and then use a data generator with fit_generator in the model function, which accesses the generators directly from a common scope. This is my code:

# NOTE: imports are assumed; the exact paths for lr_normalizer and live
# may differ between Talos versions
import numpy as np
import talos as ta
from keras import optimizers
from talos import live
from talos.model.normalizers import lr_normalizer

_BATCH_SIZE = 4
_SET_SIZE = 20
_CHANNELS, _IMG_WIDTH, _IMG_HEIGHT = (3, 192, 192)

# create_generator is my own data generator factory (not shown)
train_generator = create_generator("/path/to/data",
                                   batch_size=_BATCH_SIZE,
                                   set_size=_SET_SIZE,
                                   img_dim=(_CHANNELS, _IMG_WIDTH, _IMG_HEIGHT))  # Channels first!
val_generator = create_generator("/path/to/data",
                                 batch_size=_BATCH_SIZE,
                                 set_size=_SET_SIZE,
                                 img_dim=(_CHANNELS, _IMG_WIDTH, _IMG_HEIGHT))  # Channels first!

def CNN_LSTM(x, y, x_val, y_val, params):
    [...]
    optimizer = optimizers.adam(lr_normalizer(params['lr'], optimizers.adam))
    model.compile(loss=params['losses'], optimizer=optimizer, metrics=['mae'])

    #print(model.summary())

    out = model.fit_generator(generator=train_generator, validation_data=val_generator,
                              epochs=params['epochs'], callbacks=[live()])

    return out, model

# dummy arrays just to satisfy Scan()'s x/y arguments; the real data comes from the generators
dummy_x = np.empty((1, _SET_SIZE, _CHANNELS, _IMG_WIDTH, _IMG_HEIGHT))
dummy_y = np.empty((1, _SET_SIZE))
scan = ta.Scan(x=dummy_x,
               y=dummy_y,
               model=CNN_LSTM,
               params=params,
               grid_downsample=0.1,
               dataset_name='img_data',
               experiment_no='1')

ta.Deploy(scan, 'experiment_lstm_1')

UPDATE: the folder created with Deploy() is empty due to a KeyError. Why can val_acc not be computed?

Traceback (most recent call last):
  File "lstm_talos.py", line 107, in <module>
    ta.Deploy(scan, 'experiment_lstm_1')
  File "/home/pia/anaconda3/envs/venv_cuda/lib/python3.6/site-packages/talos/commands/deploy.py", line 28, in __init__
    self.best_model = best_model(scan_object, metric, asc)
  File "/home/pia/anaconda3/envs/venv_cuda/lib/python3.6/site-packages/talos/utils/best_model.py", line 11, in best_model
    best = self.data.sort_values(metric, ascending=asc).iloc[0].name
  File "/home/pia/anaconda3/envs/venv_cuda/lib/python3.6/site-packages/pandas/core/frame.py", line 4719, in sort_values
    k = self._get_label_or_level_values(by, axis=axis)
  File "/home/pia/anaconda3/envs/venv_cuda/lib/python3.6/site-packages/pandas/core/generic.py", line 1706, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'val_acc'
mikkokotila commented 5 years ago

When you deploy, you have to do talos.Deploy( ... metric='mae' ... ) because you don't have 'val_acc' in your model.
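
For example, something along these lines (just a sketch, assuming the scan object from the snippet above; as the next comment notes, the metric name may need to match the full Keras metric name):

ta.Deploy(scan, 'experiment_lstm_1', metric='mae', asc=True)  # asc=True because a lower error is better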

mikkokotila commented 5 years ago

Closing this and creating a new issue for removing the default value 'val_acc' so the user has to choose.

PiaCuk commented 5 years ago

I have an update on my code: metric='mae' gives the same error as 'val_acc'; you have to spell out 'val_mean_absolute_error'. Deploy() then saves details.txt, model.h5 and model.json, but throws an error afterwards. Here is my Deploy() command and the error:

ta.Deploy(scan, 'experiment_lstm_1', metric='val_mean_absolute_error', asc=True)

Deploy package talos_one_subject_test have been saved.
Traceback (most recent call last):
  File "lstm_talos.py", line 110, in <module>
    ta.Deploy(scan, 'experiment_lstm_1', metric='val_mean_absolute_error', asc=True)
  File "/home/pia/anaconda3/envs/venv_cuda/lib/python3.6/site-packages/talos/commands/deploy.py", line 34, in __init__
    self.save_data()
  File "/home/pia/anaconda3/envs/venv_cuda/lib/python3.6/site-packages/talos/commands/deploy.py", line 60, in save_data
    x = pd.DataFrame(self.scan_object.x[:100])
  File "/home/pia/anaconda3/envs/venv_cuda/lib/python3.6/site-packages/pandas/core/frame.py", line 424, in __init__
    copy=copy)
  File "/home/pia/anaconda3/envs/venv_cuda/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 146, in init_ndarray
    values = prep_ndarray(values, copy=copy)
  File "/home/pia/anaconda3/envs/venv_cuda/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 249, in prep_ndarray
    raise ValueError('Must pass 2-d input')
ValueError: Must pass 2-d input

Is this error related to passing dummy data instead of the dataset?

mikkokotila commented 5 years ago

Sorry, I missed this as the ticket had been closed. There was an issue in the Deploy code that was causing this in all cases where x was not 2D. This is now fixed from v0.6 onwards. You can access it with:

pip install git+http://github.com/autonomio/talos@daily-dev