TUT-ARG / DCASE2017-baseline-system

DCASE 2017 Baseline system

Why are you using 'input_dim' in KerasMixin.create_model()? #3

Closed thisisjl closed 7 years ago

thisisjl commented 7 years ago

Hi, I am following your format of defining the neural network architecture in the parameters file and letting KerasMixin.create_model() build it, because I think it is a clever design. In create_model(), the argument used to set up the dimensionality of the input data is input_dim.

My network uses keras.layers.Conv1D, which produces a wrong number of parameters when configured with input_dim. When I use the input_shape argument instead, the network is built correctly.

I understand that the fully connected network you released as the baseline is set up using input_dim, but I have checked that it can also be set up with input_shape (if the value is given as a tuple). Therefore, I would like to know whether there is a reason to use input_dim instead of input_shape.

I copied below the summaries of the networks built with input_dim and input_shape.

Fully connected networks

The first one uses input_dim, as your code does by default. The second uses input_shape, but note that the value of this argument is the tuple (4400,). The third also uses input_shape, with the tuple (4400, 1), and the resulting number of parameters is wrong.

# here LayerClass is keras.layers.Dense
layer_setup['config'] = {'activation': 'relu', 'input_dim': 4400, 'kernel_initializer': 'uniform', 'units': 50}
self.model = Sequential()
self.model.add(LayerClass(**dict(layer_setup.get('config'))))
self.model.summary()

layer_setup['config'] = {'activation': 'relu', 'input_shape': (4400,), 'kernel_initializer': 'uniform', 'units': 50}
self.model = Sequential()
self.model.add(LayerClass(**dict(layer_setup.get('config'))))
self.model.summary()

layer_setup['config'] = {'activation': 'relu', 'input_shape': (4400,1), 'kernel_initializer': 'uniform', 'units': 50}
self.model = Sequential()
self.model.add(LayerClass(**dict(layer_setup.get('config'))))
self.model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_15 (Dense)             (None, 50)                220050    
=================================================================
Total params: 220,050.0
Trainable params: 220,050
Non-trainable params: 0.0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_16 (Dense)             (None, 50)                220050    
=================================================================
Total params: 220,050.0
Trainable params: 220,050
Non-trainable params: 0.0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_17 (Dense)             (None, 4400, 50)          100       
=================================================================
Total params: 100.0
Trainable params: 100
Non-trainable params: 0.0
_________________________________________________________________
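For what it's worth, the wrong count in the third summary can be reproduced from the Dense parameter formula alone: Dense only connects along the last axis of its input, so with input_shape=(4400, 1) the kernel spans an axis of size 1 and the layer is broadcast over the 4400 steps. A quick sanity check (not part of the framework):

```python
# Dense parameter count: kernel (last_dim x units) plus bias (units).
def dense_params(last_dim, units):
    return last_dim * units + units

# input_dim=4400 or input_shape=(4400,): the last axis is 4400 wide.
print(dense_params(4400, 50))  # 220050, matching the first two summaries

# input_shape=(4400, 1): Dense acts on the last axis (size 1) only,
# broadcasting over the 4400 steps, hence the (None, 4400, 50) output.
print(dense_params(1, 50))     # 100, matching the third summary
```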

Convolutional networks

In the first case, I am using input_dim, as the code would by default. Note the output shape of the network and the number of parameters. Note also that input_dim has been deprecated in the Keras 2 API.

In the second case, commented out here, I use input_shape=(4400,), but this raises an error because Conv1D expects 3-dimensional input, so the layer cannot be added.

In the third case, I use input_shape=(4400,1) and the resulting network is fine.

# here LayerClass is keras.layers.Conv1D
layer_setup['config'] = {'filters': 32, 'kernel_size': 64, 'input_dim': 4400}
self.model = Sequential()
self.model.add(LayerClass(**dict(layer_setup.get('config'))))
self.model.summary()

#layer_setup['config'] = {'filters': 32, 'kernel_size': 64, 'input_shape': (4400,)}
#self.model = Sequential()
#self.model.add(LayerClass(**dict(layer_setup.get('config'))))
#self.model.summary()

layer_setup['config'] = {'filters': 32, 'kernel_size': 64, 'input_shape': (4400, 1)}
self.model = Sequential()
self.model.add(LayerClass(**dict(layer_setup.get('config'))))
self.model.summary()

/Users/JL/Documents/SMC10/Master-Thesis/Reference-code/DCASE2017-modified/dcase_framework/learners.py:3: UserWarning: Update your `Conv1D` call to the Keras 2 API: `Conv1D(input_shape=(None, 440..., kernel_size=64, filters=32)`
  """
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_5 (Conv1D)            (None, None, 32)          9011232   
=================================================================
Total params: 9,011,232.0
Trainable params: 9,011,232
Non-trainable params: 0.0
_________________________________________________________________
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d_6 (Conv1D)            (None, 4337, 32)          2080      
=================================================================
Total params: 2,080.0
Trainable params: 2,080
Non-trainable params: 0.0
_________________________________________________________________
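The same kind of arithmetic explains both conv summaries: with input_dim=4400, Keras interprets 4400 as the number of input channels (with an unknown-length time axis), while with input_shape=(4400, 1) there are 4400 time steps and a single channel. A quick check of the two parameter counts:

```python
# Conv1D parameter count: kernel (kernel_size x in_channels x filters) plus bias.
def conv1d_params(kernel_size, in_channels, filters):
    return kernel_size * in_channels * filters + filters

# input_dim=4400 is taken as 4400 input channels with an unknown-length
# time axis, hence the (None, None, 32) output shape.
print(conv1d_params(64, 4400, 32))  # 9011232

# input_shape=(4400, 1): 4400 time steps, one channel.
print(conv1d_params(64, 1, 32))     # 2080
print(4400 - 64 + 1)                # 4337 output steps ('valid' padding)
```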

Do you plan to keep input_dim (in which case I will find a solution for my case), or would you consider changing it? Thank you very much.

toni-heittola commented 7 years ago

The system was originally designed mainly for MLPs, and therefore input_dim was used. I have now updated the create_model method to use input_shape instead, to allow more generic usage; see commit 9ec18d085d8baeec6c431169d7802ff47dbc16cf.

If you plan to extend the system for CNNs, you need to create a custom learner class that inherits from the SceneClassifierMLP or EventDetectorMLP class and override some methods for your purposes. At the same time you can also override the create_model method.

I'm happy to make the DCASE Framework more generic to accommodate extensions toward state-of-the-art approaches, as long as they do not break the baseline system. Just make a pull request whenever you stumble on functionality that is too restrictive.

thisisjl commented 7 years ago

Thanks for your time in making the changes.

In my particular case, I would like to use the raw audio waveform as the input to the neural network. I made some changes, but I am not sure they respect the framework, as I am skipping the feature extraction step and the loading of the pickle files in application_core.py. Instead, I load the audio files batch by batch directly into model.fit_generator().
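To illustrate what I mean, here is a minimal sketch of the generator side (the file list and the load_waveform loader are placeholders for my actual audio-reading code, and the fixed length 4400 is just an example):

```python
import numpy as np

def load_waveform(filename, length=4400):
    # Placeholder: real code would read and resample the audio file here.
    return np.zeros(length, dtype=np.float32)

def waveform_batch_generator(filenames, labels, batch_size=8):
    """Yield (batch, targets) pairs shaped for a Conv1D input_shape=(length, 1)."""
    while True:  # fit_generator expects an endless generator
        for start in range(0, len(filenames), batch_size):
            files = filenames[start:start + batch_size]
            # Stack waveforms into (batch, length) and add a channel axis.
            x = np.stack([load_waveform(f) for f in files])[..., np.newaxis]
            y = np.asarray(labels[start:start + batch_size])
            yield x, y

# Usage: model.fit_generator(waveform_batch_generator(files, labels),
#                            steps_per_epoch=len(files) // 8, epochs=10)
```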

I did not make a pull request because I am sure this workflow could be implemented better for integration into your existing framework. If you have any suggestions on the best way to approach this, I would appreciate them.