OMR-Research / tf-end-to-end

TensorFlow code to perform end-to-end Optical Music Recognition on monophonic scores through Convolutional Recurrent Neural Networks and CTC-based training.
MIT License

Features to Sequence step #7

Closed. jonwoodburn closed this issue 5 years ago

jonwoodburn commented 5 years ago

Hi, great work on the code. I am trying to understand every step of your code and have a question regarding the features-to-sequence step in ctc_model:

features = tf.transpose(x, perm=[2, 0, 3, 1])  # (batch, h, w, c) -> (w, batch, c, h): width becomes the time axis
feature_dim = params['conv_filter_n'][-1] * (params['img_height'] / height_reduction)  # channels * reduced height
feature_width = input_shape[2] / width_reduction  # number of time steps = image width after pooling
features = tf.reshape(features, tf.stack([tf.cast(feature_width,'int32'), input_shape[0], tf.cast(feature_dim,'int32')]))  # [T, batch, feature_dim]

As I understand it, feature_dim would be 256 * 8 = 2048 with the default model params. My question is: how is this then fed into an RNN of 512 units, given that feature_dim is 2048?
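For concreteness, here is how I traced the shapes with a small numpy sketch. The batch and width sizes are made-up numbers, and I assumed img_height = 128 with a height reduction of 16, which gives the 8 in "256 * 8" above:

import numpy as np

batch, img_height, img_width = 16, 128, 1024     # assumed sizes, for illustration only
height_reduction, width_reduction = 16, 16       # assumed overall pooling factors
channels = 256                                   # conv_filter_n[-1]

# x stands in for the output of the last conv block, NHWC layout
x = np.zeros((batch, img_height // height_reduction,
              img_width // width_reduction, channels))

# perm=[2, 0, 3, 1]: (batch, h, w, c) -> (w, batch, c, h), width becomes the time axis
features = np.transpose(x, (2, 0, 3, 1))

feature_dim = channels * (img_height // height_reduction)    # 256 * 8 = 2048
feature_width = img_width // width_reduction                 # number of time steps

features = features.reshape(feature_width, batch, feature_dim)
print(features.shape)                                        # (64, 16, 2048): [T, batch, feature_dim]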

calvozaragoza commented 5 years ago

Hi,

It is connected in a dense manner: there are 2048 * 512 connections. In other words, the 2048 features are the input to the RNN layer, which has an output dimension of 512 activations.
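A minimal TF 1.x-style sketch of that connection, assuming an LSTM-based bidirectional RNN with 512 units per direction (this is an illustration with assumed names, not a verbatim excerpt from ctc_model.py):

import tensorflow as tf   # TF 1.x style, matching the quoted code

rnn_units = 512
feature_dim = 2048

# Time-major input: [T, batch, feature_dim], as produced by the reshape above
features = tf.placeholder(tf.float32, shape=(None, None, feature_dim))

cell_fw = tf.nn.rnn_cell.BasicLSTMCell(rnn_units)
cell_bw = tf.nn.rnn_cell.BasicLSTMCell(rnn_units)

# Each 2048-dim feature vector is multiplied by the cell kernel, whose input
# block is a dense 2048 x (4 * 512) matrix (one slice per LSTM gate), so every
# feature is connected to every unit.
(out_fw, out_bw), _ = tf.nn.bidirectional_dynamic_rnn(
    cell_fw, cell_bw, features, dtype=tf.float32, time_major=True)

rnn_outputs = tf.concat([out_fw, out_bw], axis=2)   # [T, batch, 2 * 512]

The RNN output is later projected to the label vocabulary for the CTC loss, but that is a separate step from the dense input connection discussed here.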