TensorFlow code to perform end-to-end Optical Music Recognition on monophonic scores through Convolutional Recurrent Neural Networks and CTC-based training.
Hi, great work on the code. I am trying to understand every step of your code and have a question regarding the features-to-sequence step in ctc_model:
As I understand it, feature_dim would have a value of 256 * 8 (with the default model parameters). My question is: how is this then fed into an RNN of 512 units, given that feature_dim is 2048?
It is connected in a dense manner: there are 2048 × 512 connections. In other words, the 2048 features are the input to the RNN layer, which produces an output of 512 activations per time step.
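To make the dense connection concrete, here is a minimal NumPy sketch of a single vanilla RNN step (the actual model uses LSTM cells, whose gates multiply the weight count by 4, but the input-to-hidden connectivity is the same idea). The sizes match the discussion above; the weight initialization and variable names are illustrative, not taken from the repository:

```python
import numpy as np

feature_dim = 256 * 8   # = 2048: conv channels * reduced image height
rnn_units = 512

rng = np.random.default_rng(0)

# Input-to-hidden weight matrix: every one of the 2048 features connects
# to every one of the 512 units -> 2048 * 512 weights
W_xh = rng.standard_normal((feature_dim, rnn_units)) * 0.01
W_hh = rng.standard_normal((rnn_units, rnn_units)) * 0.01   # recurrent weights
b = np.zeros(rnn_units)

def rnn_step(x, h):
    """One step of a vanilla RNN: a dense projection of the 2048-dim
    feature vector plus the recurrent contribution from the previous state."""
    return np.tanh(x @ W_xh + h @ W_hh + b)

# One frame of the feature sequence (one column of the reshaped conv output)
x_t = rng.standard_normal(feature_dim)
h = np.zeros(rnn_units)
h = rnn_step(x_t, h)

print(h.shape)      # (512,)
print(W_xh.size)    # 1048576 == 2048 * 512 connections
```

So the mismatch between 2048 and 512 is resolved by the weight matrix itself: the RNN's input size and its number of units are independent dimensions of that matrix.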