Hi, I have tried to export both the VGG-M and the ResNet-50 verification models to Keras. In the first case everything worked well: the architecture proposed in the paper matched the architecture I obtained from the MATLAB file containing the model. In the second case, however, I have found the following discrepancies:
1. The embedding dimension proposed in the paper is 512, but in the MATLAB model it is 128. Why is this the case?
2. In the VoxCeleb2 paper (and in the original ResNet paper: https://arxiv.org/pdf/1512.03385.pdf) the activation is applied after the addition of the nonlinear stack's output and the shortcut connection. In the MATLAB model, however, a ReLU is applied both before and after the shortcut addition (both variants are sketched after this list). The original ResNet paper is explicit about applying the activation only after the shortcut addition, so I don't understand the reason behind this.
3. Just before the last block (fc_1, pool_time, fc_2, following the VoxCeleb2 paper notation), the MATLAB model adds a pooling layer (pool_final_b1 and pool_final_b2, one for each network of the siamese architecture). I couldn't find any mention of this layer in the original paper.
4. Except for the first convolutional layer (conv0_b1, conv0_b2, following the MATLAB model notation) and the feed-forward layers (fc65_b1, fc65_b2, fc8_s1, fc8_s2), none of the intermediate conv layers has bias parameters (see the second sketch below). Is there any reason for this?
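Regarding point 2, here is a minimal Keras sketch of the two variants I am comparing, if I am reading the layer graph correctly. Only the identity block is shown (the shortcut is assumed to already have 4 * filters channels), and the filter counts are placeholders rather than the actual VoxCeleb2 configuration:

```python
from tensorflow.keras import layers

def identity_block_paper(x, filters):
    """Bottleneck identity block as in the original ResNet paper:
    a single ReLU, applied after the shortcut addition."""
    shortcut = x
    y = layers.Conv2D(filters, 1, use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same', use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(4 * filters, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])
    return layers.Activation('relu')(y)    # only post-addition ReLU

def identity_block_matlab(x, filters):
    """What I see in the exported MATLAB model: an extra ReLU on the residual
    branch right before the addition, plus the usual post-addition ReLU."""
    shortcut = x
    y = layers.Conv2D(filters, 1, use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(filters, 3, padding='same', use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)
    y = layers.Conv2D(4 * filters, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation('relu')(y)       # pre-addition ReLU (not in either paper)
    y = layers.Add()([shortcut, y])
    return layers.Activation('relu')(y)    # post-addition ReLU
```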
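Regarding point 4, this is roughly how I am reproducing the bias pattern in Keras (the 64 filters, 7x7 kernel and stride 2 are placeholders, not necessarily the actual configuration). My working assumption, which I would like to have confirmed, is that the bias is dropped wherever a conv is immediately followed by batch normalization, since the batch-norm offset (beta) makes a separate conv bias redundant:

```python
from tensorflow.keras import layers

def first_conv(x):
    # conv0_b1 / conv0_b2: the only conv layers that carry a bias in the exported model
    return layers.Conv2D(64, 7, strides=2, padding='same', use_bias=True)(x)

def intermediate_conv(x, filters, kernel_size):
    # every other conv in the exported ResNet-50: no bias term; my assumption is that
    # the following BatchNormalization layer provides the learned offset instead
    x = layers.Conv2D(filters, kernel_size, padding='same', use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation('relu')(x)
```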