ghost opened this issue 3 years ago
For reference, here are the model summaries.

**Encoder piece:**
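```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 18, 17, 32)        3232
_________________________________________________________________
permute (Permute)            (None, 17, 18, 32)        0
_________________________________________________________________
reshape (Reshape)            (None, 17, 576)           0
_________________________________________________________________
bidirectional (Bidirectional (None, 17, 64)            117120
_________________________________________________________________
bidirectional_1 (Bidirection (None, 64)                18816
=================================================================
Total params: 139,168
Trainable params: 139,168
Non-trainable params: 0
```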
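As a sanity check, the encoder's parameter counts can be reproduced by hand. The sketch below is plain Python arithmetic (no TensorFlow needed) and assumes a 5 × 20 kernel on a single-channel input (any kernel covering 100 cells gives the same count), 32 GRU units per direction, and the TensorFlow 2.x Keras default `reset_after=True`, which gives each GRU gate two bias vectors. The helper names are hypothetical.

```python
# Reproduce the encoder's parameter counts from its hyperparameters.
# Assumptions: 100-cell conv kernel (e.g. 5 x 20), 1 input channel,
# 32 filters, 32 GRU units per direction, Keras reset_after=True.

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel cell per input channel, plus one bias per filter.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def gru_params(input_dim, units, reset_after=True):
    # Three gates (update, reset, candidate), each with input weights,
    # recurrent weights, and one bias vector (two if reset_after=True).
    biases = 2 if reset_after else 1
    return 3 * (input_dim * units + units * units + biases * units)


def bidirectional_params(layer_params):
    # A Bidirectional wrapper holds two independent copies of the layer.
    return 2 * layer_params


conv = conv2d_params(5, 20, 1, 32)
gru1 = bidirectional_params(gru_params(18 * 32, 32))  # reshaped input: 576 features
gru2 = bidirectional_params(gru_params(2 * 32, 32))   # input: 64 concatenated outputs
total = conv + gru1 + gru2

print(conv, gru1, gru2, total)  # 3232 117120 18816 139168
```

With `reset_after=False`, each GRU direction loses one bias vector per gate (3 × 32 = 96 parameters), so that setting alone shifts the total only slightly.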
**Detect piece:**
- Single fully-connected layer, 64 hidden units.
- Sigmoidal layer for output.
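```
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 64)                4160
_________________________________________________________________
dropout (Dropout)            (None, 64)                0
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65
=================================================================
Total params: 4,225
Trainable params: 4,225
Non-trainable params: 0
```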
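The detect head can be checked the same way. This sketch assumes the output layer is a single sigmoid `Dense` unit following the dropout (implied by the `(None, 1)` output shape and the 4,225 total); `dense_params` is a hypothetical helper.

```python
def dense_params(in_features, units):
    # Weight matrix (in_features x units) plus one bias per unit.
    return in_features * units + units


hidden = dense_params(64, 64)  # fully-connected layer on the 64-dim encoding
output = dense_params(64, 1)   # single sigmoid output unit
head_total = hidden + output   # Dropout has no weights, so it adds nothing

encoder_total = 139168  # taken from the encoder summary above
print(hidden, output, head_total)  # 4160 65 4225
print(encoder_total + head_total)  # 143393
```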
**All together:**
```
Model: "arik_crnn"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
sequential (Sequential)      (None, 64)                139168
_________________________________________________________________
sequential_1 (Sequential)    (None, 1)                 4225
=================================================================
Total params: 143,393
Trainable params: 143,393
Non-trainable params: 0
```
I ended up shooting an email to the author in the hope that he could provide more architectural details than the paper gives. Will post here if I hear back!
I'm still having trouble reconciling the parameter count claimed in the Arik et al. CRNN paper with the count in my implementation. I did realize I had swapped the time and frequency dimensions when specifying the kernel size and stride; after correcting this, the model grew to ~143k parameters. However, the paper reports ~229k parameters for the same configuration.
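Swapping the time and frequency axes matters because the first GRU's input width is (frequency bins left after the convolution) × (number of filters), and that layer holds most of the encoder's weights. The sketch below assumes a hypothetical 40 × 151 input (frequency × time), which is one size consistent with the `(None, 18, 17, 32)` conv output, plus 'valid' padding, 32 filters, and 32 GRU units per direction with the TensorFlow 2.x Keras default `reset_after=True`. All function names are hypothetical.

```python
def conv_out_len(length, kernel, stride):
    # 'valid' padding: number of complete kernel placements along one axis.
    return (length - kernel) // stride + 1


def gru_params(input_dim, units, reset_after=True):
    # Three gates, each with input weights, recurrent weights and bias(es).
    biases = 2 if reset_after else 1
    return 3 * (input_dim * units + units * units + biases * units)


def encoder_params(freq_bins, frames, kernel, strides, filters=32, units=32):
    """Return ((freq_out, time_out), params) for conv -> 2 bidirectional GRUs.

    kernel and strides are given as (frequency, time) for an input laid out
    as (freq_bins, frames, 1).
    """
    freq_out = conv_out_len(freq_bins, kernel[0], strides[0])
    time_out = conv_out_len(frames, kernel[1], strides[1])
    conv = (kernel[0] * kernel[1] + 1) * filters
    # After Permute + Reshape, each time step carries freq_out * filters features.
    gru1 = 2 * gru_params(freq_out * filters, units)
    # The second GRU reads the concatenated forward/backward outputs.
    gru2 = 2 * gru_params(2 * units, units)
    return (freq_out, time_out), conv + gru1 + gru2


# Kernel 5 (freq) x 20 (time), strides 2 x 8: matches the encoder summary.
print(encoder_params(40, 151, kernel=(5, 20), strides=(2, 8)))  # ((18, 17), 139168)
# Same kernel and strides with the axes swapped.
print(encoder_params(40, 151, kernel=(20, 5), strides=(8, 2)))  # ((3, 74), 47008)
```

The convolution's own parameter count is identical in both cases; only the GRU input width changes, which is consistent with the count rising once the axes were corrected.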
I've tried everything I can think of to find the error, but I haven't found anything in the paper suggesting that my model is missing a component. If any of you have a moment to look it over, a second pair of eyes would be very helpful.