MerlinPCarson / WakeWord-Detection

Training and evaluation scripts for wake word detection DNN models.
Apache License 2.0

Parameters for CRNN #11

Open ghost opened 3 years ago

ghost commented 3 years ago

I'm still having trouble reconciling the parameter count claimed in the Arik et al. CRNN paper with the parameter count of my implementation. I realized I had mixed up the time and frequency dimensions when defining the stride and kernel size; after correcting this, the model went up to ~143k parameters. However, the paper says their model, with the same configuration, has ~229k parameters.

I've tried everything I can think of to identify the error, but there's nothing in the paper that I've found that indicates the model I've built is missing anything. If any of you have a moment to take a look, another pair of eyes looking at the model would be very helpful.

ghost commented 3 years ago

For reference, here are the model summaries.

**Encoder piece:**

Model: "sequential"

| Layer (type) | Output Shape | Param # |
|---|---|---|
| conv2d (Conv2D) | (None, 18, 17, 32) | 3232 |
| permute (Permute) | (None, 17, 18, 32) | 0 |
| reshape (Reshape) | (None, 17, 576) | 0 |
| bidirectional (Bidirectional) | (None, 17, 64) | 117120 |
| bidirectional_1 (Bidirectional) | (None, 64) | 18816 |

Total params: 139,168. Trainable params: 139,168. Non-trainable params: 0.
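
As a sanity check, the encoder's per-layer counts can be reproduced by hand. The sketch below is plain Python, not the repository's code, and the helper names are made up for illustration. It assumes a 20×5 kernel over a single input channel (any kernel covering 100 input values gives the same count), 32 GRU units per direction, and the Keras `reset_after=True` bias layout. None of these settings appear in the summary itself; they are just values consistent with the reported numbers.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Conv2D: one weight per kernel cell and input channel, plus one bias, per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def gru_params(input_dim, units, reset_after=True):
    """One GRU direction: three gates, each with input and recurrent weights.

    With reset_after=True each gate carries two bias vectors instead of one.
    """
    biases = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + biases)


def bidirectional_gru_params(input_dim, units, reset_after=True):
    """Forward and backward GRUs have separate weights, so the count doubles."""
    return 2 * gru_params(input_dim, units, reset_after)


# Assumed configuration (see the note above); shapes are taken from the summary.
conv = conv2d_params(20, 5, 1, 32)        # conv2d
rnn1 = bidirectional_gru_params(576, 32)  # input is the (17, 576) reshape output
rnn2 = bidirectional_gru_params(64, 32)   # input is the (17, 64) output of rnn1
encoder_total = conv + rnn1 + rnn2

print(conv, rnn1, rnn2, encoder_total)
```

Under these assumptions the formulas match the summary exactly, which suggests the layers are counted correctly and that the gap to the paper's figure comes from a difference in configuration.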


**Detect piece:**
- Single fully-connected layer, 64 hidden units.
- Sigmoidal layer for output.

Model: "sequential_1"

| Layer (type) | Output Shape | Param # |
|---|---|---|
| dense (Dense) | (None, 64) | 4160 |
| dropout (Dropout) | (None, 64) | 0 |
| dense_1 (Dense) | (None, 1) | 65 |

Total params: 4,225. Trainable params: 4,225. Non-trainable params: 0.
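
The detect piece can be checked the same way. This is a minimal sketch with a hypothetical helper name. The 64-dimensional input is the encoder's `(None, 64)` output from the summary, and the paper's figure is entered as 229,000 only because it is quoted as "~229k", so the gap is approximate.

```python
def dense_params(input_dim, units):
    """Dense layer: one weight per input plus one bias, for each output unit."""
    return (input_dim + 1) * units


hidden = dense_params(64, 64)  # dense: 64 encoder features to 64 hidden units
output = dense_params(64, 1)   # dense_1: 64 hidden units to 1 sigmoid output
detect_total = hidden + output

encoder_total = 139_168        # encoder total reported in the summary
full_total = encoder_total + detect_total

paper_total_approx = 229_000   # "~229k" from the paper, so only approximate
gap = paper_total_approx - full_total

print(hidden, output, detect_total, full_total, gap)
```

The detect piece contributes only a few thousand parameters, so the roughly 86k parameters that are unaccounted for are unlikely to be hiding in this part of the model.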


**All together:**
```
_________________________________________________________________
Model: "arik_crnn"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
sequential (Sequential)      (None, 64)                139168    
_________________________________________________________________
sequential_1 (Sequential)    (None, 1)                 4225      
=================================================================
Total params: 143,393
Trainable params: 143,393
Non-trainable params: 0
```

ghost commented 3 years ago

I ended up shooting an email to the author in hopes he could provide more details on the architecture than are in the paper. Will post here if I hear back!