georgesterpu / avsr-tf1

Audio-Visual Speech Recognition using Sequence to Sequence Models
GNU General Public License v3.0

Inquiry about some parameter selection reason #10

Closed LeeYongHyeok closed 5 years ago

LeeYongHyeok commented 5 years ago

Hi, georgesterpu.

First of all, thanks for sharing this great code.

I have some questions about some of the initial parameter settings.

First, in experiment_tcd_av.py, why did you choose (0.9, 0.9, 0.9) for the dropout probability?

Second, why do you initialize the 'highway_encoder' parameter to 'False'?

Third, if I change the architecture from 'av_align' to 'wlas', can I run the WLAS model?

Finally, could you share your 'num_epochs' and 'learning_rate' for the LRS2 DB?

LeeYongHyeok commented 5 years ago

Oh, is it correct that the 'bimodal' architecture is WLAS?

georgesterpu commented 5 years ago

Hi @LeeYongHyeok

The dropout rate is not a parameter that we tuned in particular. The 0.9 values correspond to a 10% dropout rate, which seems to be a common choice.
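
To make the relation between the 0.9 values and the 10% rate concrete, here is a minimal sketch in plain Python. It assumes the three values are keep probabilities (the thread does not say which connection each one controls), and the `inverted_dropout` helper is hypothetical, not code from the repository.

```python
import random


def inverted_dropout(values, keep_prob, rng):
    """Zero each value with probability 1 - keep_prob and rescale the survivors."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


# Assumption: the tuple from the question holds keep probabilities.
keep_probs = (0.9, 0.9, 0.9)
dropout_rates = tuple(round(1.0 - p, 10) for p in keep_probs)

rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10000
dropped = inverted_dropout(activations, keep_probs[0], rng)

# Roughly one activation in ten is zeroed; the rest are scaled by 1 / 0.9.
fraction_zeroed = dropped.count(0.0) / len(dropped)
print(dropout_rates, fraction_zeroed)
```

Rescaling the surviving activations keeps their expected value unchanged, so nothing has to be adjusted at inference time.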

highway_encoder allows you to wrap your cells with an rnn.HighwayWrapper. This is a feature that I tested some time ago, and it gave some promising results, but I did not find time to investigate it in greater detail. Often, it is not the raw performance that I am interested in.
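
For readers unfamiliar with highway connections, the general idea can be sketched as follows. This is a simplified illustration of a gated mix between a layer's input and its output, not the wrapper's actual implementation; the function name, the gate weights, and the bias are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def highway(x, h, gate_weights, gate_bias):
    """Mix the layer input x with the layer output h through a gate.

    t = sigmoid(W x + b)        (transform gate, one value per unit)
    y = t * h + (1 - t) * x     (carry the input where the gate is closed)
    """
    out = []
    for i, (x_i, h_i) in enumerate(zip(x, h)):
        z = sum(w * x_j for w, x_j in zip(gate_weights[i], x)) + gate_bias[i]
        t = sigmoid(z)
        out.append(t * h_i + (1.0 - t) * x_i)
    return out


x = [1.0, 2.0]  # cell input
h = [3.0, 6.0]  # cell output, same size as the input
zero_weights = [[0.0, 0.0], [0.0, 0.0]]

# With zero weights and zero bias the gate is 0.5, so y is the average of x and h.
mixed = highway(x, h, zero_weights, [0.0, 0.0])
# A strongly negative bias closes the gate, so y stays close to the input x.
carried = highway(x, h, zero_weights, [-20.0, -20.0])
print(mixed, carried)
```

Because the input and output are added element-wise, this form requires the cell's input and output to have the same size.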

You are right, the bimodal architecture is the one that uses two attention mechanisms on the decoder side and concatenates the two resulting context vectors. It was the first bimodal architecture implemented in Sigmedia-AVSR, but I'll go with your suggestion and rename it soon for more clarity.
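
The description above (two attention mechanisms whose context vectors are concatenated) can be sketched in plain Python. Dot-product scoring is an assumption made for brevity, and the thread does not say which scoring function the repository uses; all names here are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Dot-product attention: return the weighted average of the memory rows."""
    scores = [sum(q * k for q, k in zip(query, row)) for row in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * row[d] for w, row in zip(weights, memory)) for d in range(dim)]


def bimodal_context(query, audio_memory, video_memory):
    """Attend to each modality separately, then concatenate the two contexts."""
    return attend(query, audio_memory) + attend(query, video_memory)


decoder_state = [0.0, 0.0]                         # stands in for the decoder query
audio_memory = [[1.0, 0.0], [0.0, 1.0]]            # two audio encoder outputs
video_memory = [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]  # three video encoder outputs

context = bimodal_context(decoder_state, audio_memory, video_memory)
print(len(context), context)
```

The two memories may have different lengths, since each attention mechanism normalises over its own sequence before the contexts are joined.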

On LRS2 I typically train the system for 100 epochs at a constant learning rate of 0.001, although you could do much better with an advanced learning rate schedule.
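
As an illustration of the difference between the constant setting and a schedule, here is a minimal sketch. The 100 epochs and the 0.001 rate come from this reply; the cosine decay and its `min_lr` floor are hypothetical choices for the example, not a schedule that was evaluated on LRS2.

```python
import math

BASE_LR = 0.001   # constant learning rate mentioned in the reply
NUM_EPOCHS = 100  # number of epochs mentioned in the reply


def constant_lr(epoch):
    """The setting described in the reply: the same rate at every epoch."""
    return BASE_LR


def cosine_lr(epoch, min_lr=1e-5):
    """Hypothetical alternative: cosine decay from BASE_LR down to min_lr."""
    progress = epoch / (NUM_EPOCHS - 1)
    return min_lr + 0.5 * (BASE_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


constant_schedule = [constant_lr(e) for e in range(NUM_EPOCHS)]
cosine_schedule = [cosine_lr(e) for e in range(NUM_EPOCHS)]

for epoch in (0, 50, 99):
    print(epoch, constant_schedule[epoch], cosine_schedule[epoch])
```

Either list can be indexed by epoch to pick the rate that is fed to the optimiser for that epoch.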

LeeYongHyeok commented 5 years ago

Thanks for your reply, @georgesterpu

Do you plan to experiment with the LRS3 DB or publish the code?

georgesterpu commented 5 years ago

We are still analysing the use of LRS3 in our experiments.

Generally, I will ensure that our work is fully reproducible and accessible once published.

There are already a few enhancements to this AVSR project, currently kept in a private repository while our work is under review.

LeeYongHyeok commented 5 years ago

Thanks for your reply @georgesterpu

I fully agree with your opinion.

So I will close this issue. Thanks!