LeeYongHyeok closed this issue 5 years ago
Oh, is it correct that the 'bimodal' architecture is WLAS?
Hi @LeeYongHyeok
The dropout rate is not a parameter that we tuned in particular. 10% seems to be a common choice.
`highway_encoder` allows you to wrap your cells with an `rnn.HighwayWrapper`. This is a feature that I tested some time ago and that gave some promising results, yet I did not find time to investigate it in greater detail. Often, it is not the raw performance that I am interested in.
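For readers unfamiliar with highway connections, here is a minimal sketch of the general idea: a gate blends a cell's output with its unmodified input. This is not the code of `rnn.HighwayWrapper`. The function names are hypothetical, and the gate logits are passed in directly, whereas a real highway layer would compute them from the input with learned weights.

```python
import math


def sigmoid(z):
    """Logistic function, maps a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def highway_combine(x, h, gate_logits):
    """Blend a cell output h with its input x, element by element.

    y_i = t_i * h_i + (1 - t_i) * x_i, with t_i = sigmoid(gate_logits_i).
    A gate near 1 passes the transformed value, a gate near 0 carries
    the input through unchanged.
    """
    if not (len(x) == len(h) == len(gate_logits)):
        raise ValueError("x, h and gate_logits must have the same length")
    out = []
    for x_i, h_i, g_i in zip(x, h, gate_logits):
        t_i = sigmoid(g_i)
        out.append(t_i * h_i + (1.0 - t_i) * x_i)
    return out
```

With all gate logits at zero the gate is 0.5, so the output is the average of the input and the cell output.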
You are right, the `bimodal` architecture is the one that uses two attention mechanisms on the decoder side and concatenates the two resulting context vectors. It was the first bimodal architecture implemented in Sigmedia-AVSR, but I'll go with your suggestion and rename it soon for more clarity.
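As a rough illustration of "two attention mechanisms plus concatenation", the sketch below attends separately over an audio memory and a video memory with the same decoder query, then joins the two context vectors. It assumes plain dot-product scoring, which may differ from the scoring used in the repository. All function and argument names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Dot-product attention: a weighted average of the memory vectors."""
    scores = [sum(q * k for q, k in zip(query, vec)) for vec in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    return [sum(w * vec[d] for w, vec in zip(weights, memory))
            for d in range(dim)]


def dual_attention_context(query, audio_memory, video_memory):
    """Attend to each modality independently, then concatenate."""
    audio_context = attend(query, audio_memory)
    video_context = attend(query, video_memory)
    return audio_context + video_context  # list concatenation
```

The two memories may have different lengths, since each attention call normalises over its own time steps.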
On LRS2 I typically train the system for 100 epochs at a constant learning rate of 0.001, although you could do much better with an advanced learning rate schedule.
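To make the contrast concrete, the sketch below compares the constant rate of 0.001 with one possible schedule, a step decay. The step decay and its `drop_every` and `factor` values are purely illustrative assumptions. They are not settings from this project, and the reply above does not say which schedule would work best.

```python
def constant_lr(epoch, base_lr=0.001):
    """Constant learning rate, as used for the 100-epoch LRS2 runs."""
    return base_lr


def step_decay_lr(epoch, base_lr=0.001, drop_every=30, factor=0.5):
    """Hypothetical schedule: multiply the rate by `factor` every
    `drop_every` epochs. The default values are made up for illustration."""
    return base_lr * factor ** (epoch // drop_every)


# Learning rate per epoch for a 100-epoch run under each policy.
constant_rates = [constant_lr(e) for e in range(100)]
decayed_rates = [step_decay_lr(e) for e in range(100)]
```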
Thanks for your reply, @georgesterpu
Do you plan to experiment with the LRS3 DB or publish the code?
We are still analysing the use of LRS3 in our experiments.
Generally, I will ensure that our work is fully reproducible and accessible once published.
There are already a few enhancements to this AVSR project, currently under a private repository while our work is under review.
Thanks for your reply @georgesterpu
I fully agree with your opinion.
So I will close this issue. Thanks!
Hi, georgesterpu.
Before my questions, thank you for sharing this great code.
I have some questions about some of the initial parameter settings.
First, in expriment_tcd_av.py, why did you choose (0.9, 0.9, 0.9) for the dropout probability?
Second, why do you initialize the 'highway_encoder' parameter to 'False'?
Third, if I change the architecture from 'av_align' to 'wlas', can I run the WLAS model?
Finally, could you share your 'num_epochs' and 'learning_rate' on the LRS2 DB?