georgesterpu / avsr-tf1

Audio-Visual Speech Recognition using Sequence to Sequence Models
GNU General Public License v3.0

What should I do to reproduce the results of the paper? #18

Closed: gyl1993 closed this issue 4 years ago

gyl1993 commented 5 years ago

Using experiment_tcd_av.py, I trained for 400 epochs with learning rate 0.001 and then for 100 epochs with learning rate 0.0001 on the clean TCD-TIMIT data, speaker-dependent split, but I cannot reach the result reported in the paper "Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition".

My result: 30.70% CER / 65.04% WER
Result in the paper: 17.70% CER / 41.90% WER

What should I do to reproduce the results of the paper?
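
For reference, a minimal sketch of the two-stage schedule I used, assuming a plain TF1 setup; steps_per_epoch and the optimiser choice are illustrative, not taken from experiment_tcd_av.py:

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
steps_per_epoch = 1000  # hypothetical; depends on dataset and batch size

# 0.001 for the first 400 epochs, then 0.0001 for the remaining 100
learning_rate = tf.train.piecewise_constant(
    global_step,
    boundaries=[400 * steps_per_epoch],
    values=[0.001, 0.0001])

optimizer = tf.train.AdamOptimizer(learning_rate)
```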

georgesterpu commented 5 years ago

Hi @gyl1993, thanks for opening the issue. Could you run the same experiment from multiple random initialisations?
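
Something along these lines, resetting the graph and re-seeding everything per run; the seed values are arbitrary, and the training step is left as a comment rather than guessing at the script's entry point:

```python
import random
import numpy as np
import tensorflow as tf

for seed in (7, 17, 42):
    tf.reset_default_graph()   # fresh graph for each run
    tf.set_random_seed(seed)   # TF1 graph-level seed
    np.random.seed(seed)
    random.seed(seed)
    # ... build the model and train here, as experiment_tcd_av.py does ...
```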

gyl1993 commented 5 years ago

Hi @georgesterpu, the result is almost the same with different initialisations. Could you tell me how to set the parameters? Thank you!

georgesterpu commented 5 years ago

Please have a look at this issue: https://github.com/georgesterpu/Sigmedia-AVSR/issues/16. The default parameters in the example scripts are different from the ones in the article.
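
In the meantime, purely as a hypothetical illustration of the pattern, overriding the script defaults with the article's settings; none of the names or values below come from the repository, except the learning rate, which matches the schedule discussed above:

```python
default_hparams = {'learning_rate': 0.01, 'encoder_units': 128}   # placeholder script defaults
article_hparams = {'learning_rate': 0.001, 'encoder_units': 256}  # placeholder article values

hparams = {**default_hparams, **article_hparams}  # article values take precedence
```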

Internally, I have cleaned the codebase and simplified the scripts to reproduce our results. I am hoping to update this public repository as soon as possible.

gyl1993 commented 5 years ago

OK, thank you!

georgesterpu commented 4 years ago

Hi @gyl1993, I have just updated this repository to reproduce the experiments reported in our latest article. The updated code should follow the same learning rate schedule and hyper-parameters.

Please feel free to open a new issue in case something is not working.