flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Forking the stream convnet pretrained model fails with "Unknown index in dictionary" error #926

Open phamvandan opened 3 years ago

phamvandan commented 3 years ago

When I change the number of token classes from 854 to 470, fine-tuning with fork raises this error: [Screenshot from 2020-12-26 11-44-41]

Here is my config file:

[image]

tlikhomanenko commented 3 years ago

Hi @phamvandan

Could you give details on how you changed the token set and how the lexicon was created?

One additional note: when you fork, you reuse the same network, so the last layer, which maps the embedding to the number of tokens, is the same as before. This means that when forking you need to recreate the last layer so that it has the required number of tokens.
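
To illustrate why the mismatch matters, here is a minimal, self-contained sketch of one plausible way the error could surface. It assumes the network still emits indices in the old range (854 classes) while the token dictionary only holds 470 entries. `ToyDictionary` and `canDecode` are hypothetical names for this illustration, not flashlight types, and the exact message text is an assumption.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Toy token dictionary (hypothetical, NOT the flashlight class): maps
// indices 0..numTokens-1 to placeholder token strings.
class ToyDictionary {
 public:
  explicit ToyDictionary(int numTokens) {
    for (int i = 0; i < numTokens; ++i) {
      idx2entry_[i] = "tok" + std::to_string(i);
    }
  }

  // Throws when asked for an index the dictionary does not contain.
  const std::string& getEntry(int idx) const {
    auto it = idx2entry_.find(idx);
    if (it == idx2entry_.end()) {
      throw std::invalid_argument(
          "Unknown index in dictionary: " + std::to_string(idx));
    }
    return it->second;
  }

 private:
  std::unordered_map<int, std::string> idx2entry_;
};

// Returns true if a predicted class index can be mapped back to a token.
bool canDecode(const ToyDictionary& dict, int predictedIdx) {
  try {
    dict.getEntry(predictedIdx);
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  }
}
```

With a 470-entry dictionary, `canDecode(dict, 469)` succeeds, while an index such as 853 that an 854-way output layer can still produce does not.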

phamvandan commented 3 years ago

Hi @tlikhomanenko, you wrote: "This means that during fork you need to recreate the last layer to have the necessary number of tokens". How can I do this?

phamvandan commented 3 years ago

Hi @tlikhomanenko, I created the lexicon and tokens files in this form: [image] [image]. When I decrease the number of tokens from 870 to 470, I get the error above.

tlikhomanenko commented 3 years ago

One solution is described in https://github.com/facebookresearch/wav2letter/issues/829. Let me know if it is not clear. Here is also a recent snippet that does a similar thing: https://github.com/facebookresearch/flashlight/blob/master/flashlight/app/asr/tutorial/FinetuneCTC.cpp#L254-L270, but you need to exclude the last index from the loop that sets the params.
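
The idea of copying all pretrained parameters except the last one can be sketched as follows. This is a toy model with no flashlight dependency: `ToyLinear`, `makeLinear`, and `forkWithNewOutputLayer` are hypothetical names, and the real change in Train.cpp would use the library's own module and parameter APIs instead.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Toy stand-in for a linear layer (hypothetical, NOT a flashlight type).
struct ToyLinear {
  std::size_t inDim;
  std::size_t outDim;
  std::vector<float> weight;  // outDim * inDim values
  std::vector<float> bias;    // outDim values
};

// Builds a layer with small random weights and zero bias.
ToyLinear makeLinear(std::size_t inDim, std::size_t outDim, unsigned seed) {
  std::mt19937 gen(seed);
  std::uniform_real_distribution<float> dist(-0.05f, 0.05f);
  ToyLinear layer{inDim, outDim, std::vector<float>(inDim * outDim),
                  std::vector<float>(outDim, 0.0f)};
  for (auto& w : layer.weight) {
    w = dist(gen);
  }
  return layer;
}

// Copies every pretrained layer except the last one, then appends a freshly
// initialized output layer sized for the new token set.
std::vector<ToyLinear> forkWithNewOutputLayer(
    const std::vector<ToyLinear>& pretrained,
    std::size_t numTokens,
    unsigned seed) {
  assert(!pretrained.empty());
  const std::size_t last = pretrained.size() - 1;
  std::vector<ToyLinear> forked;
  forked.reserve(pretrained.size());
  for (std::size_t i = 0; i < last; ++i) {  // the last index is excluded
    forked.push_back(pretrained[i]);
  }
  // Same input width as the old output layer, new number of output classes.
  forked.push_back(makeLinear(pretrained[last].inDim, numTokens, seed));
  return forked;
}
```

For example, forking a two-layer toy model whose output layer has 854 classes with `numTokens = 470` keeps the first layer's weights unchanged and yields a new 470-way output layer that still has to be trained during fine-tuning.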

phamvandan commented 3 years ago

Hi @tlikhomanenko, does that mean we need to customize the code and rebuild?

tlikhomanenko commented 3 years ago

Yes, but this should be simple. Let me know if you need help with the rebuild.

phamvandan commented 3 years ago

I am not familiar enough with C++, so could you explain how to rebuild? Thanks.

tlikhomanenko commented 3 years ago

You need to follow the installation instructions, either with the Docker image https://github.com/facebookresearch/flashlight/blob/master/.docker/Dockerfile-CUDA or from source https://github.com/facebookresearch/flashlight#building-from-source (you modify Train.cpp and then rerun the make command).