freewym / espresso

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Relative Positional Embedding (fixed or learnable) #68

Closed freewym closed 3 years ago

sw005320 commented 3 years ago

@freewym, did you observe some improvements? I’m curious.

freewym commented 3 years ago

@sw005320 I am still running the experiments on non-streaming models. The main purpose of this change is streaming models, though.

May I ask whether you found relative embeddings helpful in non-streaming models?

sw005320 commented 3 years ago

The conclusion is mixed, but there are some solid benefits for long recordings.

freewym commented 3 years ago

@sw005320 FYI, my experiments on Librispeech show that for encoders, relative embeddings are better than absolute ones, and relative sinusoidal is better than relative learned. For decoders, however, absolute sinusoidal is the best, and the relative ones lead to much worse WERs.
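
For readers unfamiliar with the terms compared above, here is a minimal sketch of the difference between absolute and relative sinusoidal embeddings. It is not the Espresso implementation: the function names `sinusoid`, `absolute_embeddings`, and `relative_embeddings` are hypothetical, it uses plain Python rather than PyTorch, and it assumes the standard interleaved sine/cosine formulation with base 10000.

```python
import math


def sinusoid(position, dim):
    """Sinusoidal embedding of an integer position (may be negative)."""
    vec = []
    for k in range(0, dim, 2):
        # Each sin/cos pair shares one frequency; frequency drops as k grows.
        angle = position / (10000 ** (k / dim))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec[:dim]


def absolute_embeddings(length, dim):
    # Absolute: one vector per frame, indexed by the frame's own position t.
    return [sinusoid(t, dim) for t in range(length)]


def relative_embeddings(length, dim):
    # Relative: one vector per (query i, key j) pair, indexed by the offset j - i.
    return [[sinusoid(j - i, dim) for j in range(length)] for i in range(length)]


abs_emb = absolute_embeddings(4, 8)
rel_emb = relative_embeddings(4, 8)

# The same offset yields the same relative vector wherever it occurs,
# while absolute vectors differ from frame to frame.
print(rel_emb[0][1] == rel_emb[2][3])
print(abs_emb[1] == abs_emb[3])
```

In this sketch, a "learned" variant would replace `sinusoid` with a lookup into a trainable table indexed by the (typically clipped) offset. How the relative vectors enter the attention computation is a separate design choice that is not shown here.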