This is an open-source project (formerly named "Listen, Attend and Spell - PyTorch Implementation") for end-to-end ASR, implemented in PyTorch, the well-known deep learning toolkit.
Hi there! Thanks for implementing such a great SoTA end-to-end ASR toolkit!
I really appreciate the complicated joint decoding algorithm part.
I'm a bit confused about the implementation of CTC prefix score decoding.
In ctc.py, line 51, I'm not sure whether prev_blank[last_char] should be assigned logzero.
Would it make more sense for prev_nonblank[last_char] to be assigned logzero instead?
Alex Graves' PhD thesis mentions that in line 17 of the prefix search decoding algorithm, the nonblank part of newLabelProb is (log)zero if p* ends in k, which might correspond to the last_char dimension of prev_nonblank?
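To make the recursion concrete, here is my understanding as a minimal NumPy sketch (the names prev_blank, prev_nonblank, last_char, and logzero mirror the wording above; this is just an illustration, not the repo's exact code):

```python
import numpy as np

LOG_ZERO = -1e10  # finite stand-in for log(0), i.e. the "logzero" above

def new_label_scores(prev_blank, prev_nonblank, last_char):
    """Vectorized form of Graves' newLabelProb (an illustrative sketch):
    log-probability mass available for extending the current prefix p*
    with each candidate label k.

    prev_blank[k] / prev_nonblank[k]: log-probs of p* at the previous
    frame, ending in a blank / ending in its last non-blank label.
    """
    nonblank = prev_nonblank.copy()
    # Per the thesis: the nonblank part of newLabelProb is (log)zero when
    # p* already ends in k, because "...k, k" with no blank in between
    # collapses to a single k under CTC. So the mask should land on
    # prev_nonblank, not on prev_blank.
    nonblank[last_char] = LOG_ZERO
    return np.logaddexp(prev_blank, nonblank)
```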
@Chung-I you're right... this was a mistake I made, and it also explains the bug of weird <eos> probabilities given at the end of decoding.
The bug was fixed in #17.
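For anyone landing on this thread later, here is a small self-contained check of which way the mask should go (toy numbers; the names follow the issue's wording rather than the exact code in #17):

```python
import numpy as np

LOG_ZERO = -1e10  # finite stand-in for log(0)

# Toy log-scores over a 3-label vocabulary; the current prefix ends in label 2.
prev_blank = np.log(np.array([0.10, 0.20, 0.30]))
prev_nonblank = np.log(np.array([0.05, 0.15, 0.25]))
last_char = 2

# Fixed: mask the nonblank path for the repeated label, so extending
# with last_char can only happen via the blank-ending path.
nonblank = prev_nonblank.copy()
nonblank[last_char] = LOG_ZERO
phi_fixed = np.logaddexp(prev_blank, nonblank)

# Buggy: masking prev_blank instead keeps the collapsing nonblank path
# and throws away the legal blank path.
blank = prev_blank.copy()
blank[last_char] = LOG_ZERO
phi_buggy = np.logaddexp(blank, prev_nonblank)

print(np.exp(phi_fixed))  # ~[0.15 0.35 0.30]: label 2 keeps only the blank path
print(np.exp(phi_buggy))  # ~[0.15 0.35 0.25]: label 2 wrongly keeps the nonblank path
```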
Thanks a lot!
https://github.com/Alexander-H-Liu/End-to-end-ASR-Pytorch/blob/77b657b7004cabfd56076a818cecc0ce855f6b0a/src/ctc.py#L50-L60
Thanks again for all the hard work!