Alexander-H-Liu / End-to-end-ASR-Pytorch

This is an open-source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR implemented with PyTorch, the well-known deep learning toolkit.
MIT License
1.18k stars, 317 forks

Single file Inference #54

Open shamil-kadavan opened 4 years ago

shamil-kadavan commented 4 years ago

Hi team, I want to run inference on a single audio file. Can anyone please help me with this?

qute012 commented 4 years ago

Hello, @shamil-kadavan. I think that when bucketing is used, the code applies squeeze(inputs, dim=0) to the batch. For example, an input of shape (1, 440, 40) becomes (440, 40) after the squeeze.

So the RNN can't run its forward pass on that input shape.
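
To make the shape change concrete, here is a minimal sketch, assuming PyTorch is installed. The tensor `feats` and the small `nn.LSTM` are illustrative stand-ins, not the repository's actual features or model. One possible workaround for a single file is to restore the batch axis with `unsqueeze(0)` before the forward pass:

```python
import torch
import torch.nn as nn

# Hypothetical features for one utterance: batch of 1, 440 frames, 40 dims.
feats = torch.randn(1, 440, 40)

# Squeezing dim 0 drops the batch axis: (1, 440, 40) -> (440, 40).
squeezed = feats.squeeze(0)

# Restore the batch axis so a batch-first RNN sees (batch, time, feature).
restored = squeezed.unsqueeze(0)

# Stand-in encoder for illustration only (not the repository's model).
rnn = nn.LSTM(input_size=40, hidden_size=8, batch_first=True)
with torch.no_grad():
    out, _ = rnn(restored)

print(tuple(squeezed.shape), tuple(restored.shape), tuple(out.shape))
```

The printed shapes show the batch axis disappearing after `squeeze(0)` and returning after `unsqueeze(0)`.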

Below is an example of converting the gold and predicted indices to characters for printing.

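import torch

# Assumes batch_label, idx2char and the list gt come from the surrounding code.
# Pick the most likely index at each step.
indices = torch.max(batch_label, dim=-1)[1].numpy()
try:
    # Batched case: each row is one sequence of indices.
    for line in indices:
        tmp = ''
        for idx in line:
            if idx == idx2char.index('<sos>'): continue
            if idx == idx2char.index('<eos>'): break
            tmp += idx2char[idx]
        gt.append(tmp)
except TypeError:
    # Single sequence: the elements are scalars and cannot be iterated.
    tmp = ''
    for idx in indices:
        if idx == idx2char.index('<sos>'): continue
        if idx == idx2char.index('<eos>'): break
        tmp += idx2char[idx]
    gt.append(tmp)
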
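
The per-sequence decoding in the snippet above can also be pulled into a small helper. This is only a sketch: the function name `decode_indices` and the toy vocabulary are hypothetical and not part of the repository.

```python
def decode_indices(indices, idx2char, sos="<sos>", eos="<eos>"):
    """Turn one sequence of token indices into a string."""
    sos_idx = idx2char.index(sos)
    eos_idx = idx2char.index(eos)
    chars = []
    for idx in indices:
        if idx == sos_idx:
            continue  # skip the start-of-sequence token
        if idx == eos_idx:
            break  # stop at the end-of-sequence token
        chars.append(idx2char[idx])
    return "".join(chars)


# Toy vocabulary, assumed for illustration only.
idx2char = ["<sos>", "<eos>", "a", "c", "t"]
print(decode_indices([0, 3, 2, 4, 1, 2], idx2char))
```

With a helper like this, a batch can be decoded by calling it once per row, and a single utterance by calling it once.
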
shamil-kadavan commented 4 years ago

@qute012 Thanks for the answer. I am still confused about how to do single-file inference. My knowledge of speech recognition is very limited.