TensorSpeech / TensorFlowASR

:zap: TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subwords
https://huylenguyen.com/asr
Apache License 2.0

token level timestep #78

Open Mddct opened 3 years ago

Mddct commented 3 years ago

Is it possible to output token-level timestamps?

e.g. hello 100-600 world 712-900 ...

nglehuy commented 3 years ago

@Mddct I don't quite understand your question and example. Currently we tokenize each string (aka label) into a list of characters or a list of subwords.

Mddct commented 3 years ago

Output each label together with its start time and end time in the original wav.

nglehuy commented 3 years ago

@Mddct Oh, we don't support that feature yet, but I'll look into it. If you have any ideas on how to do it, especially for the RNN transducer, could you post an update here?

Mddct commented 3 years ago

I am trying to implement this feature for the transducer now, but I could not find any related papers or articles.


nglehuy commented 3 years ago

There's this but for CTC, maybe we can apply it with some modification.
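For context on the CTC case in general (not necessarily the method linked above), a common way to get token times is to take the per-frame greedy path and record where each non-blank run starts and ends. The sketch below is only an illustration: the function name `ctc_token_times`, the blank id `0`, and the 40 ms frame duration are all assumptions, not anything from TensorFlowASR.

```python
from typing import List, Tuple


def ctc_token_times(
    frame_ids: List[int],
    blank: int = 0,          # assumed blank index
    frame_ms: float = 40.0,  # assumed duration of one output frame
) -> List[Tuple[int, float, float]]:
    """Collapse a per-frame CTC argmax path into (token_id, start_ms, end_ms)."""
    spans: List[List[int]] = []
    prev = blank
    for t, tok in enumerate(frame_ids):
        if tok != blank and tok != prev:
            # A new token begins at frame t and covers at least one frame.
            spans.append([tok, t, t + 1])
        elif tok != blank:
            # Same token repeated on consecutive frames: extend its end.
            spans[-1][2] = t + 1
        prev = tok
    return [(tok, start * frame_ms, end * frame_ms) for tok, start, end in spans]


if __name__ == "__main__":
    path = [0, 3, 3, 0, 5, 5, 5, 0]
    print(ctc_token_times(path))
```

An RNN transducer does not emit exactly one label per frame, so this would need modification there, for example by recording the encoder frame index at which each token is emitted.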

Mddct commented 3 years ago

@usimarit It seems to take two steps: (1) calculate mean_start_shift and mean_end_shift; (2) apply the shifts to each start and end time.

I will evaluate the accuracy later. But for other languages or our own corpus, we need to do forced alignment to get the time information.
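A minimal sketch of the two steps, assuming the shifts are the average differences between reference times (e.g. from forced alignment) and the model's predicted times. That definition, the helper names `estimate_shifts` and `apply_shifts`, and the sample numbers are assumptions for illustration only.

```python
from statistics import mean
from typing import List, Tuple

Span = Tuple[float, float]  # (start_ms, end_ms)


def estimate_shifts(predicted: List[Span], reference: List[Span]) -> Tuple[float, float]:
    """Step 1: average offset between reference and predicted boundaries."""
    mean_start_shift = mean(r[0] - p[0] for p, r in zip(predicted, reference))
    mean_end_shift = mean(r[1] - p[1] for p, r in zip(predicted, reference))
    return mean_start_shift, mean_end_shift


def apply_shifts(predicted: List[Span], start_shift: float, end_shift: float) -> List[Span]:
    """Step 2: add the mean shifts to every predicted start and end."""
    return [(start + start_shift, end + end_shift) for start, end in predicted]


if __name__ == "__main__":
    # Made-up numbers: predicted times vs. forced-alignment reference times.
    predicted = [(120.0, 620.0), (730.0, 930.0)]
    reference = [(100.0, 600.0), (712.0, 900.0)]
    start_shift, end_shift = estimate_shifts(predicted, reference)
    print(apply_shifts(predicted, start_shift, end_shift))
```

The shifts would be estimated once on a set with known alignments and then reused on new utterances, which is why a new language or corpus needs its own forced alignment first.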