Funnyguy77 / PocketSphinxUnityDemo

A sample Unity project showing how to use PocketSphinx.
MIT License

Extracting phonemes #6

Closed: jin-archipin closed this issue 4 years ago

jin-archipin commented 4 years ago

Hi again! This project is helping me A LOT, and thank you for that. By the way, can I get phonemes with timestamps from the result of a keyword search?

Funnyguy77 commented 4 years ago

Hi!

I'm really glad that the repo is able to help! :)

To be perfectly honest, I've never attempted to retrieve phonemes from a keyword search. However, it does appear that it's possible.

CMUSphinx has a tutorial on it here: https://cmusphinx.github.io/wiki/phonemerecognition/

It also appears that Microsoft offers a library for phonetic matching. https://github.com/microsoft/PhoneticMatching

I'd start with the CMUSphinx link and see if that helps. If you have any questions, let me know! When I get some time, I'll need to add this to the list of examples in the project.

jin-archipin commented 4 years ago

I already succeeded with the -allphone option but, you know, the result is not great compared to keyword spotting. When I use -keyphrase, timestamps are only shown for the keywords.

For example, if you say "apple" with the -allphone option, the output looks like this:

phoneme  start  end
AE       0.01   0.13
P        0.14   0.20
AH       0.21   0.33
L        0.34   0.50

But when I try -keyphrase, it only shows the whole-word timestamp, like this:

keyword  start  end
apple    0.01   0.50
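Rows in that "label start end" layout are easy to read back into a program. Here is a minimal Python sketch. It assumes the layout is exactly what the example above shows (this is not a documented PocketSphinx output format), and `Segment` and `parse_segments` are hypothetical names made up for illustration.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One recognized unit (phoneme or keyword) with its time span in seconds."""
    label: str
    start: float
    end: float


def parse_segments(text: str) -> List[Segment]:
    """Parse whitespace-separated 'label start end' rows.

    Rows that do not have three fields, or whose times are not numbers
    (for example a header row), are skipped.
    """
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        label, start, end = parts
        try:
            segments.append(Segment(label, float(start), float(end)))
        except ValueError:
            continue  # e.g. the "phoneme start end" header row
    return segments


# Sample copied from the -allphone example above (assumed layout).
allphone_output = """phoneme start end
AE 0.01 0.13
P 0.14 0.20
AH 0.21 0.33
L 0.34 0.50"""

segments = parse_segments(allphone_output)
for seg in segments:
    print(seg.label, seg.start, seg.end)
```

The same function would read the single "apple 0.01 0.50" row from the -keyphrase case, since it only relies on the three-column shape.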

Maybe I should look into PocketSphinx more.

Funnyguy77 commented 4 years ago

Glad to hear you got it somewhat working!

What exactly are you looking for when it comes to phoneme spotting? I guess I just don't understand why you need phoneme recognition. Maybe if I knew the use-case, I could help more.

It looks like the output you're receiving is accurate, right? When you spot for keywords, it looks like it's giving you all the phonemes (e.g., AE P AH L), but they're separated by spaces.

jin-archipin commented 4 years ago

I stated it wrong, so I've updated the table. I forgot that I had written some code to map apple -> AE P AH L when doing KWS. I'm doing research on lip-sync animation, and that's why I need phonemes and timestamps, haha. It works great with the -allphone option, but I want to improve the result because I already have the transcript for the recorded voice. I just found out that I may be able to use both options together, so I'm trying that now.
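The word-to-phoneme step mentioned here (apple -> AE P AH L) could look roughly like the sketch below. This is not the code from the thread, which was never posted. It assumes a pronunciation dictionary in the plain "word PH PH ..." line format, and `load_pronunciations`, `word_to_phonemes`, and `SAMPLE_DICT` are hypothetical names used only for illustration.

```python
from typing import Dict, List


def load_pronunciations(dict_text: str) -> Dict[str, List[str]]:
    """Build a word -> phoneme list mapping from 'word PH PH ...' lines.

    If a word appears more than once, the first pronunciation is kept.
    """
    pronunciations: Dict[str, List[str]] = {}
    for line in dict_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line or a word with no phonemes
        word, phones = parts[0].lower(), parts[1:]
        pronunciations.setdefault(word, phones)
    return pronunciations


def word_to_phonemes(word: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Return the phonemes for a word, or an empty list if it is unknown."""
    return lexicon.get(word.lower(), [])


# One illustrative entry, matching the example in the thread (assumed format).
SAMPLE_DICT = "apple AE P AH L\n"

lexicon = load_pronunciations(SAMPLE_DICT)
print(" ".join(word_to_phonemes("apple", lexicon)))
```

A lookup like this only gives the phoneme sequence for a spotted keyword. The per-phoneme timestamps still have to come from the recognizer, which is why combining it with -allphone is the interesting part.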

jin-archipin commented 4 years ago

It seems to be working, but I may need to find better values for options like "-lw". Thank you for getting me to RTFM again, lol.