jin-archipin closed this issue 4 years ago.
Hi!
I'm really glad that the repo is able to help! :)
To be perfectly honest, I've never attempted to retrieve phonemes from a keyword search. However, it does appear that it's possible.
CMUSphinx has a tutorial on it here: https://cmusphinx.github.io/wiki/phonemerecognition/
It also appears that Microsoft offers a library for phonetic matching. https://github.com/microsoft/PhoneticMatching
I'd start with the CMUSphinx link and see if that helps. If you have any questions, let me know! When I get some time, I'll add this to the project's list of examples.
I already got it working with the `-allphone` option, but the results aren't as good as with keyword spotting. When I use `-keyphrase`, timestamps are shown only for the keywords.
For example, if you say "apple" with the `-allphone` option, the output looks like this:

| phoneme | start (s) | end (s) |
|---|---|---|
| AE | 0.01 | 0.13 |
| P | 0.14 | 0.20 |
| AH | 0.21 | 0.33 |
| L | 0.34 | 0.50 |

But when I try `-keyphrase`, it only shows the timestamp for the whole word:

| keyword | start (s) | end (s) |
|---|---|---|
| apple | 0.01 | 0.50 |
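If only the word-level span from `-keyphrase` is available, a crude fallback is to split that span evenly across the word's phonemes. The helper below (`split_word_span` is a hypothetical name) is a minimal sketch of that approximation, not something PocketSphinx does itself; real phoneme durations vary a lot, so the `-allphone` timings will usually be closer to reality.

```python
from typing import List, Tuple

def split_word_span(phones: List[str], start: float, end: float) -> List[Tuple[str, float, float]]:
    """Naively divide a word's [start, end] span evenly across its phonemes.

    Only an approximation: vowels are usually longer than stops, so this
    uniform split ignores real duration differences.
    """
    if not phones:
        return []
    step = (end - start) / len(phones)
    spans = []
    for i, phone in enumerate(phones):
        seg_start = start + i * step
        # Use the exact word end for the last phoneme to avoid float drift.
        seg_end = end if i == len(phones) - 1 else start + (i + 1) * step
        spans.append((phone, round(seg_start, 3), round(seg_end, 3)))
    return spans

# Word-level span for "apple" taken from the -keyphrase table above.
print(split_word_span(["AE", "P", "AH", "L"], 0.01, 0.50))
```

Each phoneme here gets the same share of the word, which is exactly why this should be treated as a last resort.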

Maybe I should dig into PocketSphinx more.
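Because `-allphone` already yields per-phoneme timing, one way to use it for lip-sync animation (the use case mentioned later in this thread) is to map each phoneme to a mouth shape (viseme) and merge neighbouring spans that share a shape. The ARPAbet-to-viseme table and the `to_keyframes` name below are hypothetical, for illustration only; a real rig defines its own mouth shapes.

```python
from typing import Dict, List, Tuple

Segment = Tuple[str, float, float]   # (phoneme, start_sec, end_sec)
Keyframe = Tuple[float, float, str]  # (start_sec, end_sec, viseme)

# Hypothetical ARPAbet -> viseme table, for illustration only. Real lip-sync
# rigs define their own mouth shapes; extend or replace this mapping.
VISEME_OF: Dict[str, str] = {
    "P": "closed", "B": "closed", "M": "closed",
    "F": "lip_teeth", "V": "lip_teeth",
    "AA": "open", "AE": "open", "AH": "open", "AY": "open",
    "L": "tongue", "T": "tongue", "D": "tongue", "N": "tongue",
    "UW": "round", "OW": "round", "W": "round",
}

def to_keyframes(segments: List[Segment], default: str = "rest") -> List[Keyframe]:
    """Map timed phonemes to viseme spans, merging neighbours that share a shape."""
    frames: List[Keyframe] = []
    for phone, start, end in segments:
        # Phones missing from the table (e.g. SIL) fall back to a neutral shape.
        viseme = VISEME_OF.get(phone, default)
        if frames and frames[-1][2] == viseme:
            # Same shape as the previous span: extend it instead of adding a key.
            frames[-1] = (frames[-1][0], end, viseme)
        else:
            frames.append((start, end, viseme))
    return frames

# The "apple" rows from the -allphone table above.
apple = [("AE", 0.01, 0.13), ("P", 0.14, 0.20), ("AH", 0.21, 0.33), ("L", 0.34, 0.50)]
for start, end, viseme in to_keyframes(apple):
    print(f"{start:.2f}-{end:.2f} {viseme}")
```

Merging repeated shapes keeps the animation from adding redundant keys when several consecutive phonemes look the same on the lips.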
Glad to hear you got it somewhat working!
What exactly are you looking for when it comes to phoneme spotting? I guess I just don't understand why you need phoneme recognition. Maybe if I knew the use case, I could help more.
It looks like the output you're receiving is accurate, right? When you spot keywords, it seems to give you all the phonemes (e.g., AE P AH L), just separated by spaces.
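To go from a spotted word such as "apple" to its space-separated phonemes, one option is to look it up in a pronunciation dictionary in the CMU plain-text format (a word followed by its phones, with alternate pronunciations assumed to be written as `word(2)`). This is a minimal sketch; the sample entries and the `load_dict` / `word_to_phones` names are illustrative assumptions, and in practice you would load the dictionary your decoder actually uses.

```python
import re
from typing import Dict, List

# Illustrative entries in CMU-style "word PH1 PH2 ..." format.
SAMPLE_DICT = """\
apple AE P AH L
banana B AH N AE N AH
hello HH AH L OW
hello(2) HH EH L OW
"""

def load_dict(text: str) -> Dict[str, List[List[str]]]:
    """Parse dictionary lines into word -> list of pronunciations."""
    prons: Dict[str, List[List[str]]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Alternate pronunciations are keyed as word(2), word(3), ...
        word = re.sub(r"\(\d+\)$", "", parts[0]).lower()
        # Drop stress digits (AE1 -> AE) in case the dictionary includes them.
        phones = [re.sub(r"\d", "", p) for p in parts[1:]]
        prons.setdefault(word, []).append(phones)
    return prons

def word_to_phones(word: str, prons: Dict[str, List[List[str]]]) -> List[str]:
    """Return the first listed pronunciation of a word."""
    return prons[word.lower()][0]

prons = load_dict(SAMPLE_DICT)
print(word_to_phones("apple", prons))
```

Keeping every pronunciation (rather than only the first) leaves room to pick whichever variant best matches the audio later.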
I described it wrong, so I've updated the table: I forgot that I had written some code that converts apple -> AE P AH L when I run KWS (keyword spotting). I'm doing research on lip-sync animation, which is why I need phonemes with timestamps. The `-allphone` option works well, but I want to improve the results, since I already have the transcript of the recorded voice. I just found out that I may be able to use both options together, so I'm trying that now.
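One way to combine a known transcript with `-allphone` output is to align the transcript's expected phonemes against the recognized phoneme sequence using an edit-distance (Levenshtein) alignment, then borrow timestamps from matched or substituted segments. This is a swapped-in technique for illustration, not PocketSphinx's own alignment, and `align_transcript` is a hypothetical name; the recognized sequence below contains a made-up misrecognition (B instead of P).

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, float, float]                    # recognized (phone, start, end)
Timed = Tuple[str, Optional[float], Optional[float]]  # expected phone + borrowed timing

def align_transcript(expected: List[str], recognized: List[Segment]) -> List[Timed]:
    """Give each expected phone the timing of the recognized segment it aligns to.

    Expected phones the recognizer skipped keep (None, None) timestamps.
    """
    n, m = len(expected), len(recognized)
    # cost[i][j] = edit distance between expected[:i] and recognized[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if expected[i - 1] == recognized[j - 1][0] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # expected phone not recognized
                             cost[i][j - 1] + 1)        # extra recognized phone (e.g. SIL)

    result: List[Timed] = [(phone, None, None) for phone in expected]
    i, j = n, m
    # Walk back from the bottom-right corner, copying timing along diagonal steps.
    while i > 0 and j > 0:
        sub = 0 if expected[i - 1] == recognized[j - 1][0] else 1
        if cost[i][j] == cost[i - 1][j - 1] + sub:
            _, start, end = recognized[j - 1]
            result[i - 1] = (expected[i - 1], start, end)
            i -= 1
            j -= 1
        elif cost[i][j] == cost[i - 1][j] + 1:
            i -= 1  # expected phone has no recognized counterpart
        else:
            j -= 1  # skip an extra recognized segment
    return result

expected = ["AE", "P", "AH", "L"]  # phones for the transcript word "apple"
recognized = [("AE", 0.01, 0.13), ("B", 0.14, 0.20), ("AH", 0.21, 0.33), ("L", 0.34, 0.50)]
print(align_transcript(expected, recognized))
```

The transcript decides *which* phonemes appear, while the recognizer only supplies *when*. Phones left with `None` could then be filled by interpolating between their neighbours.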
It seems to be working, but I may need to find better values for options like `-lw`. Thanks for making me RTFM again. lol
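When trying different values for options like `-lw`, one way to compare runs is to score each run's phone sequence against the phones expected from the transcript using a phoneme error rate: the edit distance divided by the number of reference phones, where lower is better. This is a minimal sketch; `phoneme_error_rate` is a hypothetical helper and the hypothesis below is made up.

```python
from typing import List

def phoneme_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Levenshtein distance between phone sequences, divided by the reference length."""
    # prev holds the previous DP row: distances from reference[:i-1] to each hypothesis prefix.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        cur = [i] + [0] * len(hypothesis)
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cur[j] = min(prev[j - 1] + (ref_phone != hyp_phone),  # match / substitution
                         prev[j] + 1,                             # deletion
                         cur[j - 1] + 1)                          # insertion
        prev = cur
    return prev[-1] / max(len(reference), 1)

reference = ["AE", "P", "AH", "L"]   # phones for the transcript word "apple"
hypothesis = ["AE", "B", "AH", "L"]  # made-up decoder output with one substitution
print(phoneme_error_rate(reference, hypothesis))  # 0.25
```

Running the decoder over the same recordings with a few candidate `-lw` values and keeping the setting with the lowest average score is one simple way to tune it.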
Hi again! This project is helping me A LOT, so thank you for that. By the way, can I get phonemes with timestamps from the results of a keyword search?