One solution:
from pocketsphinx import AudioFile
# Frames per Second
fps = 100
for phrase in AudioFile(frate=fps):  # frate (default=100)
    print('-' * 28)
    print('| %5s | %3s | %4s |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.segments(detailed=True):
        print('| %4ss | %4ss | %8s |' % (s[2] / fps, s[3] / fps, s[0]))
    print('-' * 28)
# ----------------------------
# | start | end | word |
# ----------------------------
# | 0.0s | 0.24s | <s> |
# | 0.25s | 0.45s | <sil> |
# | 0.46s | 0.63s | go |
# | 0.64s | 1.16s | forward |
# | 1.17s | 1.52s | ten |
# | 1.53s | 2.11s | meters |
# | 2.12s | 2.6s | </s> |
# ----------------------------
Another solution:
from pocketsphinx import AudioFile
# Frames per Second
fps = 100
for phrase in AudioFile(frate=fps):  # frate (default=100)
    print('-' * 28)
    print('| %5s | %3s | %4s |' % ('start', 'end', 'word'))
    print('-' * 28)
    for s in phrase.seg():
        print('| %4ss | %4ss | %8s |' % (s.start_frame / fps, s.end_frame / fps, s.word))
    print('-' * 28)
# ----------------------------
# | start | end | word |
# ----------------------------
# | 0.0s | 0.24s | <s> |
# | 0.25s | 0.45s | <sil> |
# | 0.46s | 0.63s | go |
# | 0.64s | 1.16s | forward |
# | 1.17s | 1.52s | ten |
# | 1.53s | 2.11s | meters |
# | 2.12s | 2.6s | </s> |
# ----------------------------
And the last solution:
from pocketsphinx import Pocketsphinx
ps = Pocketsphinx() # frate (default=100)
ps.decode()
print('-' * 28)
print('| %5s | %3s | %4s |' % ('start', 'end', 'word'))
print('-' * 28)
for s in ps.seg():
    print('| %4ss | %4ss | %8s |' % (s.start_frame / 100, s.end_frame / 100, s.word))
print('-' * 28)
# ----------------------------
# | start | end | word |
# ----------------------------
# | 0.0s | 0.24s | <s> |
# | 0.25s | 0.45s | <sil> |
# | 0.46s | 0.63s | go |
# | 0.64s | 1.16s | forward |
# | 1.17s | 1.52s | ten |
# | 1.53s | 2.11s | meters |
# | 2.12s | 2.6s | </s> |
# ----------------------------
Entries such as and(2) are words with alternative transcriptions; you can see them in the dictionary.
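If you only want the plain word in your output, you can strip that marker yourself. This is just a minimal sketch: `base_word` is a made-up helper, not part of pocketsphinx, and it assumes the marker is always a trailing number in parentheses, as in 'and(2)'.

```python
import re

# Matches a trailing "(N)" marker such as the "(2)" in "and(2)".
_ALT_SUFFIX = re.compile(r"\(\d+\)$")


def base_word(word):
    """Return the word without a trailing "(N)" alternative marker."""
    return _ALT_SUFFIX.sub("", word)


print(base_word("and(2)"))   # and
print(base_word("forward"))  # forward
print(base_word("<sil>"))    # <sil>
```

Tokens such as <s> and <sil> pass through unchanged because they do not end in a parenthesised number.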
Hi, I have tried to get the time coordinates, but they are not correct. I tried all the code above, but the reported times are larger than the duration of the audio file. Is there any other factor or attribute that must be included? What is Decoder.n_frames()?
Thank you
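One thing that may be worth checking (a guess, not a confirmed diagnosis) is whether the sample rate of the file matches the rate the decoder assumes. The sketch below uses only Python's standard `wave` module to read the real sample rate and duration of a WAV file. The 16000 Hz figure is an assumption about the acoustic model in use, and `wav_info` is a hypothetical helper, not a pocketsphinx function.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        # In the wave module a "frame" is one audio sample per channel,
        # which is unrelated to the decoder frames discussed in this thread.
        duration = wav.getnframes() / rate
    return rate, duration


# Demo: write one second of 16-bit mono silence at 44.1 kHz and inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(b"\x00\x00" * 44100)

rate, duration = wav_info(demo_path)
print(rate, duration)  # 44100 1.0

# Assumption: if the samples were treated as 16 kHz audio, the decoder
# would see rate / 16000 times as much audio as is really there.
assumed_rate = 16000
print(duration * rate / assumed_rate)  # 2.75625
```

If the rate printed for your file is not the one your model expects, resampling the audio before decoding is probably a better fix than rescaling the reported times.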
I'm in need of getting the time coordinates of every word sphinx thinks it identified.
I found out about segments() and realized that it's exactly what I need:
However it is not clear to me what the frame # represents. It was immediately apparent that it is neither seconds nor sample #s.
I require a method to convert these frame #s into seconds.
Edit: Also, some words have a (#) following them, e.g. 'and(2)'. What does it represent?
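For reference, the conversion used in the answers is just a division by the frame rate: with frate at 100, each frame covers 10 ms. Below is a minimal sketch of that arithmetic without pocketsphinx. `frame_to_seconds` is a made-up helper, and the frame numbers are illustrative values chosen to match the "go" and "forward" rows of the sample output in this thread.

```python
def frame_to_seconds(frame, frate=100):
    """Convert a decoder frame index to seconds, given frames per second."""
    return frame / frate


# Illustrative (word, start_frame, end_frame) tuples, not real decoder output.
segments = [("go", 46, 63), ("forward", 64, 116)]

for word, start, end in segments:
    print(word, frame_to_seconds(start), frame_to_seconds(end))
# go 0.46 0.63
# forward 0.64 1.16
```

If the decoder is configured with a different frate, pass that same value to the helper so the times stay consistent.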