Open jamesoliver1981 opened 1 year ago
You need to rebuild the graph. See https://alphacephei.com/vosk/lm
And it helps if you provide audio files
The audio files can be found in this link
Files with the suffix "love" are the "love" examples, and likewise files with the suffix "all" are the "all" examples.
There is a lot about the graph that I don't understand, so I will come back to that in a second.
This is a JSON(ish) question: I am trying to read the breakdown of the results to get the timing and probability of each word. If I remove "text" here, I get the full result. If I replace it with "result", I get the error "string indices must be integers".
PS I absolutely love this tool and fully appreciate your help in helping me use it correctly
results = []
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(rec.Result())
results.append(json.loads(rec.FinalResult())['text'])
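For reference, "string indices must be integers" usually means a plain string was indexed with a key. `rec.Result()` and `rec.FinalResult()` return JSON strings, so they need `json.loads` before any key lookup. As far as I know, the per-word "result" list only appears after `rec.SetWords(True)` has been called on the recognizer. Below is a minimal sketch of reading word, timing and confidence from one result string. The helper name `extract_words` is hypothetical and the sample payload is made up for illustration, so it runs without a model.

```python
import json

def extract_words(result_json):
    """Return (word, start, end, conf) tuples from one Vosk result string."""
    parsed = json.loads(result_json)  # the recognizer returns a str, so parse it first
    # "result" is absent when nothing was recognized or word output is disabled
    return [(w["word"], w["start"], w["end"], w["conf"])
            for w in parsed.get("result", [])]

# Illustrative payload only (values invented), shaped like Vosk output with words enabled
sample = ('{"result": [{"conf": 0.91, "start": 23.82, "end": 24.208, '
          '"word": "fifteen"}], "text": "fifteen"}')
rows = extract_words(sample)
print(rows)
```

In the loop above, the same helper could be applied to each `rec.Result()` and to the final `rec.FinalResult()`, extending one list of rows instead of appending only the 'text' field.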
Re rebuilding the graph: which element in the link you shared are you suggesting I work with? There is no element that specifically says "rebuild the graph". Sorry if this is a dumb question, but I don't see it.
Hi, weird title, I know. I'm trying to use VOSK on some tennis recordings where scores like "fifteen love" come up. Sadly, the model I am using is not great at picking up the "love" element, whether it comes before or after the other score. I have read that there are options to enhance word identification, but I don't know if they will work here (and whilst there are some docs on how to adjust this, it looked a little beyond my capability, so I am posting this question first to get feedback).
The reason I think this will NOT work is that I have built 2 VOSK models and simply changed the vocab. In the second, "love" is almost the only word in the custom dictionary, and there I can see that the point where it is picked up (timestamp) is in the middle of the prior word (i.e. "fifteen").
Below are my screenshots. Full grammar model output: "fifteen" is picked up between 23.82 and 24.208 ![image](https://user-images.githubusercontent.com/13690904/198883922-0ed4c4bb-613e-4068-a1b0-0c876e16320d.png)
Love grammar model output: "love" is picked up at 24.15 (i.e. in the middle of the above) ![image](https://user-images.githubusercontent.com/13690904/198883956-5b20833a-df9a-4bdf-8ce5-c243d5f53fa9.png)
My planned approach is to run the model twice, each time outputting the word and the elements of the result into a table so that I can reconstruct the phrase. The only challenge here is that it doubles the run time.
My question is whether enhancing the language model, using a specific grammar, or increasing word probabilities will help resolve this issue. I have the same issue with "fifteen all", and there my solution doesn't work because "all", or a soundalike, doesn't get picked up by a separate model.
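A rough sketch of that two-pass table idea, assuming each pass has already been reduced to a list of word dictionaries with "word", "start" and "end" keys. The function name `merge_passes` and the "source" tag are hypothetical. The start times come from the screenshots above, while the end time for "love" is invented for illustration.

```python
def merge_passes(full_words, love_words):
    """Combine word rows from two recognizer passes, ordered by start time."""
    tagged = [dict(w, source="full") for w in full_words]   # rows from the full model
    tagged += [dict(w, source="love") for w in love_words]  # rows from the "love" model
    return sorted(tagged, key=lambda w: w["start"])

full = [{"word": "fifteen", "start": 23.82, "end": 24.208}]
love = [{"word": "love", "start": 24.15, "end": 24.5}]  # end time is a made-up value

merged = merge_passes(full, love)
phrase = " ".join(w["word"] for w in merged)
print(phrase)
```

Sorting by start time still orders the words correctly when their spans overlap, as they do here, but it would not resolve cases where both passes report different words at the same start time.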
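For context on the "specific grammar" option: the Vosk Python examples pass a grammar to `KaldiRecognizer` as a JSON list of phrases in a third argument, usually with "[unk]" as a catch-all entry. The sketch below only builds that string. The helper name `build_grammar` is hypothetical, and whether a phrase-level grammar actually improves recognition of "love" or "all" is an open question here, not something this thread establishes. As far as I know, runtime grammars also only work with models that support dynamic graphs.

```python
import json

def build_grammar(phrases):
    """Serialize phrases into the JSON list string passed to KaldiRecognizer as a grammar."""
    return json.dumps(list(phrases) + ["[unk]"])  # "[unk]" catches out-of-grammar speech

grammar = build_grammar(["fifteen love", "fifteen all"])
print(grammar)

# With vosk installed and a model that supports runtime grammars (assumption):
# rec = KaldiRecognizer(model, wf.getframerate(), grammar)
```

Listing whole phrases such as "fifteen love" rather than single words is a design choice in this sketch, on the assumption that scores are always announced as fixed pairs.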
I can provide example sound clips if that helps you help me.
My code: