SysDevProj-18 / BrailleAssistant

Systems Design Project 2024 - Most Societal Impact Award
MIT License

Game Feature #62

Open nkdem opened 7 months ago

nkdem commented 7 months ago

A floating idea that we will now be proceeding with.

The way I envision it at the moment (at least for some sort of MVP):

  1. Store each word that is inserted into the braille display in a word bank (a database).
  2. A button/key starts the game mode.
  3. An API that can be called from main.py, `get_random_word(...)` [more on the parameters later], which would do a DB fetch accordingly.
  4. The speech to text would be activated and wait for the word to be uttered [obvious problems with this, more on this later].
  5. Some indication of whether they got it right [some sound maybe?].
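To make the flow above concrete, here is a rough sketch of a single game round. Everything is passed in as a callable so it doesn't depend on how main.py is actually wired up: `get_word`, `show`, `listen` and `feedback` are hypothetical names standing in for the DB fetch, the braille display, the speech to text and the sound, and the exact-match comparison is just a placeholder.

```python
from typing import Callable, Optional


def play_round(
    get_word: Callable[[], Optional[str]],   # e.g. wraps get_random_word(...)
    show: Callable[[str], None],             # hypothetical: put the word on the braille display
    listen: Callable[[], str],               # hypothetical: returns the speech-to-text result
    feedback: Callable[[bool], None],        # hypothetical: play a right/wrong sound
) -> Optional[bool]:
    """Run one round; returns None if the word bank had nothing to offer."""
    word = get_word()
    if word is None:
        return None
    show(word)
    heard = listen()
    # Placeholder check: case-insensitive exact match only.
    correct = heard.strip().lower() == word.strip().lower()
    feedback(correct)
    return correct


if __name__ == "__main__":
    # Stand-in callables so the sketch can be run without any hardware.
    result = play_round(lambda: "red", print, lambda: "red", lambda ok: print("ok" if ok else "nope"))
    print(result)
```

Keeping the hardware and speech parts behind plain callables would also make the game logic easy to test on a laptop.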

That's the upshot of it, but it can be easily extended. However, I will mention some problems with the speech to text that should be carefully considered.

It is quite possible that the VOSK model will struggle to accurately identify the word. So if the recognised word is not an exact match, maybe we check whether the first x characters are correct? For example, if the word to be uttered is 'red' but VOSK identifies it as 'read' [which is very likely as it stands..], we can accept this answer since the first two characters match. This isn't ideal but could be ok as a starting implementation.

A more advanced feature would be to make the user utter the words themselves before they are saved to the word bank. That way the speech would be compared against the user's own spoken words. This may require looking into a different library that compares two audio sources for similarity, but it could be pretty cool if executed, and could just about give us some more exceptionality marks. What are everyone's thoughts on this?
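The "first x characters" fallback could look something like the sketch below. The function name `is_close_enough` and the default of two characters are assumptions for illustration, not anything that exists in the repo yet.

```python
def is_close_enough(target: str, heard: str, prefix_len: int = 2) -> bool:
    """Accept an exact match, or a match on the first `prefix_len` characters."""
    target = target.strip().lower()
    heard = heard.strip().lower()
    if not target or not heard:
        return False  # nothing to compare against
    if heard == target:
        return True
    # Never compare more characters than the target word actually has.
    n = min(prefix_len, len(target))
    return heard[:n] == target[:n]


if __name__ == "__main__":
    print(is_close_enough("red", "read"))   # 're' matches
    print(is_close_enough("red", "blue"))   # no shared prefix
```

One thing to watch: a short prefix is very forgiving, so 'red' would also accept 'rest' or 'really'. A larger `prefix_len` for longer words might be worth trying.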

As for the database, it doesn't have to be anything flashy and could just be a SQLite database, which is just a file.
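A minimal sketch of what the SQLite word bank might look like, using only the standard library `sqlite3` module. The file name `word_bank.db`, the `words` table, its columns and the helper names are all assumptions made up for this example.

```python
import sqlite3

# Hypothetical schema: one row per word, plus counters for the game.
SCHEMA = """
CREATE TABLE IF NOT EXISTS words (
    word          TEXT PRIMARY KEY,
    times_asked   INTEGER NOT NULL DEFAULT 0,
    times_correct INTEGER NOT NULL DEFAULT 0
)
"""


def open_word_bank(path: str = "word_bank.db") -> sqlite3.Connection:
    """Open (or create) the word bank file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def add_word(conn: sqlite3.Connection, word: str) -> None:
    """Store a word sent to the braille display; duplicates are ignored."""
    word = word.strip().lower()
    if not word:
        return
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))


def record_attempt(conn: sqlite3.Connection, word: str, correct: bool) -> None:
    """Update the counters after the user has tried to say a word."""
    with conn:
        conn.execute(
            "UPDATE words SET times_asked = times_asked + 1, "
            "times_correct = times_correct + ? WHERE word = ?",
            (1 if correct else 0, word.strip().lower()),
        )
```

Passing `":memory:"` as the path gives a throwaway database, which is handy for trying this out without touching a real file.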

About the parameters for `get_random_word`: we could maybe add some weightings, such as how much to favour words that the user has gotten right consistently (this would require storing that sort of stuff in the DB), character length, etc.
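Here is one possible shape for those weightings, as a sketch only. It works on rows that have already been fetched, so it doesn't assume a particular DB layout. The `WordStats` class, the `favour_mastered` and `max_length` parameters and the weighting formula are all assumptions for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class WordStats:
    """Hypothetical per-word record, as it might come out of the word bank."""
    word: str
    times_asked: int = 0
    times_correct: int = 0

    @property
    def accuracy(self) -> float:
        # A word that was never asked counts as not yet mastered.
        if self.times_asked == 0:
            return 0.0
        return self.times_correct / self.times_asked


def get_random_word(
    stats: Sequence[WordStats],
    favour_mastered: float = 0.0,      # 0 = prefer weak words, 1 = prefer strong words
    max_length: Optional[int] = None,  # skip words longer than this
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a word at random, weighted by how well the user knows it."""
    if not 0.0 <= favour_mastered <= 1.0:
        raise ValueError("favour_mastered must be between 0 and 1")
    if rng is None:
        rng = random.Random()
    candidates = [s for s in stats if max_length is None or len(s.word) <= max_length]
    if not candidates:
        return None
    # Small floor so every candidate keeps a non-zero chance of being picked.
    weights = [
        0.01 + favour_mastered * s.accuracy + (1 - favour_mastered) * (1 - s.accuracy)
        for s in candidates
    ]
    return rng.choices(candidates, weights=weights, k=1)[0].word
```

With `favour_mastered=0.0` the game leans towards words the user keeps getting wrong, which is probably the more useful default for practice.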

poppy-io commented 7 months ago

is there a way to test vosk's confidence in a specific phrase matching the voice input? that kind of seems like the ideal solution here but im not familiar with the api
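For what it's worth, a sketch of how a confidence check might work. As far as I understand the Vosk Python bindings, calling `SetWords(True)` on a `KaldiRecognizer` makes `Result()` return JSON with a `result` list of per-word entries, each carrying a `conf` value, and the recognizer constructor can also take a JSON list of allowed phrases to restrict what it listens for. Both of those should be double-checked against the Vosk docs. The helper below only parses a result string of that shape, so it runs without Vosk installed; `word_confidence` is a made-up name.

```python
import json
from typing import Optional


def word_confidence(result_json: str, target: str) -> Optional[float]:
    """Return the best confidence reported for `target`, or None if it wasn't heard.

    Assumes the JSON shape Vosk is believed to produce with SetWords(True):
    {"result": [{"conf": 0.9, "word": "red", ...}, ...], "text": "red"}
    """
    result = json.loads(result_json)
    target = target.strip().lower()
    confs = [
        float(entry["conf"])
        for entry in result.get("result", [])
        if "conf" in entry and entry.get("word", "").lower() == target
    ]
    return max(confs) if confs else None


if __name__ == "__main__":
    # Hand-written sample in the assumed shape, not real recognizer output.
    sample = '{"result": [{"conf": 0.83, "start": 0.9, "end": 1.2, "word": "red"}], "text": "red"}'
    print(word_confidence(sample, "red"))
```

The game could then accept an answer when the returned value is above some threshold, which would need tuning on real recordings.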