Hi all,
I'm looking for partners to work on this exciting project.
The project vision:
Creating a simple-to-use voice control and macro creation system (similar to https://voiceattack.com ),
yet robust and high-performing: one that accurately transcribes speech into a text command and correctly matches it to a command from a list of available commands defined by the user.
TextProcessor.py - preprocessing steps applied to the training data, the commands in the command bank and the transcribed input command.
The steps included are:
1) case folding
2) splitting text into sentences
3) punctuation removal
4) stop word removal
5) lemmatization
6) tokenization
All steps are optional.
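To make the "all steps are optional" idea concrete, here is a minimal sketch of such a pipeline. It is not the code from TextProcessor.py: the function name `preprocess`, its flags and the tiny `STOP_WORDS` set are made up for illustration, and lemmatization is left as a pluggable callable because a real lemmatizer needs an external library.

```python
import re
import string

# Tiny illustrative stop word list (an assumption); a real one would come
# from an NLP library.
STOP_WORDS = {"a", "an", "the", "to", "of", "please"}


def preprocess(text, lowercase=True, split_sentences=False,
               strip_punctuation=True, remove_stop_words=True,
               lemmatize=None, tokenize=True):
    """Hypothetical helper: every step can be switched off independently."""
    if lowercase:
        text = text.lower()  # case folding
    text = text.strip()
    # Sentence splitting has to happen before punctuation is removed.
    if split_sentences:
        sentences = re.split(r"(?<=[.!?])\s+", text)
    else:
        sentences = [text]
    results = []
    for sentence in sentences:
        if strip_punctuation:
            sentence = sentence.translate(
                str.maketrans("", "", string.punctuation))
        tokens = sentence.split()
        if remove_stop_words:
            tokens = [t for t in tokens if t.lower() not in STOP_WORDS]
        if lemmatize is not None:
            # Any callable mapping a word to its lemma can be passed in.
            tokens = [lemmatize(t) for t in tokens]
        # Return token lists, or re-joined strings when tokenize is off.
        results.append(tokens if tokenize else " ".join(tokens))
    return results
```

With the defaults, `preprocess("Please throw the Flashbang!")` returns `[["throw", "flashbang"]]`. The result is always a list with one entry per sentence, so callers do not need to special-case the sentence-splitting flag.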
PostProcessing.py - post-processing steps applied to the processed input command
when the best match returned from the command bank is not similar enough, which may indicate an error in the transcription. E.g. the spoken ground truth is "collect evidence" while the transcription was "correct evidence".
In that case, swapping the words in the mistranscribed command with the words from the command bank that sound most similar to them might help retrieve the correct command with high confidence.
This class includes implementations of the Soundex and Metaphone algorithms (already implemented).
Further work is needed to replace words in a given input command with the matching words from the command bank.
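One possible shape for the missing replacement step, as a rough sketch rather than the PostProcessing.py code: a standard American Soundex, plus a word swap that picks the bank word whose Soundex code is closest. Note that "collect" and "correct" do not share a Soundex code (C423 vs C623), so an exact code match would not be enough here. Ranking the codes with `difflib.SequenceMatcher` is my own assumption, and `closest_by_sound` and `repair_command` are hypothetical names.

```python
from difflib import SequenceMatcher

# Standard Soundex digit groups.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
_CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(word):
    """American Soundex: first letter plus three digits, zero padded."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    code = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # 'h' and 'w' do not separate letters that share a code; vowels do.
        if ch not in "hw":
            prev = digit
    return (code + "000")[:4]


def closest_by_sound(word, bank_words):
    """Return the bank word whose Soundex code is most similar (assumed metric)."""
    target = soundex(word)
    return max(bank_words,
               key=lambda w: SequenceMatcher(None, target, soundex(w)).ratio())


def repair_command(command, bank_commands):
    """Swap every word that is not in the bank vocabulary for its closest bank word."""
    vocabulary = sorted({w for c in bank_commands for w in c.split()})
    return " ".join(w if w in vocabulary else closest_by_sound(w, vocabulary)
                    for w in command.split())
```

For the example above, `repair_command("correct evidence", ["collect evidence", "open door"])` gives `"collect evidence"`. A real version would probably want a minimum similarity before swapping, so that unrelated words are left alone.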
DataLoader.py - reads and writes the formatted commands, their variations and their associated macros.
It also manipulates the formatted data.
Further work is needed to enable editing of the formatted commands.
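To give potential contributors a picture of what "formatted commands" could look like, here is a small sketch. The on-disk format of the project is not described in this post, so the JSON layout, the `Command` fields and the function names below are all assumptions, not the DataLoader.py implementation.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Command:
    """Hypothetical record: one command, its spoken variations and its macro."""
    name: str
    variations: list = field(default_factory=list)
    macro: list = field(default_factory=list)  # e.g. a sequence of key presses


def save_commands(commands, path):
    """Write the commands as a JSON list (assumed format)."""
    payload = [asdict(c) for c in commands]
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_commands(path):
    """Read the commands back into Command objects."""
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    return [Command(**entry) for entry in entries]
```

Editing a command would then amount to loading the list, changing one `Command` and saving it again.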
TrainDataManager.py - generates training data for a BERT model for sequence classification.
Sentence embedding module - a set of models that generate vector embeddings for the commands in the command bank and for the transcribed input command, so that a similarity metric (currently cosine similarity) can be applied to find the user-defined command most similar to the given input command.
This module contains 2 main models: word2vec and BERT for sentence embedding. So far BERT seems to yield better results, but further training is necessary because the pretrained model from the Transformers library doesn't understand that specific terms from a given game's terminology are similar. For example, in the case of Ready or Not, the vanilla model's cosine similarity for "flashbang" and "stun grenade" is 0.58927554, although those are the same thing.
Further fine-tuning is needed.
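The matching step itself is simple once the embeddings exist. The sketch below shows cosine similarity and a best-match lookup over the command bank in plain Python. The vectors in the usage note are made up, `best_match` is a hypothetical name, and the 0.8 threshold is an arbitrary assumption standing in for "not similar enough".

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(input_vec, bank, threshold=0.8):
    """Return (command, score); command is None when the score is below threshold.

    `bank` maps each command name to its embedding vector.
    """
    name, score = max(
        ((n, cosine_similarity(input_vec, vec)) for n, vec in bank.items()),
        key=lambda pair: pair[1],
    )
    return (name, score) if score >= threshold else (None, score)
```

With a toy bank `{"collect evidence": [1.0, 0.0, 0.0], "open door": [0.0, 1.0, 0.0]}`, the input `[0.9, 0.1, 0.0]` matches "collect evidence", while `[1.0, 1.0, 0.0]` scores about 0.71 against both and is rejected. A rejected match is the point where the post-processing step would take over.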
My current approach is to train the BERT model on a sentence pair classification task, where it receives as input
pairs of command variations and a label that indicates whether they are variations of the same command or not.
The hope is that after training, the model will have learned the similarity between such terms and will perform better at embedding the commands.
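The labelled pairs for that task could be generated roughly like this. It is a sketch, not the TrainDataManager.py code: the `bank` layout (command name mapped to its list of variations) and the name `build_pairs` are assumptions.

```python
from itertools import combinations


def build_pairs(bank):
    """Build (text_a, text_b, label) triples from a command bank.

    label 1: both texts are variations of the same command.
    label 0: the texts belong to different commands.
    """
    pairs = []
    # Positive pairs: every two variations of the same command.
    for variations in bank.values():
        pairs += [(a, b, 1) for a, b in combinations(variations, 2)]
    # Negative pairs: every variation of one command against every
    # variation of another command.
    for (_, first), (_, second) in combinations(bank.items(), 2):
        pairs += [(a, b, 0) for a in first for b in second]
    return pairs
```

For a bank with "throw flashbang" and "use stun grenade" under one command and "collect evidence" under another, this yields one positive and two negative pairs. Negatives grow much faster than positives as the bank grows, so some sampling or balancing would likely be needed before training.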
Current status of the project:
A speech-to-text module handles producing the text input command (https://github.com/AvivLugasi/voice_commands/tree/master/SpeachToText) as follows: AudioCapture.py reads the microphone input and produces an audio file; SpeachToText.py transcribes the audio file to text.
A text processing module performs various processing steps on the text data (https://github.com/AvivLugasi/voice_commands/tree/master/TextProcessing).
A data handling module handles writing, reading and generating the formatted commands and the data for the sentence embedding models (https://github.com/AvivLugasi/voice_commands/tree/master/DataHandling).