Open GautamR-Samagra opened 9 months ago
Nitin: Extensive on-ground testing is required, on different paragraphs. For example, "motorcycle": what if only "motor" is said?
100 words by 30 students each would be a good sample set. Also check noisy backgrounds, and the case where the student is not speaking the paragraph at all. We can also create noisy sample audio by adding noise to the samples we collect.
Simple words, such as 2-letter words (Haan, Ha, Aam) and ek, do, teen etc., would be more difficult. Let's check on that.
Also check numbers, i.e. whether the model is able to read them.
Feedback from Rahul: Allow the assessor to mark a word as wrong or right manually.
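For the noisy-background check in Nitin's notes, one way to create noisy versions of the collected samples is to mix a noise recording into each clip at a chosen signal-to-noise ratio. This is only a minimal sketch: `mix_at_snr` and `mean_power` are hypothetical helper names, and it assumes the audio has already been decoded to lists of float samples at the same sample rate.

```python
import math


def mean_power(samples):
    """Average power (mean of squared amplitudes) of a list of float samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so the result has the requested SNR in dB.

    Both inputs are lists of floats at the same sample rate (assumption).
    The noise is looped if it is shorter than the speech clip.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")

    # Repeat the noise so that it covers the whole speech clip.
    looped = [noise[i % len(noise)] for i in range(len(speech))]

    speech_power = mean_power(speech)
    noise_power = mean_power(looped)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("cannot set an SNR for a silent signal")

    # Scale the noise so that speech_power / scaled_noise_power = 10^(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, looped)]
```

Running the same clips through the model at a few SNR levels (say 20, 10 and 5 dB) would show how quickly accuracy drops as the background gets louder.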
Next steps:
[x] Get an accuracy score on noisy datasets/numeric datasets - @GautamR-Samagra - 1 day
[ ] Figure out the engineering pipeline to fine-tune a model given perfect data - @GautamR-Samagra - 2 days
[ ] Figure out how to restrict the output of the model to a selected set of words - @GautamR-Samagra - 2 days
[ ] Noise issues:
[ ] Clean the current data so that it is useful for training - @GautamR-Samagra - 1 day
[ ] Conversion required to use the trained Python model on Android - Charanpreet
[ ] Create a keyword spotting model for Hindi (a different approach altogether; not required if the above works)
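On restricting the model output to a selected set of words: if the model is Vosk (as used later in this thread), its Python bindings accept an optional third argument to `KaldiRecognizer`, a JSON list of allowed phrases, with `"[unk]"` standing in for anything else. A rough sketch follows; `build_grammar` and `make_restricted_recognizer` are hypothetical helper names, the model path is a placeholder, and as far as I know the grammar only takes effect on models that support runtime vocabulary changes, so whether the Hindi model does is an assumption to verify.

```python
import json


def build_grammar(words, allow_unknown=True):
    """Build the JSON phrase list that Vosk accepts as a grammar string."""
    vocab = []
    for word in words:
        word = word.strip()
        if word and word not in vocab:  # drop blanks and duplicates, keep order
            vocab.append(word)
    if allow_unknown:
        # "[unk]" lets the recognizer report out-of-list speech as unknown.
        vocab.append("[unk]")
    return json.dumps(vocab, ensure_ascii=False)


def make_restricted_recognizer(model_path, words, sample_rate=16000):
    """Create a recognizer limited to `words` (requires the vosk package)."""
    # Imported lazily so build_grammar can be used without vosk installed.
    from vosk import KaldiRecognizer, Model

    model = Model(model_path)  # model_path is a placeholder for a local model folder
    return KaldiRecognizer(model, sample_rate, build_grammar(words))
```

For a paragraph-reading check, `words` would be the words of the paragraph the student is expected to read, so "motor" for "motorcycle" would come back as `[unk]` rather than a near match.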
@karntrehan @rohitsamagra
Can we just have the PT put all the audio recordings into one folder on Google Drive?
They can use Excel to record the transcript for each file name in that folder. They can keep adding new wav files to the folder as they get them and keep updating the Excel sheet.
@charanpreet-s
Have created a Colab notebook for converting all formats to wav and base64 and saving them in that format.
Also transcribing all the audio files using Conformer (Bhashini) to get better-quality transcripts. The provided 'answers' do not match the audio directly, as the student often repeats words multiple times to get them right.
New sample transcripts look like this - base64_and_transcripts.xlsx
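The base64 step described above can be sketched with the standard library alone. This is not the notebook's actual code: `wav_file_to_base64` and `wav_info_from_base64` are hypothetical names, and the sketch assumes the file is already a PCM wav (converting other formats to wav needs a separate tool and is not shown).

```python
import base64
import io
import wave


def wav_file_to_base64(path):
    """Read a wav file and return its bytes as a base64 ASCII string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def wav_info_from_base64(encoded):
    """Decode a base64 wav string and report its basic PCM parameters."""
    data = base64.b64decode(encoded)
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
            "frames": wav.getnframes(),
        }
```

Checking the decoded parameters before saving a row to the sheet is a cheap way to catch files that were not converted correctly, for example stereo recordings where mono is expected.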
@charanpreet-s @rohitsamagra Have created a folder where they can upload the files here
Have uploaded all provided files already there.
Have created another folder here which has the wav files in the required format for Vosk. (this is updated by my code) You can use these files for easy testing if required.
For current files, have updated accuracy in this sheet here
Colab notebook to run Vosk on the wav files and get the accuracy is here
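Since students often repeat words before getting them right, a plain word-by-word comparison would penalise correct readings. One possible scoring scheme (an assumption, not necessarily what the notebook does) is to count how many expected words appear in the transcript in the right order, using a longest common subsequence; `lcs_length` and `word_accuracy` are hypothetical names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def word_accuracy(expected, transcript):
    """Fraction of expected words found, in order, in the transcript.

    Extra or repeated words in the transcript are ignored, so a student
    who repeats a word before getting it right is not penalised.
    """
    ref = expected.split()
    hyp = transcript.split()
    if not ref:
        raise ValueError("expected text must contain at least one word")
    return lcs_length(ref, hyp) / len(ref)


print(word_accuracy("ek do teen", "ek ek do do teen"))      # 1.0
print(word_accuracy("motorcycle chalao", "motor chalao"))   # 0.5
```

This measure is lenient by design: it does not penalise inserted words at all, so it would need a separate check for the case where the student says something unrelated to the paragraph.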
To Do: https://docs.google.com/document/d/1fTSatDtD1sGI_YPChHw3iO7rsTDboBH8R_OHLC_AsvQ/edit
Phase 1:
Phase 2:
Phase 3:
Phase 4: