Evaluate VOSK to run offline language assessments on mobile

GautamR-Samagra commented 9 months ago

To Do: https://docs.google.com/document/d/1fTSatDtD1sGI_YPChHw3iO7rsTDboBH8R_OHLC_AsvQ/edit

Phase 1:

[x] Evaluate VOSK's vosk-model-small-hi-0.22 on Android

Phase 2:

[x] Check accuracy of above model
[x] Mark each assessment with UUID and display to PT
[x] Allow PT to send incorrect identifications to BE
[x] Upload all audio recordings with the UUID

Phase 3:

[ ] Configure assessments from BE

Phase 4:

[ ] Integration in NL

karntrehan commented 9 months ago

Nitin: Extensive on-ground testing is required, on different paragraphs. For example, take the word "Motorcycle": what if only "motor" is said?

100 words by 30 students each would be a good sample set. Also check with noisy backgrounds, and check the case where the student is not speaking the paragraph at all. We can also create noisy sample audio by adding noise to the samples we collect.
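
One way to create the noisy samples is to mix recorded background noise into a clean clip at a chosen signal-to-noise ratio. A minimal sketch, assuming both clips are already decoded into lists of 16-bit PCM samples at the same sample rate; `rms` and `mix_at_snr` are hypothetical helper names, not part of VOSK:

```python
import math

def rms(samples):
    """Root-mean-square level of a list of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Mix noise into speech so the result has roughly the requested SNR in dB.

    Both inputs are lists of 16-bit PCM integers at the same sample rate.
    The noise is looped if it is shorter than the speech.
    """
    speech_rms = rms(speech)
    noise_rms = rms(noise)
    if speech_rms == 0 or noise_rms == 0:
        # Nothing to scale against; return the speech unchanged.
        return list(speech)
    # Scale the noise so that 20*log10(speech_rms / scaled_noise_rms) == snr_db.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    mixed = []
    for i, s in enumerate(speech):
        n = noise[i % len(noise)] * gain
        # Clip to the 16-bit range to avoid wrap-around.
        mixed.append(max(-32768, min(32767, int(round(s + n)))))
    return mixed
```

Running the same recordings at a few SNR levels (for example 20, 10 and 5 dB) would show how quickly recognition degrades as the background gets louder.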

Simple words, i.e. 2-letter words (Haan, Ha, Aam, and ek, do, teen, etc.), would be more difficult. Let's check on that.

Also check numbers, and whether the model is able to recognize them.

karntrehan commented 7 months ago

Feedback from Rahul: Allow the assessor to manually mark a word as wrong or right.

karntrehan commented 6 months ago

https://docs.google.com/spreadsheets/d/12KYbJSaZ6e1HDJvh3DRGq7E0f4tu8e42CP8CmwRxrsI/edit?usp=sharing to be used.

GautamR-Samagra commented 6 months ago

Next steps:

[x] Get an accuracy score on noisy datasets/numeric datasets - @GautamR-Samagra Time required - 1 day
[ ] Figure out the engineering pipeline to fine-tune a model given perfect data - @GautamR-Samagra - 2 days
[ ] Figure out restricting output of the model to a selected set of words @GautamR-Samagra - 2 days
[ ] Noise issues:
- [ ] Use a different recorder that cleans out noise by itself - Charanpreet
- [ ] Train on noisier datasets - 1 day
[ ] Clean current data to be useful for training @GautamR-Samagra 1 day
[ ] Conversion required to use trained python model on Android - Charanpreet
[ ] Create a keyword spotting model for Hindi (a different approach altogether; not required if the above works)
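
For the item on restricting output to a selected set of words, the Vosk Python API accepts an optional grammar (a JSON list of allowed phrases) as the third argument to `KaldiRecognizer`. A rough sketch, assuming the Hindi small model supports this runtime grammar (worth verifying); `build_grammar`, `transcribe_restricted` and the paths in the usage line are hypothetical:

```python
import json
import wave

def build_grammar(words):
    """Return the JSON phrase list that Vosk accepts as a recognizer grammar.

    "[unk]" lets the recognizer emit an unknown token instead of forcing
    every sound onto one of the allowed words.
    """
    unique = []
    for w in words:
        w = w.strip().lower()
        if w and w not in unique:
            unique.append(w)
    return json.dumps(unique + ["[unk]"], ensure_ascii=False)

def transcribe_restricted(wav_path, model_dir, words):
    """Transcribe a mono 16-bit PCM WAV using only the given vocabulary."""
    # Imported here so build_grammar can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate(), build_grammar(words))
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            rec.AcceptWaveform(data)
    return json.loads(rec.FinalResult()).get("text", "")
```

Usage would look like `transcribe_restricted("sample.wav", "vosk-model-small-hi-0.22", ["एक", "दो", "तीन"])`, with the word list taken from the paragraph being assessed.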

GautamR-Samagra commented 6 months ago

@karntrehan @rohitsamagra
Can we just have the PT put all the audio recordings into one folder on Google Drive? They can use Excel to maintain the transcript for each file name in that folder. They can keep adding to the folder as they get new wav files and keep updating the Excel sheet.

@charanpreet-s Have created a Colab notebook for converting all formats to wav and base64 and saving them in that format.
Am also transcribing all the audio files using Conformer (Bhashini) to get better-quality transcripts. The provided 'answers' do not match the audio directly, as the student often repeats words multiple times to get them right.

New sample transcripts look like this - base64_and_transcripts.xlsx
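
A minimal sketch of the wav and base64 step, using only the Python standard library. It assumes the audio has already been converted to WAV (decoding other formats needs an external tool and is not shown), and the helper names are hypothetical rather than taken from the notebook:

```python
import base64
import io
import wave

def wav_info(wav_bytes):
    """Return (channels, sample_width_bytes, frame_rate) for WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate()

def is_vosk_ready(wav_bytes):
    """True if the audio is mono 16-bit PCM, the format the Vosk examples expect."""
    channels, width, _rate = wav_info(wav_bytes)
    return channels == 1 and width == 2

def to_base64(wav_bytes):
    """Encode raw WAV bytes as an ASCII base64 string (e.g. for a sheet cell)."""
    return base64.b64encode(wav_bytes).decode("ascii")

def from_base64(text):
    """Decode a base64 string back into the original WAV bytes."""
    return base64.b64decode(text)
```

Checking `is_vosk_ready` before encoding keeps stereo or 8-bit files from reaching the recognizer unnoticed.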

GautamR-Samagra commented 6 months ago

@charanpreet-s @rohitsamagra Have created a folder where they can upload the files here

Have already uploaded all the provided files there.

Have created another folder here which has the wav files in the required format for Vosk (this is updated by my code). You can use these files for easy testing if required.

For current files, have updated accuracy in this sheet here

Colab notebook to run Vosk on wav files and get accuracy is here
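
For reference, one common way to score a transcript against the expected text is word error rate (word-level edit distance divided by the number of reference words). This is only a sketch of that metric and may not be how the accuracy in the sheet was computed; `word_error_rate` is a hypothetical helper:

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

Note that when a student repeats a word, each repetition counts as an insertion, so the score gets worse even if every expected word was eventually said correctly.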

SamagraX-Labs / poc-tracker

Evaluate VOSK to run offline language assessments on mobile #9