Improve Speech-To-Text (Rules-Based Approach)

chidiewenike commented 4 years ago

Objective

The DeepSpeech speech-to-text system needs to be improved to handle uncommon and non-English words. The rules-based approach is to inspect the output of the current DeepSpeech model and build a mapping from each mis-transcribed phrase to the expected output. After the audio is transcribed, the mapper looks up the substrings of the transcribed text in this mapping and replaces any that match a known error. The mappings can be stored in a JSON file.

Key Result

Using the run_stt function of stream_deepspeech.py, return a string that correctly transcribes the audio input. https://github.com/calpoly-csai/swanton/blob/b8e55023e9c12af9dabd8050166fd8cdb8860e91/stream_deepspeech.py#L16

Example

Expected Transcription: "What is Casa Verde used for?" => DeepSpeech transcription: "What is cause uh very day used for?"

mapping = {
    "cause uh very day": "Casa Verde"
}
transcription_substring = "cause uh very day"
print(mapping[transcription_substring])  # Output: Casa Verde
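
The substring lookup described in the Objective could be sketched roughly as below. This is only a minimal sketch, not the project's implementation: `correct_transcription` and `max_words` are hypothetical names, and it assumes mapping keys are lowercase, space-separated phrases. At each word position it tries the longest candidate phrase first, so a longer phrase wins over a shorter one it contains.

```python
def correct_transcription(text, mapping, max_words=4):
    """Replace known mis-transcribed word sequences using `mapping`.

    Hypothetical helper: checks word n-grams, longest first, at each position.
    """
    words = text.split()
    result = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i first
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n]).lower()
            if candidate in mapping:
                result.append(mapping[candidate])
                i += n
                break
        else:
            # No mapping matched here; keep the original word
            result.append(words[i])
            i += 1
    return " ".join(result)


mapping = {"cause uh very day": "Casa Verde"}
fixed = correct_transcription("What is cause uh very day used for?", mapping)
print(fixed)  # Output: What is Casa Verde used for?
```

Punctuation attached to a word (for example "day?") would prevent a match in this sketch, so it assumes the transcribed phrase is not immediately followed by punctuation.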

Details

Correctly transcribe all QA pairs from the question-answer pairs Google Sheet.

You will need the following DeepSpeech model and DeepSpeech scorer to use run_stt. Ensure that these files are in the same directory as the stream_deepspeech.py program.

If in need of assistance, please ask @chidiewenike

chidiewenike commented 4 years ago

Algorithms to consider in the future:

Levenshtein Distances/Fuzzy Matching
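
As a rough illustration of how fuzzy matching might fit in, the sketch below computes the Levenshtein (edit) distance with the standard dynamic-programming recurrence and uses it to find the nearest mapping key. `closest_key` and `max_distance` are hypothetical names chosen for this example, and the threshold of 2 is an arbitrary assumption, not a tuned value.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def closest_key(phrase, mapping, max_distance=2):
    """Hypothetical helper: return the mapping key nearest to `phrase`, or None."""
    best = min(mapping, key=lambda key: levenshtein(phrase, key), default=None)
    if best is not None and levenshtein(phrase, best) <= max_distance:
        return best
    return None


mapping = {"ramona roderigo": "Ramon Rodriguez"}
key = closest_key("romona roderigo", mapping)
print(mapping[key])  # Output: Ramon Rodriguez
```

With this kind of lookup, a single mapping entry could cover small variations in how DeepSpeech mis-transcribes the same name, at the cost of possible false matches if the threshold is too loose.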

chidiewenike commented 4 years ago

Storing JSON from Python Dict

import json

# Map of mis-transcribed phrases to the expected text
mapper = {
    "ramona roderigo": "Ramon Rodriguez"
}

# Write the mapping to a JSON file
with open("test_json.json", "w") as out_json:
    json.dump(mapper, out_json)

print(mapper["ramona roderigo"])  # Output => Ramon Rodriguez
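
Reading the mapping back is the mirror image of storing it. The sketch below is a minimal round trip using `json.dump` and `json.load`; writing to a temporary directory is an assumption made here so the example does not overwrite any real file.

```python
import json
import os
import tempfile

mapper = {"ramona roderigo": "Ramon Rodriguez"}

# Write to a temporary directory so the sketch does not overwrite real files
path = os.path.join(tempfile.mkdtemp(), "test_json.json")
with open(path, "w") as out_json:
    json.dump(mapper, out_json)

# Load the mapping back into a Python dict
with open(path) as in_json:
    loaded = json.load(in_json)

print(loaded["ramona roderigo"])  # Output: Ramon Rodriguez
```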

Pseudo-Python for Substring Mapper

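from stream_deepspeech import run_stt

def stt_mapper(predicted):
    # Map of known mis-transcriptions to the expected text
    mapper = {
        "romona roderigo": "ramon rodriguez"
    }

    # Replace each known mis-transcribed substring found in the prediction.
    # str.replace returns a new string, so the result must be reassigned.
    for wrong, right in mapper.items():
        if wrong in predicted:
            predicted = predicted.replace(wrong, right)

    return predicted

predicted = run_stt(5)
# predicted => "romona roderigo is an operator at the ranch"
correct = stt_mapper(predicted)
# correct => "ramon rodriguez is an operator at the ranch"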

chidiewenike commented 3 years ago

@akimminavarro @taylor-nguyen-987 Can you folks try out Swanton, Swanton Pacific, and Swanton Pacific Ranch? Maybe start with those?
