Ant-Brain / EfficientWord-Net

OneShot Learning-based hotword detection.
https://ant-brain.github.io/EfficientWord-Net/
Apache License 2.0

Hotword matches without any utterance #23

Closed OnlinePage closed 1 year ago

OnlinePage commented 1 year ago

Hi, first of all, thanks for making this library, it's fantastic!! I understand that since it's at a very early phase it will have some issues and will get better over time. This time I was trying the given hotword detector example and attached speech recognition to run after the hotword triggers, but the behaviour is quite messy. To demonstrate this I am including this gif.

[GIF: Code_nyP9cJMBBg]

Problem 1: I call speech recognition right after there is a match. As soon as the speech recognition ends, it again reports that the hotword was uttered, with a confidence score, and starts listening again, even though no hotword was uttered.

Problem 2: In some situations it also matches when there is a small click or desk sound.

Is there any fix, at least for Problem 1? I see Problem 2 could be the result of weak training, depending on the hotword.

aman-17 commented 1 year ago

Thanks for reaching out with the issue. Problem 2 might be caused by the window frame (1 second) that each word was restricted to during training. It can be fixed by training the model with a longer window frame (1.5 to 3 seconds). We will plan this for the next version of EfficientWordNet.

OnlinePage commented 1 year ago

And what about Problem 1? Why does it match consecutively after speech recognition without any utterance? 🤔

TheSeriousProgrammer commented 1 year ago

Can you consider creating custom wakeword with your voice and test the same?

OnlinePage commented 1 year ago

> Can you consider creating custom wakeword with your voice and test the same?

Okay, so I did as you said: I created 9 voice samples of the wakeword, 1 sec each, in Audacity, then retrained and created the .ref file, but the results are the same.

Is it related to the speech recognition module? Or is it because the mic is also being used by speech recognition? Kindly provide a solution.

For a correct understanding of the scenario, here is the sample code, with speech recognition called after the matched utterance.

```python
import os

import speech_recognition
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net import samples_loc

recognizer = speech_recognition.Recognizer()
mycroft_hw = HotwordDetector(
    hotword="Mycroft",
    reference_file=os.path.join(samples_loc, "mycroft_ref.json"),
)

# The mic stream is created once at module level and started here
mic_stream = SimpleMicStream()
mic_stream.start_stream()


def listen():
    """Record one phrase from the microphone and return the recognized text, or None."""
    with speech_recognition.Microphone(device_index=0) as source:
        print("i m hearing !")
        recognizer.adjust_for_ambient_noise(source)
        try:
            audio = recognizer.listen(
                source=source, timeout=2, phrase_time_limit=8)
        except speech_recognition.WaitTimeoutError:
            # nothing was said within the timeout, so there is no audio to recognize
            return None
        try:
            text = recognizer.recognize_google(audio)
        except Exception:
            return None
        return text


print("Say Mycroft ")
while True:
    print("say again:")
    frame = mic_stream.getFrame()
    result = mycroft_hw.scoreFrame(frame)
    if result is None:
        # no voice activity
        continue
    if result["match"]:
        print("Wakeword uttered", result["confidence"])
        listen()  # this speech recognition function is defined in app.py
```

3sticks commented 1 year ago

I was able to get around this issue by wrapping the wake word detection in a function with a runner variable that you set to False (for the while True) while your speech recognition is running. Then, when the speech recognition is done, you just call the wake word function again.

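import speech_recognition as sr
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector

def get_audio():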
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
        said = ""

        try:
            said = r.recognize_google(audio)
            # add in check in case nothing was picked up
            if said == "":
                said = "nothing"
                print(said)
        except Exception as e:
            said = "nothing"
            print("Exception: " + str(e))
        print("said" + said)
        return said

def wakeup():
    office_hw = HotwordDetector(
            hotword="office",
            reference_file = "/home/pi/office_ref.json", threshold=0.9)

    mic_stream = SimpleMicStream()
    mic_stream.start_stream()

    print("Listening ")
    runner = True
    while runner :
        frame = mic_stream.getFrame()
        result = office_hw.scoreFrame(frame)
        if result is None :
            #no voice activity
            continue
        if(result["match"]):
            print("Wakeword uttered",result["confidence"])
            runner = False #make false, so the wake word stops listening 
            query = get_audio()
            #do whatever you're going to do with your audio, turn your lights on 
            wakeup()

#start the infinite loop 
wakeup()

OnlinePage commented 1 year ago

@3sticks thanks, from your example I finally figured out what was actually causing the repeated matches. In my case mic_stream was defined outside the function at the very start, so even with a bool condition it won't stop: mic_stream keeps running in global scope and picks up audio while speech recognition is running. In your case mic_stream is defined inside the function, so a fresh instance is created every time the function is called, and that is what resolves 96% of it. Still, using the condition in the loop is safe practice. Thanks mate, now working fine!
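
The same idea (a fresh stream for every wake-word cycle, nothing capturing at module level) can also be written as a plain loop instead of a function that calls itself. This is only a minimal sketch, not the library's API: `wait_for_hotword`, `run_assistant`, `make_stream`, `score_frame`, `on_wake` and `max_cycles` are hypothetical names, and the only things assumed about the stream and detector are `start_stream()`, `getFrame()` and the result shape used in the code above. Whether an old stream needs to be released explicitly is not covered in this thread, so the sketch does not attempt it.

```python
from typing import Callable, Optional


def wait_for_hotword(make_stream: Callable[[], object],
                     score_frame: Callable[[object], Optional[dict]]) -> float:
    """Open a fresh stream, block until the hotword matches, return the confidence."""
    stream = make_stream()   # new instance per cycle, as in the fix reported above
    stream.start_stream()
    while True:
        result = score_frame(stream.getFrame())
        if result is None:   # no voice activity in this frame
            continue
        if result["match"]:
            return result["confidence"]


def run_assistant(make_stream, score_frame, on_wake, max_cycles=None):
    """Alternate between hotword detection and on_wake() in a plain loop.

    max_cycles is a hypothetical knob for testing; None means loop forever.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        confidence = wait_for_hotword(make_stream, score_frame)
        on_wake(confidence)  # e.g. run speech recognition here
        cycles += 1
    return cycles
```

With the names from the code above, this would presumably be called as `run_assistant(SimpleMicStream, office_hw.scoreFrame, lambda confidence: get_audio())`. Because it loops rather than recursing, the call stack does not grow with every wake-up.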


TheSeriousProgrammer commented 1 year ago

Glad to hear that the issue is resolved!! Will have to add information about this scenario in the readme though!

OnlinePage commented 1 year ago

Yep, that's how we mature! By the way, this repo has a lot of potential, kind of the next Snowboy! So good luck mate, we will be sharing insights to make it better. 👍

TheSeriousProgrammer commented 1 year ago

@OnlinePage It would be helpful if you could share the updated inference code with the issues resolved. I need to see it to make API-related changes.

OnlinePage commented 1 year ago

@TheSeriousProgrammer Ahh, that's exactly the same as what @3sticks posted above! I have also already explained the cause and the strategy applied in the fix. Also, the current version works well if there is no noise, but if there is constant noise and a few people are talking together, it causes matches even though the hotword is not uttered.
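
One common way to cut down spurious triggers from noise or background chatter, independent of retraining, is to require several consecutive matching frames and to ignore matches for a short time after a trigger. The following is a minimal sketch under stated assumptions: `MatchDebouncer`, `min_consecutive` and `cooldown_frames` are hypothetical names and not part of EfficientWord-Net, and the only thing assumed about the detector is the `None` or `{"match": ..., "confidence": ...}` result shape used in the code above. Whether this helps in practice depends on how the library overlaps frames, which this thread does not describe.

```python
from typing import Optional


class MatchDebouncer:
    """Turn per-frame detector results into triggers (hypothetical helper)."""

    def __init__(self, min_consecutive: int = 2, cooldown_frames: int = 5):
        self.min_consecutive = min_consecutive
        self.cooldown_frames = cooldown_frames
        self._streak = 0     # consecutive matching frames seen so far
        self._cooldown = 0   # frames still to ignore after the last trigger

    def update(self, result: Optional[dict]) -> bool:
        """Return True only when a trigger should fire for this frame."""
        if self._cooldown > 0:
            # still inside the quiet period after a trigger: ignore everything
            self._cooldown -= 1
            self._streak = 0
            return False
        if result is None or not result["match"]:
            # silence or a non-match breaks the streak
            self._streak = 0
            return False
        self._streak += 1
        if self._streak >= self.min_consecutive:
            self._streak = 0
            self._cooldown = self.cooldown_frames
            return True
        return False
```

In the detection loop, the `if result["match"]:` check would be replaced by `if debouncer.update(result):`, so that a single matching frame from a click is not enough to trigger.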

Waiting very eagerly for the improved version of this repo.

TheSeriousProgrammer commented 1 year ago

We are about to release a ResNet-based model trained on a better dataset, and will soon give updates on a release with a better PyTorch training loop.

OnlinePage commented 1 year ago

@TheSeriousProgrammer waiting very very eagerly!!!!