Ant-Brain / EfficientWord-Net

OneShot Learning-based hotword detection.
https://ant-brain.github.io/EfficientWord-Net/
Apache License 2.0
215 stars · 34 forks

Models path is incorrect or not included #28

Closed: OnlinePage closed this issue 1 year ago

OnlinePage commented 1 year ago

Hi, I tested the newer model in 1.0.1. It seems the newer Resnet_50_Arcloss model is not included in the models directory, and audio_processing.py builds the model path with the wrong slash. Please look into it! Below are some references to the above error.

ERROR TRACE:

```
onnxruntime.capi.onnxruntime_pybind11_state.NoSuchFile: [ONNXRuntimeError] : 3 : NO_SUCHFILE : Load model from C:\Users\OnlinePage\.conda\envs\myenv1\Lib\site-packages\eff_word_net\models/resnet_50_arc/slim_93%_accuracy_72.7390%.onnx failed: Load model C:\Users\OnlinePage\.conda\envs\myenv1\Lib\site-packages\eff_word_net\models/resnet_50_arc/slim_93%_accuracy_72.7390%.onnx failed. File doesn't exist
```
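For context on the mixed slashes in that trace: concatenating a hard-coded `/` onto a Windows directory path produces exactly this kind of mixed-separator path, whereas building it with `os.path.join` keeps separators consistent. A minimal sketch (the paths below are made up for illustration, not the package's real layout):

```python
import ntpath  # Windows path rules, importable on any OS for demonstration

# Hypothetical install directory, for illustration only.
models_dir = r"C:\env\Lib\site-packages\eff_word_net\models"

# Concatenating with a hard-coded forward slash yields a mixed-separator path:
mixed = models_dir + "/resnet_50_arc/model.onnx"

# ntpath.join (what os.path.join resolves to on Windows) stays consistent:
joined = ntpath.join(models_dir, "resnet_50_arc", "model.onnx")

print(mixed)   # C:\env\Lib\site-packages\eff_word_net\models/resnet_50_arc/model.onnx
print(joined)  # C:\env\Lib\site-packages\eff_word_net\models\resnet_50_arc\model.onnx
```

Windows usually tolerates mixed separators, but as the trace shows, downstream libraries such as onnxruntime can still report the full garbled path when the file is missing.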

The model is not there in the models directory 😕 (screenshot attached)

TheSeriousProgrammer commented 1 year ago

My bad, it probably got missed in the PyPI build. I will look into it; for now you can try it out through a git clone.

OnlinePage commented 1 year ago

> My bad, it probably got missed in the PyPI build. I will look into it; for now you can try it out through a git clone.

All right, I knew it! 😄

OnlinePage commented 1 year ago

> My bad, it probably got missed in the PyPI build. I will look into it; for now you can try it out through a git clone.

Okay, I fixed the model path issue by manually copying the ResNet folder there, but on initializing the HotwordDetector the error below comes up:

```
File "C:\Users\OnlinePage\.conda\envs\myenv\lib\site-packages\eff_word_net\engine.py", line 64, in __init__
    assert MODEL_TYPE_MAPPER[data["model_type"]]==type(model)
KeyError: 'model_type'
```

But there is no such key to pass other than the newly included model. Please look into it!

TheSeriousProgrammer commented 1 year ago

The ref file you are currently using is from an older version; consider regenerating it.
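To illustrate why an old ref file trips that assertion: a reference file generated before this release would have no `model_type` field, so the dictionary lookup in `engine.py` raises a `KeyError` before the assert even runs. A toy sketch of that check (the mapper contents and ref-file shapes here are made up for illustration):

```python
# Hypothetical mapping from ref-file model names to engine classes.
MODEL_TYPE_MAPPER = {"resnet_50_arc": "Resnet50_Arc_loss"}

def model_type_of(data):
    """Mimics the engine.py lookup: old ref files lack the 'model_type' key."""
    return MODEL_TYPE_MAPPER[data["model_type"]]  # KeyError on a pre-1.0 ref file

new_ref = {"model_type": "resnet_50_arc", "embeddings": []}
old_ref = {"embeddings": []}  # generated by an older version: no 'model_type'

print(model_type_of(new_ref))  # Resnet50_Arc_loss
try:
    model_type_of(old_ref)
except KeyError as e:
    print("old ref raises KeyError:", e)
```

Regenerating the ref file with the current version writes the missing field, which is why that fixed the error below.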

OnlinePage commented 1 year ago

Okay, I fixed it by regenerating the ref file! But now I am getting this issue as I continued with the example:

```
mic_stream = SimpleMicStream(
TypeError: __init__() got an unexpected keyword argument 'window_length'
```

Or if I remove both window_length and sliding_window (since both arguments cause errors) and run it, the error below comes up:

```
\lib\site-packages\eff_word_net\audio_processing.py", line 199, in audioToVector
    assert inpAudio.shape == (self.window_frames, ) #1.5 sec long window
AssertionError
```

TheSeriousProgrammer commented 1 year ago

That was a typo in the example; can you check README.md again?
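For anyone landing here later, the corrected example presumably uses the `_secs`-suffixed keyword names; assuming that, the call would look roughly like the following (parameter names taken from the 1.0.x README as I understand it; verify against your installed version):

```python
from eff_word_net.streams import SimpleMicStream

# '_secs'-suffixed names assumed from the 1.0.x README; the bare
# 'window_length'/'sliding_window' spellings raise the TypeError above.
mic_stream = SimpleMicStream(
    window_length_secs=1.5,
    sliding_window_secs=0.75,
)
mic_stream.start_stream()
```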

TheSeriousProgrammer commented 1 year ago

The model is included in the pip package now as well.

OnlinePage commented 1 year ago

All right, everything is done and working fine now! 😄👍 Really nice job, pals.

Feedback:

1. The noise handling in this version, per my few tests so far, is really good, say 88% better than the last version.
2. Unnecessary detections (when the wake word is not spoken) are handled very well; in the last version it would detect even when no hotword was said.
3. The multiple-utterance issue is still there.

Here is a video that shows the current multiple-utterance issue again:

https://user-images.githubusercontent.com/61045650/232137585-b4781f06-023a-4ee3-8bf0-49ad04f7a17d.mp4

But at the same time, the multiple-utterance issue can be handled well with the approach mentioned in our earlier discussion at https://github.com/Ant-Brain/EfficientWord-Net/issues/23#issuecomment-1399334947. So far I am really glad that I came across this repo; it really is the next Snowboy!

TheSeriousProgrammer commented 1 year ago

Thanks for the kind words

The model is fundamentally a classifier which can only look at a specific window of data.

In this case, a 1.5-second-long audio window.

But if we simply split the audio into 1.5-second chunks, there is a possibility that a hotword utterance falls exactly across the boundary between 2 chunks, and when that happens the hotword will not be detected.

To avoid this, a sliding window is implemented where the extracted audio chunks overlap.

We now face the opposite problem of multiple detections per utterance, when a single hotword utterance sits in the overlapping area of 2 or more chunks.

The sliding_window_secs parameter controls the degree of overlap
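The overlap logic described above can be sketched in a few lines. The helper below is illustrative, not the library's actual implementation, and the 16 kHz sample rate is an assumption:

```python
def sliding_windows(samples, window_len, hop_len):
    """Yield fixed-size windows; hop_len < window_len makes them overlap."""
    for start in range(0, len(samples) - window_len + 1, hop_len):
        yield samples[start:start + window_len]

SAMPLE_RATE = 16000                      # assumed mic sample rate
audio = [0.0] * (4 * SAMPLE_RATE)        # 4 s of dummy audio
window = int(1.5 * SAMPLE_RATE)          # the 1.5 s window the model expects
hop = int(0.75 * SAMPLE_RATE)            # sliding_window_secs=0.75 -> 50% overlap

chunks = list(sliding_windows(audio, window, hop))
print(len(chunks))  # 4 overlapping windows, versus 2 disjoint 1.5 s chunks
```

With a hop of half the window length, every instant of audio is seen by two windows, so an utterance can never be split entirely across a boundary, at the cost of possible double detections.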

We are currently toying with different approaches to overcome the problem

One solution is to increase the relaxation time (i.e. the minimum time required between 2 detections); you can toy around with this value in the constructor of your HotwordDetector.
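The relaxation-time idea amounts to a small debouncer over detection timestamps. A sketch (this helper is hypothetical, not the library's API):

```python
class RelaxationFilter:
    """Drop detections that arrive within relaxation_time seconds of the last accepted one."""

    def __init__(self, relaxation_time=2.0):
        self.relaxation_time = relaxation_time
        self._last_accepted = None

    def accept(self, timestamp):
        if (self._last_accepted is None
                or timestamp - self._last_accepted >= self.relaxation_time):
            self._last_accepted = timestamp
            return True   # treat as a genuinely new utterance
        return False      # duplicate hit from an overlapping window

f = RelaxationFilter(relaxation_time=2.0)
hits = [0.0, 0.75, 1.5, 4.0]             # detection times from overlapping windows
print([t for t in hits if f.accept(t)])  # [0.0, 4.0]
```

The trade-off is that a relaxation time of 2 seconds also suppresses a real second utterance spoken within 2 seconds of the first, so the value needs tuning per use case.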

TheSeriousProgrammer commented 1 year ago

I am closing this issue now, as the problems reported in it have been resolved.