MycroftAI / mycroft-precise

A lightweight, simple-to-use, RNN wake word listener
Apache License 2.0

Creating a custom frequency filter #215

Open NewBermuda opened 2 years ago

NewBermuda commented 2 years ago

I am currently working on a branch of precise-engine in which the engine processes only frequencies between 500 Hz and 3,500 Hz. Therefore, I would have to remove the frequencies outside this band from the MFCCs before they are used, both for training the model and by the Precise engine at runtime. I'm trying to understand which lines of the code I would have to modify to achieve this. I'd be grateful if you could point me in the right direction.

If I understand correctly, I'd have to remove the unwanted frequencies before the MFCCs are computed. How would you go about removing them from the raw audio? Should I modify the "vectorize" function in vectorization.py?

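    import numpy as np

    from precise.params import pr  # ListenerParams instance, as used in vectorization.py


    def vectorize(audio: np.ndarray) -> np.ndarray:
        """
        Converts audio to machine readable vectors using
        configuration specified in ListenerParams (params.py)

        Args:
            audio: Audio verified to be of `sample_rate`

        Returns:
            array<float>: Vector representation of audio
        """
        # Keep only the most recent max_samples samples
        if len(audio) > pr.max_samples:
            audio = audio[-pr.max_samples:]
        # vectorize_raw (defined alongside this function) computes the MFCC frames
        features = vectorize_raw(audio)
        # Too few frames: left-pad with zero frames up to n_features
        if len(features) < pr.n_features:
            features = np.concatenate([
                np.zeros((pr.n_features - len(features), features.shape[1])),
                features
            ])
        # Too many frames: keep only the last n_features
        if len(features) > pr.n_features:
            features = features[-pr.n_features:]

        return features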
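One way to remove the unwanted frequencies from the raw samples, before `vectorize_raw` ever sees them, is sketched below. It assumes mono float audio whose sample rate you pass in (check `pr.sample_rate` in your copy of Precise). It uses a simple FFT mask, i.e. a "brick-wall" band-pass, which is a different technique from pydub's filters; `bandpass_fft` is a hypothetical helper name, not part of Precise.

```python
import numpy as np


def bandpass_fft(audio: np.ndarray, sample_rate: int,
                 low_hz: float = 500.0, high_hz: float = 3500.0) -> np.ndarray:
    """Zero all spectral content outside [low_hz, high_hz] (hypothetical helper).

    Brick-wall band-pass via a real FFT over the whole buffer: simple, but it
    can cause some ringing around sharp transients in the signal.
    """
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    # Keep only the bins whose frequency lies inside the pass band
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    # n=len(audio) restores the original length, also for odd-sized inputs
    return np.fft.irfft(spectrum, n=len(audio)).astype(audio.dtype, copy=False)


# Where it could go (hypothetical change to vectorize, shown as comments):
#     if len(audio) > pr.max_samples:
#         audio = audio[-pr.max_samples:]
#     audio = bandpass_fft(audio, pr.sample_rate)   # new line
#     features = vectorize_raw(audio)
```

Keep in mind that the audio fed to the model at runtime may not pass through `vectorize` at all, so the same filter probably has to be applied on that path too, otherwise training and runtime features will not match.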

How can I remove the unwanted frequencies from the "features" vectors?
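MFCCs are a DCT of log mel-band energies, so individual MFCC coefficients do not correspond to frequency bands and cannot simply be sliced out of "features". A common alternative is to build the mel filterbank only over the 500 to 3,500 Hz range (the idea behind the `fmin`/`fmax` parameters of `librosa.filters.mel`), so energy outside that band never reaches the coefficients. The numpy sketch below is a generic MFCC implementation under assumed parameters (16 kHz, 512-point FFT, 10 ms hop, 20 filters, 13 coefficients, HTK-style mel scale); it is not Precise's own feature code, and all names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + np.asarray(hz) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(sample_rate=16000, n_fft=512, n_filt=20,
                   fmin=500.0, fmax=3500.0):
    """Triangular mel filters covering only [fmin, fmax] (hypothetical helper)."""
    # n_filt + 2 equally spaced mel points give each filter's left/centre/right edge
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filt + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filt, len(bin_freqs)))
    for i in range(n_filt):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle peaking at the centre; negative values (outside) clip to 0
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc_band_limited(audio, sample_rate=16000, n_fft=512, hop=160,
                      n_filt=20, n_coeffs=13, fmin=500.0, fmax=3500.0):
    """Return MFCCs of shape (n_frames, n_coeffs) using only fmin..fmax."""
    if len(audio) < n_fft:
        raise ValueError("audio must contain at least n_fft samples")
    n_frames = 1 + (len(audio) - n_fft) // hop
    # Index matrix: row k holds the sample indices of frame k
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(sample_rate, n_fft, n_filt, fmin, fmax).T
    log_e = np.log(energies + 1e-10)  # small offset avoids log(0)
    # Unnormalized DCT-II over the filter axis, written out to avoid scipy
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_filt)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filt))
    return log_e @ dct.T
```

The trade-off is that the number of features stays the same while the 20 filters are packed more densely into the narrower band; whether that helps the wake-word model is something to test empirically.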

el-tocino commented 2 years ago

pydub can handle the raw audio; then you could do something like this:

    sound = sound.high_pass_filter(500)   # attenuate content below ~500 Hz
    sound = sound.low_pass_filter(3500)   # attenuate content above ~3,500 Hz

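To apply those pydub calls to the NumPy arrays that Precise works with, the samples have to be wrapped in an `AudioSegment` and converted back afterwards. The sketch below assumes mono 16-bit audio stored as floats in [-1, 1] (check this against how your copy of Precise loads audio); the helper names are hypothetical. pydub is imported inside the function so the pure-NumPy conversions work without it. Note that pydub documents these filters as rolling off at about 6 dB per octave, so content just outside 500 to 3,500 Hz is attenuated gradually rather than removed.

```python
import numpy as np


def float_to_pcm16(audio: np.ndarray) -> bytes:
    """Float samples in [-1, 1] -> little-endian 16-bit PCM bytes."""
    clipped = np.clip(audio, -1.0, 1.0)
    return np.round(clipped * 32767).astype('<i2').tobytes()


def pcm16_to_float(data: bytes) -> np.ndarray:
    """Little-endian 16-bit PCM bytes -> float32 samples in [-1, 1)."""
    return np.frombuffer(data, dtype='<i2').astype(np.float32) / 32768.0


def pydub_bandpass(audio: np.ndarray, sample_rate: int = 16000,
                   low_hz: int = 500, high_hz: int = 3500) -> np.ndarray:
    """Run pydub's high-pass, then low-pass filter over a mono float array."""
    from pydub import AudioSegment  # imported lazily; requires pydub

    sound = AudioSegment(data=float_to_pcm16(audio), sample_width=2,
                         frame_rate=sample_rate, channels=1)
    sound = sound.high_pass_filter(low_hz)   # attenuate below low_hz
    sound = sound.low_pass_filter(high_hz)   # attenuate above high_hz
    return pcm16_to_float(sound.raw_data)
```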
NewBermuda commented 2 years ago

Thanks for your input!

However, I'm really struggling to understand which chunks of audio I need to modify, and in which class. I can't seem to load the chunks with pydub directly. Should I apply the filter to the whole audio window or to each chunk? And what are the correct commands to load the audio into pydub and save it again afterwards?
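On the window-versus-chunk question: a filter has memory, so filtering each incoming chunk independently restarts that memory at every chunk boundary and introduces small discontinuities, whereas filtering a whole window at once does not. If the filtering has to happen chunk by chunk (for example at runtime, assuming audio arrives in successive chunks), one option is to carry the filter state from one chunk to the next. The NumPy sketch below does that for a first-order (RC-style) high-pass followed by a first-order low-pass, the same general kind of 6 dB/octave filter pydub offers; the class name and equations are my own illustration, not pydub or Precise code, and the per-sample Python loop is written for clarity rather than speed.

```python
import numpy as np


class StreamingBandpass:
    """First-order high-pass then first-order low-pass, keeping state between
    calls so chunked input gives the same output as one long array."""

    def __init__(self, sample_rate=16000, low_hz=500.0, high_hz=3500.0):
        dt = 1.0 / sample_rate
        rc_hp = 1.0 / (2 * np.pi * low_hz)
        rc_lp = 1.0 / (2 * np.pi * high_hz)
        self.beta = rc_hp / (rc_hp + dt)   # high-pass coefficient
        self.alpha = dt / (rc_lp + dt)     # low-pass coefficient
        # Filter memory: previous input and output of the high-pass stage,
        # previous output of the low-pass stage
        self.prev_x = 0.0
        self.prev_hp = 0.0
        self.prev_lp = 0.0

    def process(self, chunk: np.ndarray) -> np.ndarray:
        chunk = np.asarray(chunk, dtype=np.float64)
        out = np.empty(len(chunk))
        for i, x in enumerate(chunk):
            # High-pass: y[n] = beta * (y[n-1] + x[n] - x[n-1])
            hp = self.beta * (self.prev_hp + x - self.prev_x)
            # Low-pass: y[n] = y[n-1] + alpha * (hp[n] - y[n-1])
            lp = self.prev_lp + self.alpha * (hp - self.prev_lp)
            self.prev_x, self.prev_hp, self.prev_lp = x, hp, lp
            out[i] = lp
        return out
```

Usage: create one `StreamingBandpass` per audio stream and call `process` on every chunk in order; creating a fresh instance per chunk reproduces the boundary problem described above. For whole files, pydub's `AudioSegment.from_wav("in.wav")` loads a WAV file and `sound.export("out.wav", format="wav")` writes one back.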