SuperKogito / Voice-based-gender-recognition

:sound: :boy: :girl: Voice based gender recognition using Mel-frequency cepstrum coefficients (MFCC) and Gaussian mixture models (GMM)

Real-time gender classification? #4

Closed · ghost closed this 4 years ago

ghost commented 4 years ago

I see the Python scripts rely on files for input.

Can I pipe a .wav file into a Python script instead?

I'm exploring possibilities for classifying a voice in real time.

I got the program working fine. Very nice code. Thank you.

SuperKogito commented 4 years ago

Sure, you can do that by using the GenderIdentifier class, but you have to slightly edit the FeaturesExtractor class. Assuming you have recorded some speech in real time and formatted it as a mono audio array, you can compute the features characterising the audio array and later use the features vector in the prediction. This can be done like the following:

import numpy as np
from sklearn import preprocessing
from scipy.io.wavfile import read
from python_speech_features import mfcc
from python_speech_features import delta
from GenderIdentifier import GenderIdentifier

def extract_features(audio, rate):
    mfcc_feature = mfcc(# The audio signal from which to compute features.
                        audio,
                        # The sample rate of the signal we are working with.
                        rate,
                        # The length of the analysis window in seconds.
                        # Default is 0.025 s (25 milliseconds).
                        winlen       = 0.05,
                        # The step between successive windows in seconds.
                        # Default is 0.01 s (10 milliseconds).
                        winstep      = 0.01,
                        # The number of cepstral coefficients to return.
                        # Default is 13.
                        numcep       = 13,
                        # The number of filters in the filterbank.
                        # Default is 26.
                        nfilt        = 30,
                        # The FFT size. Default is 512.
                        nfft         = 1024,
                        # If True, the zeroth cepstral coefficient is replaced
                        # with the log of the total frame energy.
                        appendEnergy = True)

    # scale the coefficients, then stack them with their first and
    # second order derivatives (deltas and double deltas)
    mfcc_feature  = preprocessing.scale(mfcc_feature)
    deltas        = delta(mfcc_feature, 2)
    double_deltas = delta(deltas, 2)
    combined      = np.hstack((mfcc_feature, deltas, double_deltas))
    return combined

# init the gender identifier with the repository's GMM models
gender_identifier = GenderIdentifier("TestingData/females",
                                     "TestingData/males",
                                     "females.gmm", "males.gmm")

# load (or record) the speech to classify; reading a saved file is used
# here as a stand-in for a real-time buffer (the path is a placeholder)
sampling_rate, recorded_audio_data = read("recorded_speech.wav")

# get the audio features vector
features_vector = extract_features(audio=recorded_audio_data, rate=sampling_rate)

# predict/identify the speaker's gender
predicted_gender = gender_identifier.identify_gender(features_vector)

As for how to get the recorded audio, I suggest following something like the sketch below.
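Here is a minimal sketch of capturing a few seconds of mono audio with the sounddevice library (the library choice, the duration and the 16 kHz rate are illustrative assumptions, not something this repository prescribes):

import sounddevice as sd

duration      = 3      # seconds to record (arbitrary choice)
sampling_rate = 16000  # keep this consistent with the training data

# record a mono buffer from the default input device
recorded_audio_data = sd.rec(int(duration * sampling_rate),
                             samplerate=sampling_rate,
                             channels=1,
                             dtype="float32")
sd.wait()  # block until the recording is finished

# flatten (n, 1) -> (n,) so it matches what mfcc() expects
recorded_audio_data = recorded_audio_data.flatten()

The resulting array can then be passed straight to extract_features() above.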

Hope this helps :)

ghost commented 4 years ago

This is hugely helpful. Thank you for the code snippet and the link. I'll let you know how I get on with the real-time piping.

I'll try to match the sampling rate as best I can and see what I can come up with.
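One way to do that is to resample the captured signal before extracting features. A minimal sketch with SciPy's resample_poly (the 16 kHz target rate is an assumption about the training data):

from scipy.signal import resample_poly

target_rate = 16000  # assumed rate of the training data

# polyphase resampling, e.g. 44100 Hz -> 16000 Hz
recorded_audio_data = resample_poly(recorded_audio_data,
                                    up=target_rate,
                                    down=sampling_rate)
sampling_rate = target_rate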

thxrgxxs commented 4 years ago

Hi, I am currently working on the second phase of my experiment using your source code. Thank you :) I just have one doubt about the audio and rate in the features vector: I am having trouble with sampling_rate when I try to substitute it with 44100. Can you please tell me where I am going wrong?

SuperKogito commented 4 years ago

Since the OP did not follow up on this, I will assume that the issue was solved and close this one. @thxrgxxs please refer to #7.
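For anyone hitting the same trouble at 44100 Hz: one likely cause is that the 0.05 s analysis window in the snippet above is 2205 samples at that rate, which no longer fits into nfft=1024, so python_speech_features truncates the frames and warns. A minimal sketch of sizing nfft to the window (the power-of-two rounding is a common convention, not a library requirement):

import math

rate   = 44100
winlen = 0.05                      # analysis window in seconds
frame_length = int(winlen * rate)  # 2205 samples at 44.1 kHz

# round the FFT size up to the next power of two that fits the frame,
# then pass nfft=nfft to mfcc() in extract_features()
nfft = 2 ** math.ceil(math.log2(frame_length))  # 4096 here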