Walleclipse / Deep_Speaker-speaker_recognition_system

Keras implementation of "Deep Speaker: an End-to-End Neural Speaker Embedding System" (speaker recognition)

How to identify the speaker by inferencing on trained model #63

Closed lokesh16191 closed 2 years ago

lokesh16191 commented 4 years ago

I am not able to identify a single speaker by running inference on the trained model. Here is my code:

```python
import numpy as np

# Build the model and restore the latest checkpoint.
model = convolutional_model()
gru_model = None
last_checkpoint = get_last_checkpoint_if_any(c.CHECKPOINT_FOLDER)
if last_checkpoint is not None:
    print('Found checkpoint [{}]. Resume from here...'.format(last_checkpoint))
    model.load_weights(last_checkpoint)
if c.COMBINE_MODEL:
    gru_model = recurrent_model()
    last_checkpoint = get_last_checkpoint_if_any(c.GRU_CHECKPOINT_FOLDER)
    if last_checkpoint is not None:
        print('Found checkpoint [{}]. Resume from here...'.format(last_checkpoint))
        gru_model.load_weights(last_checkpoint)

ls = []
ls.append("audio/LibriSpeechSamples/train-clean-100/000/1085857/1582716560943.wav")

test_dir = c.TEST_DIR
print("Test dir:", test_dir)
check_partial = True
x, y_true = create_test_data(test_dir, check_partial)
batch_size = x.shape[0]
b = x[0]
num_frames = b.shape[0]
input_shape = (num_frames, b.shape[1], b.shape[2])

print('test_data:')
print('num_frames = {}'.format(num_frames))
print('batch size: {}'.format(batch_size))
print('input shape: {}'.format(input_shape))
print('x.shape: {}'.format(x.shape))
print('y.shape: {}'.format(y_true.shape))

for file in ls:
    raw_audio = read_audio(file)
    feat1 = extract_features(raw_audio, target_sample_rate=16000)
    feat1 = clipped_audio(feat1)
    feat1 = feat1[np.newaxis, ...]  # add a batch dimension
    emb1 = model.predict(feat1)
    embedding = emb1.copy()
    print("Embedding shape:", embedding.shape)
    y_pred = call_similar(embedding)
    print("Length of y_pred:", len(y_pred))
    nrof_pairs = min(len(embedding), len(y_true))
    y_pred = y_pred[:nrof_pairs]
    y_true = y_true[:nrof_pairs]

    fm, tpr, acc, eer = evaluate(y_pred, y_true)
    print("f-measure = {0}, true positive rate = {1}, accuracy = {2}, "
          "equal error rate = {3}".format(fm, tpr, acc, eer))
```

Could you please suggest any reference code?

Walleclipse commented 4 years ago

Hi, the goal of this project is speaker embedding. With the speaker embeddings we can do speaker verification (verify whether two utterances are from the same person). However, the project cannot identify a single user by itself (identifying a single user is a classification problem). If you want to run inference with the pre-trained model, please check issue 30.
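For what it's worth, once you have embeddings from the model, both tasks reduce to cosine similarity. Below is a minimal sketch of that idea using plain NumPy; the `verify`/`identify` helpers, the enrollment dictionary, and the 0.7 threshold are illustrative assumptions, not part of this repo's API (the threshold would normally be tuned on a held-out set, e.g. at the EER point).

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

def verify(emb1, emb2, threshold=0.7):
    # Speaker verification: accept "same speaker" iff the
    # similarity exceeds a tuned threshold (0.7 is illustrative).
    return cosine_similarity(emb1, emb2) >= threshold

def identify(probe_emb, enrolled):
    # Closed-set identification on top of verification: compare the
    # probe embedding against one enrolled embedding per known
    # speaker and return the closest match.
    scores = {name: cosine_similarity(probe_emb, emb)
              for name, emb in enrolled.items()}
    return max(scores, key=scores.get)
```

To identify a speaker you would first enroll each known speaker by running one (or an average of several) of their utterances through the model, then call `identify` with the probe utterance's embedding.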