Closed: LongMingWei closed this issue 4 months ago.
@LongMingWei - can you reproduce this issue by saving and loading the model from your hard drive? If it is only present when you are using MongoDB, then that would probably not be an issue with Eagle, but more likely an issue with the database code.
This is my new test script without using MongoDB:
```python
import pveagle
from pveagle import EagleProfile
from pvrecorder import PvRecorder

access_key = "hidden"

eagle_profiler = pveagle.create_profiler(access_key=access_key)
recorder = PvRecorder(device_index=-1, frame_length=eagle_profiler.min_enroll_samples)
recorder.start()
enroll_percentage = 0.0
while enroll_percentage < 100.0:
    audio_frame = recorder.read()
    enroll_percentage, feedback = eagle_profiler.enroll(audio_frame)
    print(enroll_percentage)
recorder.stop()

speaker_profile = eagle_profiler.export()
voicebyte = speaker_profile.to_bytes()
eagle_profiler.delete()
recorder.delete()

speaker_profile = EagleProfile.from_bytes(voicebyte)
eagle = pveagle.create_recognizer(access_key=access_key, speaker_profiles=[speaker_profile])
recognizer_recorder = PvRecorder(device_index=-1, frame_length=eagle.frame_length)
recognizer_recorder.start()

for i in range(1000):
    audio_frame = recognizer_recorder.read()
    print(eagle.process(audio_frame)[0])
recognizer_recorder.stop()
eagle.delete()
recognizer_recorder.delete()
```
When recognizing my voice, it was successfully detected at first, but as time went by the score dropped back to 0 even though I was still speaking. May I also ask whether there's a way for the profiler to create a new voice profile faster while staying accurate, since enrolling a voice currently takes quite a long time. Thanks for the help!
Thank you, @LongMingWei, for providing the sample code. We ran it but were unable to reproduce the issue you described. Eagle continues to send meaningful scores even after several minutes.
If you're observing unusual behavior in the results, it may be due to the enrollment process. For testing, I suggest recording your voice using a different application, saving the audio file, and listening to it to ensure the quality is acceptable. Then pass this file directly to the profiler instead of using pv_recorder and see if there is any difference.
Thanks for the prompt help so far! There's a problem I couldn't figure out, however. When I try to pass the audio file directly into eagle_profiler.enroll(audiofile), I get a TypeError: only integer scalar arrays can be converted to a scalar index. I checked the documentation, and apparently pvrecorder.read() converts the recorded audio into a list of numbers, but I couldn't find in the documentation how to convert an audio file directly into the list of numbers that the Eagle profiler can accept. Thank you!
As mentioned in the documentation, the `enroll` function accepts a PCM signal only. You'll need to read the audio file using another library and then pass the PCM data to the Eagle profiler.

For an example, please take a look at this Eagle demo, in which wave files are read and passed to the `enroll` function. This demonstrates how to properly extract and format PCM data for use with the Eagle profiler.
Hello! I managed to figure out how to convert the audio data into PCM data to enroll the voice, but something very strange happened. The results are very inconsistent: one time I recorded my voice and it was detected, but sometimes when I was not speaking it was still detected as my voice. Another time, the old problem appeared again and the scores stayed at 0 throughout. I did try to downsample the audio file to match pvrecorder's required sample rate; do you think that's the problem, or is it another problem that I did not realise? Thank you!
```python
def register_user(name, voice, image):  # Function to register a new user, including voice
    if not name:
        return "Name field empty."
    elif name in user_embeddings:
        return "Name already taken."
    if not voice:
        return "Voice profile missing."
    img = np.array(image)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    embeddings = face_recognition.face_encodings(img_rgb)
    # Anti-Spoofing
    label = test(
        image=img_rgb,
        model_dir='./Silent-Face-Anti-Spoofing/resources/anti_spoof_models',
        device_id=0
    )
    if len(embeddings) == 0:
        return "No face detected in the image."
    elif label != 1:
        return "Face is fake. Registration denied."
    embedding = embeddings[0]
    eagle_profiler = pveagle.create_profiler(access_key=access_key)
    audio = read_file(voice, eagle_profiler.sample_rate)
    enroll_percentage, feedback = eagle_profiler.enroll(audio)
    if enroll_percentage < 100.0:
        return "Unable to register voice profile. Speak for a longer period of time."
    else:
        speaker_profile = eagle_profiler.export()
        document = {
            'username': name,
            'embedding': embedding.tolist(),  # Convert to list for MongoDB storage
            'voice_profile': speaker_profile.to_bytes()
        }
        collection.insert_one(document)
        user_embeddings[name] = {}  # Add new data entry to memory
        user_embeddings[name]['face'] = embedding
        user_embeddings[name]['speaker'] = speaker_profile
        eagle_profiler.delete()
        return f"User {name} registered successfully!"
```
```python
def read_file(file_name, sample_rate):  # Function used in the register_user function
    rate_match = True
    with wave.open(file_name, mode="rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()
        num_frames = wav_file.getnframes()
        params = wav_file.getparams()
        params = list(params)
        if wav_file.getframerate() != sample_rate:
            params[3] = sample_rate
            print(params)
            print("Audio file should have a sample rate of %d. got %d" % (sample_rate, wav_file.getframerate()))
        if sample_width != 2:
            raise ValueError("Audio file should be 16-bit. got %d" % sample_width)
        if channels == 2:
            print("Eagle processes single-channel audio but stereo file is provided. Processing left channel only.")
    if not rate_match:
        with wave.open(file_name, mode="wb") as wav_file:
            wav_file.setparams(params)
    with wave.open(file_name, mode="rb") as wav_file:
        samples = wav_file.readframes(num_frames)
    frames = struct.unpack('h' * num_frames * channels, samples)
    return frames[::channels]
```
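(Note: rewriting the WAV header or editing the params tuple only relabels the sample rate; it does not resample the audio. When the source rate is an exact multiple of the target rate, e.g. 48 kHz down to 16 kHz, actual downsampling can be sketched as plain decimation — illustration only, since it applies no anti-aliasing filter; a proper resampler such as scipy.signal.resample_poly is preferable in practice:)

```python
def decimate(samples, src_rate, dst_rate):
    # Keep every n-th sample, where n = src_rate / dst_rate.
    # Only valid when src_rate is an exact multiple of dst_rate.
    if src_rate % dst_rate != 0:
        raise ValueError("source rate must be an integer multiple of the target rate")
    step = src_rate // dst_rate
    return samples[::step]
```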
It seems that saving and loading the speaker profile is working correctly then. Your question seems to be more concerned with Eagle's performance. Could you please try our demos to see if you notice the same unusual behavior in scoring?
Hello! The same unusual scoring behavior happened even when using the mic demo, with scores stuck at 0. Also, when enrolling, sometimes even when I'm not speaking in a quiet area it is still detected as good audio. I also noticed there were two detected mics on my device when using show audio devices, but I could not find anything in the demo page or the help command that shows how to change the mic used for enrolling and testing; that could be one of the possible problems. Thank you!
As we haven't observed the behavior you're describing on our end, we suspect there may be a configuration issue with your setup. We cannot assist you further unless you provide sample audio files for both enrollment and verification.
Regarding your demo questions, each demo includes a help option accessible with the `--help` flag, which provides a comprehensive explanation of each flag. Additionally, detailed explanations can be found in the accompanying readme file. Please make use of all these resources before asking your questions here.

To specify the recording device, use the `--audio_device_index` option. To verify the audio input to Eagle, use the `--output_audio_path` option in the mic demo to save the recorded audio to a file for playback and analysis.
Hello! After testing, it seems the problem was indeed the audio_device_index, thank you! Just FYI, several flags like --audio_device_index and --output_audio_path are not mentioned in this demo readme, and using --help also gave very little info, as shown below; if there's any info I missed, perhaps you could add it to the readme docs so that people can find it more easily.
Thanks for letting us know, we will update the docs later.
Have you checked the docs and existing issues?
SDK
Python
Eagle package version
1.0.1
Framework version
Python 3.12
Platform
Windows (x86_64)
OS/Browser version
Windows 11
Describe the bug
I'm making a face recognition system with voice verification as well. It has recognize user, register user, and delete user functions, where register user collects a new face and voice profile, and recognize user verifies the face and voice and checks that they match.
I'm trying to store exported speaker profiles in a MongoDB database as binary data, since MongoDB cannot accept Eagle profile objects directly. However, when I convert the binary data from MongoDB back into an Eagle speaker profile for comparison with input data while the software is running, the scores from Eagle's process function are at 0 most of the time. The main code is shown below. Any help will be appreciated!
```python
from pveagle import EagleProfile
from pymongo.mongo_client import MongoClient
from pymongo.server_api import ServerApi
import cv2
import numpy as np
import face_recognition
from PIL import Image
import gradio as gr
from deepface import DeepFace
import pveagle
from pvrecorder import PvRecorder
import sys
import time

sys.path.append("Silent-Face-Anti-Spoofing")
from test1 import test  # from test1.py in the Silent-Face-Anti-Spoofing folder

gr.themes.builder()

uri = "hidden"
client = MongoClient(uri, server_api=ServerApi('1'))
collection = client['users']['faces']
collection.create_index("username", unique=True)

# Load all embeddings into memory
user_embeddings = {}

def load_embeddings():
    global user_embeddings
    user_embeddings = {}
    for user in collection.find():
        stats = {}
        username = user['username']
        embedding = np.array(user['embedding'])
        voice_profile = user.get('voice_profile', None)
        if voice_profile:
            voice_profile = EagleProfile.from_bytes(voice_profile)
        stats['face'] = embedding
        stats['speaker'] = voice_profile
        user_embeddings[username] = stats
        print(username)

load_embeddings()

access_key = "hidden"

state = 1
usersname = ""

def recognize_user(image):
    global state, usersname
    if state == 1:
        img = np.array(image)
        img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

rstate = 1
rname = ""
rembed = None

def register_user(name, image):
    global rstate, rname, rembed
    if rstate == 1:
        if not name:
            return "Name field empty."

def delete_user(name):
    if not name:
        return "Name field empty."

def recognize(img):
    embeddings_unknown = face_recognition.face_encodings(img)
    if len(embeddings_unknown) == 0:
        return 'no_persons_found'
    else:
        embeddings_unknown = embeddings_unknown[0]

# Frontend interface
theme = gr.themes.Soft(
    primary_hue="slate",
    secondary_hue="slate",
    font=[gr.themes.GoogleFont('Roboto'), gr.themes.GoogleFont('Montserrat'), 'ui-sans-serif', 'sans-serif']
)

iface_recognize = gr.Interface(
    fn=recognize_user,
    inputs=[gr.Image(source="webcam", streaming=True)],
    outputs=[gr.HTML()],
    live=True,
    every=1,
    title="Face Recognition Attendance System",
    allow_flagging='never',
    clear_btn=None
)

iface_register = gr.Interface(
    fn=register_user,
    inputs=[gr.Textbox(label="Enter new user name."), gr.Image(source="webcam", streaming=True)],
    outputs=[gr.HTML()],
    title="Register New User",
    live=False,
    allow_flagging='never',
    clear_btn=None
)

iface_delete = gr.Interface(
    fn=delete_user,
    inputs=[gr.Textbox(label="Enter user name to delete")],
    outputs=[gr.HTML()],
    title="Delete User",
    live=False,
    allow_flagging='never',
    clear_btn=None
)

custom_css = """
footer {display: none !important;}
label.float {display: none !important;}
div.stretch button.secondary {display: none !important;}
.panel .pending {opacity: 1 !important;}
"""

iface = gr.TabbedInterface(
    [iface_recognize, iface_register, iface_delete],
    ["Recognize User", "Register User", "Delete User"],
    css=custom_css,
    theme=theme
)

if __name__ == "__main__":
    iface_recognize.dependencies[0]["show_progress"] = False
    iface_register.dependencies[0]["show_progress"] = False
    iface_delete.dependencies[0]["show_progress"] = False
    iface.launch()
```
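A quick debugging sketch (not part of the code above) to rule the database out: compare the bytes you stored against the bytes you read back before ever calling `EagleProfile.from_bytes`. Only byte equality matters, so a plain dict stands in for the collection here; with pymongo, the retrieved value is a `bson.Binary`, which is a `bytes` subclass and should compare equal to the original.

```python
def roundtrip_ok(store: dict, key: str, profile_bytes: bytes) -> bool:
    # Write the serialized profile, read it back, and confirm the
    # bytes are unchanged. With MongoDB, `store` would be the
    # collection and the value would come back as bson.Binary.
    store[key] = profile_bytes
    restored = store[key]
    return bytes(restored) == profile_bytes
```

If this check passes against the real collection but the scores are still 0, the problem lies elsewhere (e.g. the audio input), not in the storage layer.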
Steps To Reproduce
Just look at the code to see whether there's anything wrong first, or whether there is something about MongoDB or Eagle that I am unaware of. If the issue can't be solved, I can send the repo used.
Expected Behavior
High scores returned from Eagle's process function when voices from the same user are being compared