Open · Manamama opened 5 days ago
Please try the funasr pipeline.
Yeah, that (second) one works out of the box. The first sample also works, but needs that fix.
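For reference, the working path boils down to a single AutoModel call, which the full script below expands on (a minimal sketch; "sample.wav" is only a placeholder file name):

from funasr import AutoModel

# Minimal funasr pipeline call; "sample.wav" stands in for your own audio file.
model = AutoModel(model="iic/emotion2vec_plus_large")
print(model.generate("sample.wav", granularity="utterance", extract_embedding=False))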
BTW, I have tested both models over 24 hours.
This code may come in handy for some:
import argparse
from funasr import AutoModel


def visualize_emotion_scores(results):
    # Map the bilingual emotion labels to emoticons
    emotion_emoticons = {
        '生气/angry': '😠',
        '厌恶/disgusted': '🤢',
        '恐惧/fearful': '😨',
        '开心/happy': '😊',
        '中立/neutral': '😐',
        '其他/other': '🤷‍♀️',
        '难过/sad': '😢',
        '吃惊/surprised': '😲',
        '<unk>': '❓'
    }

    print("\nEmotion Scores Visualization, version 3.1:")
    # Assuming both models share the same label set
    num_labels = len(results[0]['labels'])  # Number of labels, taken from the first model

    # Iterate over labels
    for i in range(num_labels):
        label = results[0]['labels'][i]  # Label name from the first model
        emoticon = emotion_emoticons.get(label, '')  # Emoticon for the label, if any
        print(f"\nLabel: {label} {emoticon}")
        # Now iterate over each model to get its score for this label
        for result in results:
            model_name = result['model_name']
            score = result['scores'][i]
            trimmed_score = score  # Use the score directly
            # Scale to 0-100 for visualization; change the multiplier to make the bar longer
            scaled_score = int(round(trimmed_score * 100))
            # Build the bar; it is padded to a 100-character field when printed
            bar = '█' * scaled_score
            # Print model name and score details
            print(f"Model: {model_name} - Raw Score: {score:.6f}, Trimmed Score: {trimmed_score:.6f}")
            print(f"[{bar:<100}] {scaled_score:.2f}/100")


# Set up argument parsing
parser = argparse.ArgumentParser(description="Emotion recognition from audio files.")
parser.add_argument("audio_file", type=str, help="Path to the audio file")
args = parser.parse_args()

# Load models and collect results for the audio file
results = []  # One result dict per model

# List of models to compare
model_names = [
    "iic/emotion2vec_plus_large",
    "iic/emotion2vec_base_finetuned"
]

for model_name in model_names:
    print("Model_name:", model_name)
    # Load the emotion recognition model
    model = AutoModel(model=model_name)
    # Run inference on the provided audio file
    rec_result = model.generate(args.audio_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
    # Attach the model name to the result (rec_result is a list; merge its first dictionary)
    rec_result_with_name = {
        'model_name': model_name,
        **rec_result[0]
    }
    # Append the augmented result to the list
    results.append(rec_result_with_name)

# Print the collected results as a sanity check
for result in results:
    print(result)

visualize_emotion_scores(results)
It produces:
Emotion Scores Visualization:
Label: 生气/angry 😠
Model: iic/emotion2vec_plus_large - Raw Score: 0.000001, Trimmed Score: 0.000001
[ ] 0.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.056474, Trimmed Score: 0.056474
[██████ ] 6.00/100
Label: 厌恶/disgusted 🤢
Model: iic/emotion2vec_plus_large - Raw Score: 0.000000, Trimmed Score: 0.000000
[ ] 0.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.000056, Trimmed Score: 0.000056
[ ] 0.00/100
Label: 恐惧/fearful 😨
Model: iic/emotion2vec_plus_large - Raw Score: 0.000000, Trimmed Score: 0.000000
[ ] 0.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.000068, Trimmed Score: 0.000068
[ ] 0.00/100
Label: 开心/happy 😊
Model: iic/emotion2vec_plus_large - Raw Score: 0.000000, Trimmed Score: 0.000000
[ ] 0.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.000855, Trimmed Score: 0.000855
[ ] 0.00/100
Label: 中立/neutral 😐
Model: iic/emotion2vec_plus_large - Raw Score: 0.999998, Trimmed Score: 0.999998
[████████████████████████████████████████████████████████████████████████████████████████████████████] 100.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.942403, Trimmed Score: 0.942403
[██████████████████████████████████████████████████████████████████████████████████████████████ ] 94.00/100
which helps to contrast the two models.
One can also use a one-liner to check all the usual suspects (e.g. .../Emotions_actors/Video/Video_Speech_Actor_03/Actor_03) from the test sets, via:
for file in /Audio_Speech_Actors_01-24/Actor_01/*.wav; do time python /bojack_emotion_detector.py "$file"; play "$file"; done
or similar.
-> You may want to add it to some of the code samples there.
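For those who prefer to stay in Python, a rough equivalent of that shell loop is below (a sketch only: the script path and the RAVDESS-style directory layout are the ones from the example above, and play comes from SoX):

import glob
import subprocess

# Rough Python equivalent of the shell loop above; adjust the placeholder paths to your setup.
for wav in sorted(glob.glob("/Audio_Speech_Actors_01-24/Actor_01/*.wav")):
    subprocess.run(["python", "/bojack_emotion_detector.py", wav], check=True)
    subprocess.run(["play", wav], check=True)  # 'play' is provided by SoX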
Bug Report: ImportError in hf_datasets_util.py
Description
I encountered an ImportError when running the emotion detection model, due to the inability to import OfflineModeIsEnabled from datasets.utils.file_utils. The issue arises in the following file: ~/.local/lib/python3.10/site-packages/modelscope/msdatasets/utils/hf_datasets_util.py
Steps to Reproduce
Run the emotion detection script with a recent datasets release and observe that OfflineModeIsEnabled cannot be imported.
Error Message
ImportError: cannot import name 'OfflineModeIsEnabled' from 'datasets.utils.file_utils'
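For reference, the failure can be triggered in isolation with nothing but the import itself (assuming a datasets release recent enough to have dropped the symbol):

# Minimal reproduction: succeeds on older 'datasets' releases, raises
# ImportError on newer ones where the symbol was removed.
from datasets.utils.file_utils import OfflineModeIsEnabled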
Change Made
To resolve this issue, I removed the problematic import from hf_datasets_util.py. This change allows the code to run without encountering the import error.
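For anyone hitting the same error, a guarded import along these lines has a similar effect (a sketch only, not the exact edit I made; it assumes the rest of hf_datasets_util.py only uses OfflineModeIsEnabled as an exception type):

# Compatibility guard: fall back to a local definition when newer 'datasets'
# releases no longer export OfflineModeIsEnabled.
try:
    from datasets.utils.file_utils import OfflineModeIsEnabled
except ImportError:
    class OfflineModeIsEnabled(ConnectionError):
        # Older 'datasets' versions defined this as a ConnectionError subclass.
        pass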