ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

ImportError in hf_datasets_util.py #51

Open Manamama opened 5 days ago

Manamama commented 5 days ago

Bug Report: ImportError in hf_datasets_util.py

Description

I encountered an ImportError when running the emotion detection model due to the inability to import OfflineModeIsEnabled from datasets.utils.file_utils. This issue arises in the following file:

~/.local/lib/python3.10/site-packages/modelscope/msdatasets/utils/hf_datasets_util.py

Steps to Reproduce

  1. Attempt to run the emotion detection model using ModelScope.
  2. Observe the error message indicating that OfflineModeIsEnabled cannot be imported.

Error Message

ImportError: cannot import name 'OfflineModeIsEnabled' from 'datasets.utils.file_utils'

Change Made

To resolve this issue, I removed the problematic import from hf_datasets_util.py. The modified import statement is as follows:

from datasets.utils.file_utils import (
    _raise_if_offline_mode_is_enabled,
    cached_path,
    is_local_path,
    is_relative_path,
    relative_to_absolute_path,
)

This change allows the code to run without encountering the import error.
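
Rather than deleting the name outright, a more version-tolerant patch is to guard the import and fall back to a local stand-in, so the file keeps working on both old and new datasets releases. A minimal sketch, assuming the removed class was a plain ConnectionError subclass (as in older datasets versions):

try:
    # Present in older datasets releases
    from datasets.utils.file_utils import OfflineModeIsEnabled
except ImportError:
    # Removed in newer releases; this stand-in (assumed to match the old
    # definition) keeps the rest of hf_datasets_util.py importable
    class OfflineModeIsEnabled(ConnectionError):
        pass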

ddlBoJack commented 4 days ago

Please try the funasr pipeline.
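
For reference, a minimal FunASR invocation looks roughly like this (the same calls the longer script below uses; "test.wav" stands in for any audio file):

from funasr import AutoModel

# Load an emotion2vec checkpoint through FunASR
model = AutoModel(model="iic/emotion2vec_plus_large")

# Utterance-level emotion scores for one file
res = model.generate("test.wav", granularity="utterance", extract_embedding=False)
print(res[0]["labels"], res[0]["scores"])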

Manamama commented 4 days ago

Yeah, the second one works out of the box. The first sample also works, but needs the fix above.

Manamama commented 4 days ago

BTW, I have tested both models for over 24 hours.

This code may come in handy for some:

import argparse
from funasr import AutoModel

def visualize_emotion_scores(results):
    # Define a mapping of emotion labels to emoticons
    emotion_emoticons = {
        'η”Ÿζ°”/angry': '😠',
        '厌恢/disgusted': '🀒',
        '恐惧/fearful': '😨',
        'εΌ€εΏƒ/happy': '😊',
        'δΈ­η«‹/neutral': '😐',
        'ε…Άδ»–/other': 'πŸ€·β€β™‚οΈ',
        'ιšΎθΏ‡/sad': '😒',
        'εƒζƒŠ/surprised': '😲',
        '<unk>': '❓'
    }

    print("\nEmotion Scores Visualization, version 3.1:")

    # Assuming both models have the same labels
    num_labels = len(results[0]['labels'])  # Get number of labels from first model

    # Iterate over labels
    for i in range(num_labels):
        label = results[0]['labels'][i]  # Get the label from the first model
        emoticon = emotion_emoticons.get(label, '')  # Get emoticon for the label

        print(f"\nLabel: {label} {emoticon}")

        # Now iterate over each model to get scores for this label
        for result in results:
            model_name = result['model_name']
            score = result['scores'][i]
            trimmed_score = score  # No trimming is applied; kept for the printout below

            # Scale to 0-100 for visualization; change the multiplier to lengthen the bar
            scaled_score = int(round(trimmed_score * 100))  # Integer count of filled blocks

            # Build the bar; the f-string below pads it to a fixed width of 100
            bar = 'β–ˆ' * scaled_score

            # Print model name, score details
            print(f"Model: {model_name} - Raw Score: {score:.6f}, Trimmed Score: {trimmed_score:.6f}")
            print(f"[{bar:<100}] {scaled_score:.2f}/100")

# Set up argument parsing
parser = argparse.ArgumentParser(description="Emotion recognition from audio files.")
parser.add_argument("audio_file", type=str, help="Path to the audio file")
args = parser.parse_args()

# Load models and generate results for each audio file
results = []  # Initialize an empty list to store results
# List of models to use
model_names = [
    "iic/emotion2vec_plus_large",
    "iic/emotion2vec_base_finetuned"
]

for model_name in model_names:
    print("Model_name:", model_name)

    # Load the emotion recognition model
    model = AutoModel(model=model_name)

    # Use the provided audio file for inference
    rec_result = model.generate(args.audio_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)

    # Add the model name to the results
    rec_result_with_name = {
        'model_name': model_name,
        **rec_result[0]  # Assuming rec_result is a list and we want to merge its first dictionary
    }

    # Append modified results to the results list
    results.append(rec_result_with_name)

# Print out the modified results to check
for result in results:
    print(result)

visualize_emotion_scores(results)    
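
Save it as e.g. bojack_emotion_detector.py (the name assumed in the batch loop further down) and point it at any audio file:

python bojack_emotion_detector.py sample.wav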

It produces:

Emotion Scores Visualization:

Label: η”Ÿζ°”/angry 😠
Model: iic/emotion2vec_plus_large - Raw Score: 0.000001, Trimmed Score: 0.000001
[                                                                                                    ] 0.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.056474, Trimmed Score: 0.056474
[β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                                              ] 6.00/100

Label: 厌恢/disgusted 🀒
Model: iic/emotion2vec_plus_large - Raw Score: 0.000000, Trimmed Score: 0.000000
[                                                                                                    ] 0.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.000056, Trimmed Score: 0.000056
[                                                                                                    ] 0.00/100

Label: 恐惧/fearful 😨
Model: iic/emotion2vec_plus_large - Raw Score: 0.000000, Trimmed Score: 0.000000
[                                                                                                    ] 0.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.000068, Trimmed Score: 0.000068
[                                                                                                    ] 0.00/100

Label: εΌ€εΏƒ/happy 😊
Model: iic/emotion2vec_plus_large - Raw Score: 0.000000, Trimmed Score: 0.000000
[                                                                                                    ] 0.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.000855, Trimmed Score: 0.000855
[                                                                                                    ] 0.00/100

Label: δΈ­η«‹/neutral 😐
Model: iic/emotion2vec_plus_large - Raw Score: 0.999998, Trimmed Score: 0.999998
[β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ] 100.00/100
Model: iic/emotion2vec_base_finetuned - Raw Score: 0.942403, Trimmed Score: 0.942403
[β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ      ] 94.00/100

which helps to contrast the two models.
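
If one only needs the single top prediction per model rather than the full bars, a short sketch against the results list built above:

for result in results:
    # max() over (score, label) pairs picks the highest-scoring label
    top_score, top_label = max(zip(result['scores'], result['labels']))
    print(f"{result['model_name']}: {top_label} ({top_score:.4f})")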

One can also use a one-liner to check all the usual suspects (e.g. .../Emotions_actors/Video/Video_Speech_Actor_03/Actor_03) from the test sets:

for file in /Audio_Speech_Actors_01-24/Actor_01/*.wav; do time python /bojack_emotion_detector.py "$file"; play "$file"; done

or similar.

You may want to add it to the code samples there.