Strong-AI-Lab / emotion

Emotion Recognition ToolKit (ERTK): tools for emotion recognition, including dataset processing, feature extraction, and experiments.

Sentiment Labels for CMU-MOSEI transcripts #6

Closed · omkar-kumbhar closed this issue 2 years ago

omkar-kumbhar commented 2 years ago

I wanted to get the sentiment labels for the raw transcripts. For example, a transcript file inside CMU_MOSEI at /Raw/Transcript/Segmented/Combined/_0efYOjQYRc.txt looks like this:

_0efYOjQYRc___0___0.0___8.025___Deloitte announced that Frank Vettese will be taking over as the managing partner and CEO of Deloitte Canada.
_0efYOjQYRc___1___7.814___13.882___Vettese has served as the managing partner of Deloitte's Financial Advisory Division for the past seven years.
_0efYOjQYRc___2___13.531___20.756___Vettese has been involved in mergers and acquisitions, valuations and analytics for years before joining the ranks of Deloitte.
_0efYOjQYRc___3___20.215___30.424___He is the co-founder of Rossen and Vettese Limited and the former Executive Director of Uniform Final Examination (UFE) courses at Toronto's York University.
_0efYOjQYRc___4___30.342___34.565___Chairman Glenn Ives said that Deloitte is very happy with the new appointment.
_0efYOjQYRc___5___34.124___37.518___For the complete article, please go to Big4.com 

where, if you split each line on '___', you get the video ID and a segment ID, which I've assumed corresponds to the intervals, followed by the start time, end time, and transcript text.
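
For clarity, here is a minimal parsing sketch (assuming neither the video ID nor the transcript text ever contains a triple underscore):

def parse_line(line: str):
    # Fields: video ID, segment index, start time, end time, transcript text.
    video_id, seg_id, start, end, text = line.strip().split("___", 4)
    return video_id, int(seg_id), float(start), float(end), text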

In order to tag the segments I used the process method, but I'm unsure about its correctness. The sentiment labels in the feature column go from -3 to +3 in increments of 0.33. Can you see if the code snippet I've used would work?

The code I used was from: https://github.com/Strong-AI-Lab/emotion/blob/21c02f1e8ce96796cf3e9281e8aa0461fe3c7479/datasets/CMU-MOSEI/process.py#L41-L70

I modified the process method to get the sentiment:

import h5py
import numpy as np


def process(name: str):
    # Read the per-segment labels for the given video from the CSD file.
    dataset = h5py.File("cmumosei/Raw/CMU_MOSEI_Labels.csd", "r")
    features = dataset[f"All Labels/data/{name}/features"]
    intervals = np.array(dataset[f"All Labels/data/{name}/intervals"])
    # Convert the interval times (in seconds) to sample offsets at 16 kHz.
    intervals = (16000 * intervals).astype(int)
    sentiments = []
    for i in range(len(intervals)):
        newname = f"{name}_{i:02d}"
        # The sentiment score is the first entry of each feature vector.
        sentiments.append((newname, features[i][0]))
    return sentiments
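
Calling process("_0efYOjQYRc"), for example, should then return a list of (segment name, sentiment score) pairs for that video.
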
agkphysics commented 2 years ago

Yeah, I need to redo the CMU-MOSEI processing script, because the total number of clips in the label info is different from the total number of clips overall (see below). The clips are also numbered in the transcripts but not in the labels file, so process.py doesn't match them up properly.

I'll rewrite process.py to use the pre-segmented video clips instead and match these with the appropriate interval from CMU_MOSEI_Labels.csd.
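
Roughly, the matching step would look something like this (just a sketch: the function name and the interval array layout here are illustrative, not the final code):

import numpy as np

def best_interval(start: float, end: float, intervals: np.ndarray) -> int:
    # intervals: an (N, 2) array of [start, end] times from CMU_MOSEI_Labels.csd.
    # Pick the label interval with the largest temporal overlap with the clip.
    overlap = np.minimum(end, intervals[:, 1]) - np.maximum(start, intervals[:, 0])
    return int(np.argmax(overlap))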


Video stats

Full videos: 3837
Full videos in CMU_MOSEI_Labels.csd: 3293
Full videos in predefined train/valid/test folds: 2769 (2 of which do not have labels)

Existing video segments: 39627
Segment intervals in transcripts: 44977
Segment intervals in CMU_MOSEI_Labels.csd: 23259

gt950 commented 1 year ago

@agkphysics Hello, your work is excellent, but I have a small doubt. Using the updated processing script you provided, I sliced the audio according to the transcripts and then assigned emotion labels; the final total is 30,174, but the official presentation gives a total of 23,453. I'm a little confused and hope you can help me. Thanks in advance!

agkphysics commented 1 year ago

@gt950 I think there are some discrepancies between the labels file and their presentation. As I mentioned, there are only 23259 intervals in CMU_MOSEI_Labels.csd; I don't know where the other ~200 come from. I'm also not sure why you get 30,174 files with labels.

gt950 commented 1 year ago

@agkphysics Thank you very much for your answer! Sorry, it was a mistake on my part: you redesigned process.py according to the segment labels in the transcripts, and the new script produces 30,174 slices in the label.csv file, but the number of slices with labels in it is still 23259. Also, I found that in the emotion labels of ['features'], several emotions can have the same value, which causes the argmax() function to return only the index of the first maximum. As a result, the counts of audio slices labelled 'happy', 'sad', 'anger', 'disgust', 'surprise', and 'fear' are 14567, 3783, 2730, 1291, 437, and 452 respectively, which also differs from the official label distribution. Is this due to this issue in CMU_MOSEI_Labels.csd? I wonder if you have noticed it. Thank you again!
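
For reference, here is a minimal demonstration of the argmax() behaviour I mean:

import numpy as np

# When several emotions tie for the maximum score, argmax returns
# only the index of the first occurrence.
print(np.argmax([1.0, 1.0, 0.0]))  # prints 0, even though index 1 also ties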

agkphysics commented 1 year ago

@gt950 No, this is deliberate. I wanted a single emotion label for the purpose of multiclass emotion classification. The code should give two label files: label_maj.csv for values that are a majority of the total, and label_plu.csv for values that are only a plurality.
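
Conceptually, the distinction between the two files is something like this (a sketch only; the actual counting in process.py may differ):

from collections import Counter

def majority_plurality(votes: list[str]):
    # A majority label has more than half of all votes; a plurality label
    # merely has strictly more votes than any other label.
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    majority = label if n > len(votes) / 2 else None
    plurality = label if list(counts.values()).count(n) == 1 else None
    return majority, plurality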

Although, I just found a bug in which it was doing integer division and incorrectly assigning the majority label. Fixed in 78639c6.
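
For illustration only (a hypothetical reconstruction, not the actual code), the class of bug is:

n_votes, count = 4, 2

# Buggy (hypothetical): integer division lets exactly half the votes pass as a majority.
buggy_majority = count >= n_votes // 2  # True for 2 of 4 votes
# Correct: a strict majority needs more than half of all votes.
is_majority = count > n_votes / 2       # False for 2 of 4 votes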

gt950 commented 1 year ago

@agkphysics Thank you so much for your reply and help!!