VlaDanilov closed this issue 1 year ago
Same issue here. I am wondering whether the get_features() function in the README leaves out the sr parameter:
signal, fs = librosa.load(file_path)
If so, the audio is resampled to 22050 Hz, as per the librosa documentation. Or did you use the sr=None option?
Thanks
After further examination, you indeed left out the sr parameter.
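To illustrate why the sr parameter matters here, below is a minimal sketch. The helper names (load_native, duration_seconds) are hypothetical and not part of the repository, and the 16000 Hz rate is only an assumed example of a native sampling rate. librosa.load resamples to 22050 Hz by default and keeps the file's own rate when sr=None is passed, so the same fixed length in samples covers a different duration in each case.

```python
def load_native(file_path):
    """Load audio at the file's own sampling rate (no resampling)."""
    import librosa  # imported lazily so the arithmetic below runs without it

    # sr=None keeps the native rate; omitting sr resamples to 22050 Hz.
    signal, fs = librosa.load(file_path, sr=None)
    return signal, fs


def duration_seconds(num_samples, sample_rate):
    """Duration covered by num_samples at the given sampling rate."""
    return num_samples / sample_rate


MEAN_SIGNAL_LENGTH = 96000  # value quoted in this thread for EmoDB

# Default librosa.load behaviour: audio resampled to 22050 Hz.
dur_default = duration_seconds(MEAN_SIGNAL_LENGTH, 22050)

# Assumed native rate of 16000 Hz (illustrative only).
dur_16k = duration_seconds(MEAN_SIGNAL_LENGTH, 16000)

print(f"22050 Hz: {dur_default:.3f} s, 16000 Hz: {dur_16k:.3f} s")
```

If the features were extracted at one rate and reproduced at another, the padded or truncated window would cover a different stretch of audio, so the MFCC values would not be expected to match.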
Dear @VlaDanilov and @adrianastan ,
Thank you for your interest in this project and for your valuable suggestions. I sincerely apologize for any inconvenience. We are aware of the problems with reproducing the MFCC features, and we have uploaded the feature extraction code. To check reproducibility, we ran the tests shown in the figure below and found that: i) when we re-extracted the features from the same speech signals with the same code on the same device (device 1), the results closely matched the features we had provided, differing only in some digits after the decimal point; ii) when we extracted them on different devices (device 1 and device 2), the discrepancies between the two sets were larger. Device 1 has an arm64 CPU architecture and runs Darwin kernel version 22.6.0; device 2 has an x86_64 CPU architecture and runs Linux kernel version 4.15.0-76-generic. We have provided the MFCC feature files used in the experiments to ensure reproducibility; you can use these files to avoid variations in feature extraction across devices. The updated README gives further details on the experimental setup.
I greatly appreciate your feedback, which helps us continually refine and enhance the quality of our work. If you have any further questions or feedback, please don't hesitate to reach out to me. Thank you very much for your understanding and support.
Best regards,
Jiaxin Ye
Good day, thank you very much for sharing your important and strong research; I look forward to seeing more from you! I have a problem reproducing your feature results. I randomly picked a few audio files from EmoDB, extracted features using the function that you shared, and cannot find the extracted feature vectors in the feature vector file that you posted. Ideally, if everything is done right, I should be able to reproduce the feature vectors you obtained, but that is not the case. To search for matching vectors I used np.array_equal, np.array_equiv, and np.allclose. The shapes are the same, but the numbers are different. Maybe you forgot to mention some details of the feature extraction procedure for the EmoDB dataset (I used mean_signal_length = 96000 for EmoDB, as you mentioned)? Thank you
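When exact or near-exact comparisons find no match, it can help to look for the closest stored entry and report how far off it is. The following is a minimal sketch, assuming the published features load as an array whose first axis indexes the audio files; find_closest is a hypothetical helper, and the data here is random, not taken from the repository.

```python
import numpy as np


def find_closest(query, bank):
    """Return (index, max_abs_diff) of the bank entry closest to query."""
    query = np.asarray(query, dtype=np.float64)
    bank = np.asarray(bank, dtype=np.float64)
    # Largest elementwise absolute difference between query and each entry.
    diffs = np.abs(bank - query).reshape(len(bank), -1).max(axis=1)
    best = int(np.argmin(diffs))
    return best, float(diffs[best])


# Synthetic stand-in for a feature file: 5 items, each of shape (4, 3).
rng = np.random.default_rng(0)
bank = rng.normal(size=(5, 4, 3))

# Simulate a re-extracted feature that differs slightly from entry 2.
query = bank[2] + 1e-4

idx, err = find_closest(query, bank)
print(idx, err)

# Exact and default-tolerance comparisons reject the match...
print(np.array_equal(query, bank[2]), np.allclose(query, bank[2]))
# ...while a looser absolute tolerance accepts it.
print(np.allclose(query, bank[2], atol=1e-3))
```

A small maximum difference points to numerical noise, whereas a large one for every entry suggests a difference in the extraction settings.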