audeering / opensmile

The Munich Open-Source Large-Scale Multimedia Feature Extractor
https://audeering.github.io/opensmile/

Reading features from output file #58

Open yugen-ok opened 1 year ago

yugen-ok commented 1 year ago

Hello all,

I am trying to extract features using the config file config/is09-13/IS13_ComParE.conf. I applied it to the opensmile.wav example.

My issue is that the resulting CSV contains types instead of feature values (that is, each feature's "value" is its type): string for the first feature (the filename) and numeric for all other features. The actual feature values all seem to be stuck in the last 2 rows of the file.

Maybe I misunderstand something. Thanks!

The command I used to generate this is:

build/progsrc/smilextract/SMILExtract -C config//is09-13/IS13_ComParE.conf -I example-audio/opensmile.wav -O opensmile.energy.csv -instname example-audio/opensmile.wav

The file is attached below.

opensmile.energy.csv

chausner-audeering commented 1 year ago

The command you use generates ARFF files (for use with the Weka toolkit), not CSV files. To get CSV output, you should use -csvoutput instead of -O. You can see this by inspecting https://github.com/audeering/opensmile/blob/master/config/shared/standard_data_output.conf.inc where the command-line parameters are defined.
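For anyone who still has an ARFF file from -O and wants to check what is in it: an ARFF file starts with a header of `@attribute <name> <type>` lines, and the values follow after `@data`. That matches the "types instead of values" observation above. Below is a minimal Python sketch of a reader for such a file. The function name `read_arff`, the attribute names, and the values are all made up for illustration, and the naive comma split does not handle quoted values that contain commas.

```python
import io

def read_arff(stream):
    """Split a simple ARFF stream into (attributes, rows)."""
    attributes = []   # (name, type) pairs taken from the header
    rows = []         # one list of raw string values per data line
    in_data = False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            # Naive split: fine for plain numeric rows, not for quoted commas
            rows.append([v.strip().strip("'") for v in line.split(",")])
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows

# Toy file with made-up attribute names and values (not real openSMILE output)
sample = io.StringIO("""@relation example

@attribute name string
@attribute feature_a numeric
@attribute feature_b numeric

@data

'example.wav',0.25,1.5
""")

attributes, rows = read_arff(sample)
# Pair each attribute name from the header with the value from the data row
record = dict(zip([name for name, _ in attributes], rows[0]))
print(record)
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` object.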

yugen-ok commented 1 year ago

Thanks! I did that, and now I get CSV output, but it contains only one row of data, with a "frameTime" value of 0.0. I tried it with and without -instname, and on both opensmile.wav and media-interpretation.wav, with the same result each time.

Also, some of the features have a number in brackets, either between 0-14 or between 0-25, and I'm not sure what it means.

I read the user guide but I still don't get what I'm doing wrong. Any help would be appreciated.

chausner-audeering commented 1 year ago

It sounds like you would like to extract LLD (framewise low-level descriptor) features instead of functional features. LLD features are extracted per low-level frame, so you get one feature vector every x ms. Functional features are aggregates computed on top of the LLD features. By default, they are aggregated over the full input length, so you end up with a single feature vector, and that single vector is what -csvoutput writes. If you want the LLD output, use -lldcsvoutput instead of -csvoutput.
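To make the difference concrete, here is a toy Python sketch of how a table with one row per frame collapses into a single row of functionals. It is not how openSMILE computes its features. The column names, the values, and the choice of mean/max/min are all assumptions for illustration, and the real ComParE set uses many more functionals.

```python
from statistics import mean

# Hypothetical LLD table: one row per frame, made-up names and values
lld_frames = [
    {"frameTime": 0.00, "energy": 0.10, "pitch": 100.0},
    {"frameTime": 0.01, "energy": 0.30, "pitch": 120.0},
    {"frameTime": 0.02, "energy": 0.20, "pitch": 110.0},
]

def functionals(frames, skip=("frameTime",)):
    """Aggregate each frame-wise column over all frames into one vector."""
    result = {}
    for name in frames[0]:
        if name in skip:
            continue
        values = [frame[name] for frame in frames]
        result[name + "_mean"] = mean(values)
        result[name + "_max"] = max(values)
        result[name + "_min"] = min(values)
    return result

summary = functionals(lld_frames)
print(len(lld_frames), "LLD rows -> 1 functional row")
print(summary)
```

However long the input is, the aggregated result stays a single row, which is why the functional CSV has only one data line.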

chausner-audeering commented 1 year ago

> Also, some of the features have a number in brackets, either between 0-14 or between 0-25, and I'm not sure what it means.

Those are features that have array values, i.e. more than one value. The number indicates the index.
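If it helps when post-processing the CSV header, the indexed columns can be grouped back into their arrays. The Python sketch below does that with a regular expression. The column names are invented for illustration, and the pattern assumes a single `[index]` part per name, optionally followed by a suffix.

```python
import re
from collections import defaultdict

# Matches names such as "spec[3]" or "coef[1]_mean" (hypothetical examples)
INDEXED = re.compile(r"^(?P<base>.+)\[(?P<index>\d+)\](?P<suffix>.*)$")

def group_array_features(columns):
    """Map each feature name (without its index) to the indices seen for it."""
    groups = defaultdict(list)
    for col in columns:
        m = INDEXED.match(col)
        if m:
            groups[m["base"] + m["suffix"]].append(int(m["index"]))
        else:
            groups[col].append(None)   # scalar feature, no index
    return dict(groups)

# Made-up header, not taken from a real openSMILE output file
columns = ["frameTime", "spec[0]", "spec[1]", "spec[2]",
           "coef[0]_mean", "coef[1]_mean"]
groups = group_array_features(columns)
for name, indices in groups.items():
    print(name, indices)
```

A feature whose indices run from 0 to 25 would then show up as one entry with 26 indices, i.e. an array of 26 values.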