RicherMans / text_based_depression

Source code for the paper "Text-based Depression Detection: What Triggers An Alert"

Features and model for audio only #6

Open haijing1995 opened 1 year ago

haijing1995 commented 1 year ago

Hello, the audio-only results in the docs look great. Could you tell me which features you used and how the model is constructed?

RicherMans commented 1 year ago

Hey, I'm sorry, which results are you referring to?

haijing1995 commented 1 year ago

> Hey, I'm sorry, which results are you referring to?

The audio-only results for the LSTM and TCN models in /docs/report.md.

RicherMans commented 1 year ago

Hey, thanks for noticing these results; they were part of the paper during development.

The "HighOrder" features are just standard statistics (mean, median, second-order and third-order moments, max, min) extracted from a mel spectrogram.

Btw, I don't think these results are all that "good"; we obtained notably better results with self-supervised learning, such as in this paper.
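
For illustration, a minimal sketch of how such pooled statistics could be computed, assuming librosa for the spectrogram; the exact extraction used for the paper may differ:

```python
# Illustrative sketch of "HighOrder"-style statistics pooled over a
# mel spectrogram; assumes librosa, the exact pipeline may differ.
import librosa
import numpy as np
from scipy.stats import skew

def high_order_stats(wav_path: str, n_mels: int = 64) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=None)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)   # shape: (n_mels, n_frames)
    stats = [
        logmel.mean(axis=1),            # mean
        np.median(logmel, axis=1),      # median
        logmel.var(axis=1),             # second-order moment
        skew(logmel, axis=1),           # third-order moment
        logmel.max(axis=1),             # max
        logmel.min(axis=1),             # min
    ]
    # One fixed-size vector per recording: (6 * n_mels,)
    return np.concatenate(stats)
```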

haijing1995 commented 1 year ago

> Hey, thanks for noticing these results; they were part of the paper during development.
>
> The "HighOrder" features are just standard statistics (mean, median, second-order and third-order moments, max, min) extracted from a mel spectrogram.
>
> Btw, I don't think these results are all that "good"; we obtained notably better results with self-supervised learning, such as in this paper.

Thanks for your reply. I have a few more questions:

1. Each answer from each participant has a different duration, so the extracted features (e.g., mel spectrograms) also have different lengths.
2. Different participants gave different numbers of responses. In order to train in batches, how do you unify these two dimensions (not the learned x features from the paper you mentioned)?

RicherMans commented 1 year ago

> Each answer from each participant has a different duration, so the extracted features (e.g., mel spectrograms) also have different lengths.

We used a batch size of 1 for training, so no padding was needed.
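
A minimal sketch of that setup, assuming PyTorch; `SessionDataset` and the dummy data below are illustrative, not the repo's actual loader:

```python
# Sketch of batch-size-1 loading for variable-length features, so
# no padding is needed. Names and dummy data here are illustrative.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class SessionDataset(Dataset):
    def __init__(self, features, labels):
        # One feature matrix per participant: (n_frames_i, feat_dim),
        # where n_frames_i differs between participants.
        self.features, self.labels = features, labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return torch.as_tensor(self.features[idx]), torch.as_tensor(self.labels[idx])

# Dummy data with varying lengths.
feats = [np.random.randn(np.random.randint(100, 500), 64).astype("float32")
         for _ in range(10)]
labels = [np.random.randint(0, 2) for _ in range(10)]

# batch_size=1: each batch is a single (1, n_frames_i, 64) tensor,
# so sequences of different lengths are never padded together.
loader = DataLoader(SessionDataset(feats, labels), batch_size=1, shuffle=True)
```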

> Different participants gave different numbers of responses. In order to train in batches, how do you unify these two dimensions (not the learned x features from the paper you mentioned)?

We really did train with a batch size of 1 for most papers, since, as you mention, the difference between samples is substantial. However, note that the dataset is very small by common scientific standards, which leads to very large variance between experiments, so do not expect to run our experiments a single time and obtain the same result. The random seed has a far larger impact on this dataset than most "optimization" methods.
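
For what it's worth, a sketch of how one might average over several seeds to get a more stable estimate; `train_and_evaluate` is a placeholder stub, not a function from this repo:

```python
# Sketch: repeat training under several seeds and report mean/std,
# since single runs on this small dataset vary a lot.
import random
import numpy as np
import torch

def seed_everything(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)

def train_and_evaluate() -> float:
    """Placeholder for one full training + evaluation run."""
    return float(np.random.rand())  # stands in for e.g. a macro-F1 score

scores = []
for seed in range(5):
    seed_everything(seed)
    scores.append(train_and_evaluate())

print(f"{np.mean(scores):.3f} +/- {np.std(scores):.3f} over {len(scores)} seeds")
```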

haijing1995 commented 1 year ago

Thanks a lot for your help, I will try it.