About the audio_feature_extractor.feature_segmenter()

adbailey1 / DepAudioNet_reproduction

Reproduction of DepAudioNet by Ma et al. (DepAudioNet: An Efficient Deep Model for Audio based Depression Classification, https://dl.acm.org/doi/10.1145/2988257.2988267, AVEC 2016)


About the audio_feature_extractor.feature_segmenter() #10

Closed BaiMeiyingxue closed 1 year ago

BaiMeiyingxue commented 1 year ago

Hi Andrew, Thank you for sharing the code. It helped in my research a lot!

I have some questions:

1 - I was a little confused by feature_segmenter() in audio_feature_extractor.py. It calculates num_extra_dimensions without using overlap, but segmenter_test() (see run_tests_afe() in unit_test_audio_feature_extractor.py) uses overlap to calculate expected_dimensions. I also couldn't find the logmel_segmenter() method. When I changed logmel_segmenter() to feature_segmenter() and ran unit_test_audio_feature_extractor.py, I got "Test Failed". Did you use overlap to segment the features?
2 - I am happy to have found the method you used to increase the number of mel features. For example, when the shape of the feature is (40, 39063), it is padded to (40, 39064) and segmented to (38, 40, 1028) when num_extra_dimensions is 38. So this gives me another 38 training samples, am I right?
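
For reference, the shape arithmetic in question 2 can be reproduced with a minimal NumPy sketch. This is not the repository's feature_segmenter(): the function name pad_and_segment and the choice of zero padding are assumptions made purely for illustration.

```python
import numpy as np


def pad_and_segment(feature, seg_width):
    """Pad the time axis to a multiple of seg_width, then cut it into
    non-overlapping segments. Hypothetical helper, not the repo's code."""
    n_frames = feature.shape[1]
    remainder = n_frames % seg_width
    if remainder:
        # Assumption: pad the end of the time axis with zeros
        pad = seg_width - remainder
        feature = np.pad(feature, ((0, 0), (0, pad)))
    num_segments = feature.shape[1] // seg_width
    # np.split works here because the padded width divides evenly
    return np.stack(np.split(feature, num_segments, axis=1))


feature = np.random.rand(40, 39063)       # (n_mels, n_frames) from the question
segments = pad_and_segment(feature, 1028)
print(segments.shape)                     # (38, 40, 1028)
```

One padded column is added (39063 -> 39064), and 39064 / 1028 = 38 segments.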

Thanks for your reply!

adbailey1 commented 1 year ago

Hey, sorry for the late reply, I am not managing this code any more but I am glad it has helped you in your research!

1 - In answer to your first question: good spot. I think that as the code developed over time, I removed the overlap function from audio_feature_extractor.py but it remained in the unit tests. I wouldn't worry too much about the unit tests; they were mainly used for debugging. If you change line 166 in unit_test_audio_feature_extractor.py to [0, 0, 0, 0] (setting the overlap to a constant 0), the test should pass.

2 - Yes you are correct. And if you tweak my code to add overlap, you would of course further increase the amount of training data.

Hope this makes sense :)

BaiMeiyingxue commented 1 year ago

Hi Andrew, thanks for your advice! I found that when using overlap to segment the features:

    if log_spec_test.shape[1] % dimensions == 0 and overlap == 0:
        expected_dimensions = log_spec_test.shape[1] // dimensions
    elif log_spec_test.shape[1] % dimensions == 0 and overlap > 0:
        hop = int((overlap / 100) * dimensions)
        expected_dimensions = ((log_spec_test.shape[1] - dimensions) // hop) + 1
    elif log_spec_test.shape[1] % dimensions != 0 and overlap == 0:
        expected_dimensions = (log_spec_test.shape[1] // dimensions) + 1
    else:
        hop = int((overlap / 100) * dimensions)
        expected_dimensions = ((log_spec_test.shape[1] - dimensions) // hop) + 2

this raises the error ValueError: array split does not result in an equal division when running new_features[:, :, :] = np.split(feature, expected_dimensions, axis=1). The problem is hard to solve; can you give me some suggestions?
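
A small sketch of why this error appears, using made-up shapes rather than the real data (the array size, dimensions, and hop below are assumptions). np.split with an integer section count requires the axis length to be divisible by that count, and the overlap-based expected_dimensions usually is not a divisor of the frame count.

```python
import numpy as np

feature = np.zeros((40, 200))   # hypothetical (n_mels, n_frames)
dimensions = 50                 # hypothetical segment width
hop = 25                        # hypothetical hop between segment starts

# Same form as the overlap branch of the snippet above
expected_dimensions = (feature.shape[1] - dimensions) // hop + 1

error_message = None
try:
    # 200 columns cannot be divided into 7 equal parts
    np.split(feature, expected_dimensions, axis=1)
except ValueError as err:
    error_message = str(err)

print(expected_dimensions, error_message)
```

Even when the division happens to be exact, np.split returns non-overlapping chunks, so it cannot produce overlapping windows of width dimensions; slicing the array window by window avoids both problems.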

adbailey1 commented 1 year ago

Hi,

No problem.

With the new problem, I don't know exactly how to help as I don't know the numbers and where in the code this is. However, I think you will need to change the code to split the data through a 'for' loop if you want to use overlap (at least for now to work out why the error is happening).

As a toy example: if your data is [40, 150] and you want to split the temporal dimension to 50 with overlap=0, expected_dimensions = 150 // 50 => 3. This means the new array you try to fit to should be [3, 40, 50].

However, if overlap = 50 (50%), then hop = int((50 / 100) * 50) => 25 and expected_dimensions = ((150 - 50) // 25) + 1 => 5. This means the new array should be [5, 40, 50].

Your loop will look something like this:

import numpy as np

loc = 0   # start column of the current segment
dim = 50  # segment width
old_array = np.random.rand(40, 150)
new_array = np.zeros((5, 40, dim))
for i in range(5):
    # Copy columns loc .. loc + dim, then advance by the hop
    new_array[i, :, :] = old_array[:, loc:loc + dim]
    loc += 25
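
As an alternative to the explicit loop, the same overlapping windows can be sketched with NumPy's sliding_window_view (available from NumPy 1.20). This is only one possible approach under the toy shapes used above, not something the repository does; the variable name hop is introduced here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

dim = 50   # segment width
hop = 25   # step between segment starts (50% overlap)
old_array = np.random.rand(40, 150)

# Every possible window of width dim along the time axis: (40, 101, 50)
windows = sliding_window_view(old_array, window_shape=dim, axis=1)

# Keep every hop-th window and move the segment axis to the front
new_array = windows[:, ::hop, :].transpose(1, 0, 2).copy()
print(new_array.shape)
```

The final copy() makes the result an ordinary writable array instead of a read-only view.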

BaiMeiyingxue commented 1 year ago

Hi Andrew, It makes sense to me! About the code, I wondered whether loc should be constant as below.

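import numpy as np

loc = 25  # constant hop between segment starts
dim = 50
old_array = np.random.rand(40, 150)
num_segments = (old_array.shape[1] - dim) // loc + 1
new_array = np.zeros((num_segments, 40, dim))
for i in range(num_segments):
    new_array[i, :, :] = old_array[:, i*loc:i*loc + dim]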

As far as I can tell, using the overlap parameter is always more helpful because I get more training data. Are there any advantages to not using overlap?

adbailey1 commented 1 year ago

I am so sorry I forgot to reply!

Yes you are right, either approach is fine.

That's right, you do obtain more training data when using overlap, but the things to be aware of (especially with this dataset) are overfitting and the quality of the data. DAIC-WOZ is very challenging and small, so these are things to consider. But if you use overlap, your model does have the ability to better understand the context of an audio file. You could find where the 'sweet spot' is for this threshold by experimenting with 25%, 50%, and 75%, for example.
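
To get a feel for how much extra data each setting produces, here is a rough sketch that counts segments for the (40, 39064) example discussed earlier in this thread. The helper count_segments is hypothetical, and it assumes hop = dim * (1 - overlap / 100), i.e. overlap is the fraction shared between neighbouring segments. That differs from the hop = int((overlap / 100) * dimensions) formula quoted earlier; the two only agree at 50%.

```python
def count_segments(n_frames, dim, overlap_percent):
    """Number of full-width segments for a given overlap percentage.
    Hypothetical helper; assumes hop = dim * (1 - overlap / 100)."""
    hop = dim * (100 - overlap_percent) // 100
    if hop < 1:
        raise ValueError("overlap is too large for this segment width")
    if n_frames < dim:
        return 0
    return (n_frames - dim) // hop + 1


# 39064 frames and a segment width of 1028, as in the earlier example
counts = {overlap: count_segments(39064, 1028, overlap)
          for overlap in (0, 25, 50, 75)}
print(counts)   # {0: 38, 25: 50, 50: 75, 75: 149}
```

The segments are heavily correlated at high overlap, so the larger counts do not translate into the same amount of new information.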

BaiMeiyingxue commented 1 year ago

I got it, thank you!