CUC-Acoustics / Auditory-Saliency-Models

Question - Default frequency bands/centers for Kalinli and Duangudom models? #2

Status: Open. k4vi opened this issue 10 months ago.

k4vi commented 10 months ago

Hello! Sorry for asking about such an old project. I wanted to confirm the frequency values that are associated with the saliency maps for Kalinli and Duangudom.

In Kalinli_Saliency.m, the cochlear filtering parameters are listed as 128 channels from 100 Hz to 8000 Hz:

```matlab
% Cochlear filtering
%
% In the "scm" input argument, the parameters are as follows
% 128: the number of channels
% 100: the lowest frequency
% 8000: the highest frequency
% "5": time resolution is 5 milliseconds
% "0": without high pass filter in cochlear output
[eResp,fx,cf,tx] = scm(s(:,1),fs,[128 100 8000],5, 0);
```

After the center-surround interactions are computed, the frequency dimension of the saliency arrays is reduced from 128 to 64.

Does this mean that the output map shows saliency for 64 evenly spaced frequency bands from 100 Hz to 8000 Hz? I ask only because the code that draws the figures uses a linspace that starts at 0 and ends at a value that depends on the sampling rate of the sound:

```matlab
% Draw the saliency map.
figure(1);
ndx = linspace(0,(length(s)/fs)*1000,size(SAL,2));
mdx = linspace(0,floor(fs/2), size(SAL,1));
imagesc(ndx,mdx,SAL);
title('Saliency map');
xlabel('Time/ms')
ylabel('Frequency/Hz')
```

Can the mdx shown here be ignored? Is it safe to assume that, regardless of the sampling rate (as long as it is above 16 kHz), the Kalinli saliency maps cover 64 frequency band centers from 100 Hz to 8000 Hz?
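This small Python sketch is not part of the repository. It mirrors the `mdx = linspace(0, floor(fs/2), size(SAL,1))` line so you can see which labels the plotting code produces. It assumes the reduced map has 64 rows. The labels always stop at the Nyquist frequency, whereas the `scm` call requests a filterbank from 100 to 8000 Hz.

```python
def matlab_linspace(start, stop, n):
    """Mimic MATLAB's linspace(start, stop, n): n evenly spaced points, both endpoints included."""
    if n == 1:
        return [float(stop)]  # MATLAB returns the end point when n == 1
    return [start + (stop - start) * i / (n - 1) for i in range(n)]


def kalinli_plot_freq_axis(fs, n_rows=64):
    """Labels produced by mdx = linspace(0, floor(fs/2), size(SAL,1)); n_rows = 64 is an assumption."""
    return matlab_linspace(0, fs // 2, n_rows)


for fs in (16000, 44100, 48000):
    axis = kalinli_plot_freq_axis(fs)
    # The labels always end at the Nyquist frequency, even though scm was asked for 100-8000 Hz.
    print(f"fs={fs}: labels run from {axis[0]:.0f} Hz to {axis[-1]:.0f} Hz")
```

At 16 kHz the labels happen to end at 8000 Hz, but they still start at 0 Hz rather than 100 Hz. At 44.1 kHz they end at 22050 Hz, well past the filterbank's upper limit.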

Similarly, in Duangudom_Saliency.m, mdx is defined in terms of the sampling rate:

```matlab
% Draw the saliency map.
[M,N]=size(Saliency);
ndx = (1:N) * paras(1);
mdx = (1:M) * fs / 2 / M;
imagesc(ndx,mdx,Saliency);
title('Saliency map');
xlabel('Time/ms');
```

ylabel('Frequency/Hz')

Is it safe to ignore this range and assume that, with the default parameters, the map has 128 frequency band centers, also from 100 Hz to 8000 Hz?
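The Duangudom formula `mdx = (1:M) * fs / 2 / M` can be checked the same way. This Python sketch is hypothetical and assumes the map has M = 128 rows, as the question suggests. The labels are uniform steps of fs/(2M) Hz that end at the Nyquist frequency. They therefore do not start at 100 Hz even at 16 kHz.

```python
def duangudom_plot_freq_axis(fs, n_rows):
    """Labels produced by mdx = (1:M) * fs / 2 / M, with M = n_rows."""
    return [k * fs / 2 / n_rows for k in range(1, n_rows + 1)]


for fs in (16000, 44100):
    axis = duangudom_plot_freq_axis(fs, 128)  # 128 rows is an assumption
    step = axis[1] - axis[0]
    print(f"fs={fs}: first label {axis[0]} Hz, last label {axis[-1]} Hz, step {step} Hz")
```

With fs = 16000 and 128 rows, the first label is 62.5 Hz and the last is 8000 Hz.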

(I know the sample .wav files are sampled at 16 kHz, so the math works out for them. My audio files have a higher sampling rate, though, and I would rather not resample them and lose resolution.)

xiongxiansong commented 9 months ago

First, the mdx setting (the frequency axis) in the plotting code is incorrect. It should span 100 Hz to 8000 Hz, and the center frequencies are not uniformly distributed; see the scm.m function in the same folder. The same issue affects both the Kalinli and the Duangudom scripts.
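The point that the center frequencies are not uniform can be illustrated with a Python sketch. It assumes ERB-rate spacing (Glasberg and Moore, 1990), one common choice for cochlear filterbanks. This is not necessarily what scm.m does. The `cf` output of `scm` presumably holds the actual center frequencies, so check that instead.

There is also a plotting consequence. MATLAB's `imagesc` uses only the first and last elements of its y vector, so a nonuniform axis is better drawn against channel index, with center frequencies as tick labels. `erb_spaced_cfs` and `tick_labels` are hypothetical helpers.

```python
import math


def erb_rate(f_hz):
    """Glasberg & Moore (1990) ERB-rate scale."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def inverse_erb_rate(e):
    """Frequency in Hz for a given ERB-rate value."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def erb_spaced_cfs(n_channels, f_low, f_high):
    """Center frequencies evenly spaced on the ERB-rate scale (a stand-in, not scm.m itself)."""
    e_low, e_high = erb_rate(f_low), erb_rate(f_high)
    step = (e_high - e_low) / (n_channels - 1)
    return [inverse_erb_rate(e_low + i * step) for i in range(n_channels)]


def tick_labels(cfs, n_ticks=5):
    """Evenly spaced 1-based channel indices (as in MATLAB) paired with rounded center frequencies."""
    last = len(cfs) - 1
    idx = [round(i * last / (n_ticks - 1)) for i in range(n_ticks)]
    return [(i + 1, round(cfs[i])) for i in idx]


cfs = erb_spaced_cfs(128, 100, 8000)
# Channel spacing grows with frequency, so the axis is far from linear.
print(f"spacing near 100 Hz: {cfs[1] - cfs[0]:.1f} Hz, near 8000 Hz: {cfs[-1] - cfs[-2]:.1f} Hz")
print(tick_labels(cfs))
```

In MATLAB, the equivalent would be to call `imagesc` without a y vector and then set the tick positions and labels from `cf`.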

Second, if your sampling frequency fs is higher than 16 kHz, you can change the call to `scm(s(:,1), fs, [128 100 fs/2], 5, 0)`. The other parameters in the code can be adjusted to suit your needs.
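This Python sketch builds the band vector `[128 100 fs/2]` for a given sampling rate and rejects rates whose Nyquist frequency is not above the lower limit. `scm_band_params` is a hypothetical helper, not part of the repository.

```python
def scm_band_params(fs, n_channels=128, f_low=100.0):
    """Return [n_channels, f_low, fs/2], the band vector suggested for the scm call.

    Raises ValueError if the Nyquist frequency does not lie above f_low.
    """
    nyquist = fs / 2
    if nyquist <= f_low:
        raise ValueError(f"fs={fs} gives a Nyquist frequency of {nyquist} Hz, not above f_low={f_low} Hz")
    return [n_channels, f_low, nyquist]


for fs in (16000, 44100, 48000):
    print(fs, scm_band_params(fs))
```

At fs = 16000 this reproduces the default `[128 100 8000]`. Keeping 128 channels while raising the upper limit spreads the same number of channels over a wider range, so you may want to increase the channel count as well.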