HoerTech-gGmbH / openMHA

The open Master Hearing Aid (openMHA)
http://www.openmha.org
GNU Affero General Public License v3.0

max_lag calculation in example 09 (localizer-steering-beamformer) #31

Closed yuan9778 closed 5 years ago

yuan9778 commented 5 years ago

Dear Hendrik,

Regarding the MVDR filter stored in "MVDR_iso_norm_bte_16KHz_4ch_lrFFT512-180-5-180.txt": you mentioned it was generated using Matlab scripts. I guess it is microphone-array dependent. Since you will release the Matlab script for the MVDR filter in the next GitHub update, could you send me a copy in advance if you already have it at hand? I would like to try it on a two-microphone array.
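For reference, my current understanding of the standard textbook MVDR formulation, which I would adapt for two microphones. This is only a sketch, not your script: it assumes free-field omnidirectional microphones and a diffuse (isotropic) noise coherence matrix, guessing from the "iso_norm" in the file name, and all variable names and values below are hypothetical:

    % Textbook MVDR weights for one frequency bin (not the openMHA script).
    c   = 343;       % speed of sound in m/s
    d12 = 0.0149;    % microphone spacing in m (hypothetical)
    f   = 1000;      % frequency of this bin in Hz
    theta = 0;       % look direction relative to the array axis, in radians

    % Far-field steering vector for a free-field two-microphone array
    tau = d12 * cos(theta) / c;
    d = [1; exp(-1i * 2 * pi * f * tau)];

    % Diffuse (isotropic) noise coherence matrix
    g = sin(2 * pi * f * d12 / c) / (2 * pi * f * d12 / c);
    Gamma = [1 g; g 1];

    % MVDR: w = Gamma^-1 d / (d^H Gamma^-1 d)
    w = (Gamma \ d) / (d' * (Gamma \ d));

I realize a real BTE device violates the free-field assumption, which is why I am asking for your scripts.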

When I proceed to example 09 (localizer-steering-beamformer), I have difficulty understanding the following two points:

  1. The class doasvm_feature_extraction has a member "max_lag" (maximum lag in samples between microphones), which you initialize with the value 2 or 20. I guess 2 is for the pair of mics with a spacing of 0.0149 m. This is correct for a sampling rate of 48 kHz, but in this example the signal is resampled to 16 kHz. The calculation is: (0.0149 / 343) * 48000 = 2.085 samples.
    For a sampling rate of 16000 Hz, it is 0.695 samples.

  2. The class doasvm_classification needs the data from the file "matrices_4channel_front-rear_bte.cfg" to initialize its members w, b, x and y. I read your paper "A discriminative learning approach to probabilistic acoustic source localization" and got some idea of how to generate those data, but I am still not certain. Did you record sound from all 73 directions and use those recordings as input files for doasvm_feature_extraction to generate training data? Which liblinear command did you use for training? Where did you get the data for x and y? Maybe I misunderstand it entirely. It would be a great help if you could shed some light on this.

Thanks again for your time and for the openMHA project, from which I have learned a lot.

Regards, Sanqing

hendrikkayser commented 5 years ago

Hi Sanqing, yes, the filter coefficients depend on the microphone array: on its geometry and on the transfer characteristics of the microphones themselves. I currently don't have proper scripts at hand that I can share, and it is vacation time, so I'm sorry that it may take a while, but I'll provide something as soon as I have it.

Re 1.: "max_lag" refers to the maximum delay between the different channels that is taken into account in the feature extraction. I chose a value a little larger than the theoretical physical lag that can occur between the microphones, to make sure that all directional information contained in the signal is captured, e.g., also early reflections from the structure around the microphones, such as the pinnae. In general, choosing max_lag a little too long does not hurt, but choosing it too short may drop important information.
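To make the arithmetic from your point 1 concrete, one plausible way to arrive at such a value is sketched below. The doubling margin is purely illustrative and not the rule actually used to obtain 2 or 20:

    % Theoretical inter-microphone lag plus a safety margin (illustrative only).
    c        = 343;      % speed of sound in m/s
    mic_dist = 0.0149;   % spacing of the front-rear BTE microphone pair in m
    fs       = 16000;    % sampling rate after resampling

    lag_theoretical = mic_dist / c * fs;   % = 0.695 samples at 16 kHz
    max_lag = ceil(2 * lag_theoretical);   % e.g. double it and round up -> 2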

Re 2.: "matrices_4channel_front-rear_bte.cfg" contains the model that was trained in Matlab and afterwards converted to be compatible with the openMHA framework. The plugin 'doasvm_feature_extraction' only performs the feature extraction for the localizer; it cannot be used to train a new SVM model. The training of the model is not part of openMHA and was done in Matlab. liblinear was used to train SVM models using features extracted from sound signals arriving from all directions to be taken into account:

    train(Labels, Features, '-s 1 -c 0.01 -e 0.01 -B 1')
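Spelled out with liblinear's Matlab interface, such a call could look roughly as follows. Only the train call and its options above are authoritative; the variable names, placeholder data and the mapping to w and b are assumptions:

    % Random placeholder data just to make the sketch runnable; in practice the
    % features come from the feature extraction, one row per signal frame.
    numFrames = 1000; numFeatures = 40;                % arbitrary sizes
    featureMatrix = randn(numFrames, numFeatures);
    directionIndexPerFrame = randi(73, numFrames, 1);  % 73 directions

    Labels   = double(directionIndexPerFrame(:));
    Features = sparse(featureMatrix);   % liblinear requires a sparse matrix

    % -s 1: L2-regularized L2-loss SVC (dual), -c: cost, -e: stopping
    % tolerance, -B 1: append a bias term to each instance
    model = train(Labels, Features, '-s 1 -c 0.01 -e 0.01 -B 1');

    % With -B 1, the last column of model.w holds the bias terms; presumably
    % these become the members w and b after conversion to the openMHA format.
    w = model.w(:, 1:end-1);
    b = model.w(:, end);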

The sound files were generated with the HRIR database mentioned before (speech signals convolved with HRIRs). x and y are parameters of a sigmoid function that was fitted in Matlab using the glmfit function:

    glmfit(SVMconfidenceValues, Labels, 'binomial', 'link', 'logit');

where 'SVMconfidenceValues' are derived from the training data.
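For illustration, fitting and evaluating such a sigmoid with the Matlab Statistics Toolbox could look like the sketch below. The placeholder data, the binary labels and the exact correspondence of the fitted coefficients to the members x and y are assumptions here:

    % Placeholder data just to make the sketch runnable; in practice the
    % confidence values come from the trained SVM on the training data.
    SVMconfidenceValues = randn(1000, 1);
    BinaryLabels = double(SVMconfidenceValues + 0.3 * randn(1000, 1) > 0);

    % Fit a logistic (sigmoid) mapping from confidence values to probabilities.
    coeffs = glmfit(SVMconfidenceValues, BinaryLabels, 'binomial', 'link', 'logit');

    % Evaluate the fitted sigmoid:
    % p = 1 ./ (1 + exp(-(coeffs(1) + coeffs(2) * SVMconfidenceValues)))
    p = glmval(coeffs, SVMconfidenceValues, 'logit');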

I'm aware that this is not a comprehensive recipe for designing a completely new localizer setup - which would be beyond the scope of openMHA for the time being - but I hope this helps you a bit!

Best, Hendrik