WeidiXie / VGG-Speaker-Recognition

Utterance-level Aggregation For Speaker Recognition In The Wild

Training under Windows #15

Closed AntonBiryukovUofC closed 5 years ago

AntonBiryukovUofC commented 5 years ago

Hello @WeidiXie ,

Thanks for this awesome work and for sharing it with the open-source community! I am trying to adapt the code for training on VoxCeleb1 (since it is a smaller dataset, I decided to play with it first). I have prepared the file lists, plugged in your weights, frozen the layers up to the bottleneck, and tried to run main.py. However, before training even starts I get an annoying error, which is likely related to the fact that you're using multiprocessing to speed up data generation:

File "D:\Repos\VGG-Speaker-Recognition\tool\toolkits.py", line 45, in set_mp pool = mp.Pool(processes=processes, initializer=init_worker) File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\context.py", line 119, in Pool context=self.get_context()) File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\pool.py", line 175, in __init__ self._repopulate_pool() File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\pool.py", line 236, in _repopulate_pool self._wrap_exception) File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\pool.py", line 255, in _repopulate_pool_static w.start() File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\process.py", line 105, in start self._popen = self._Popen(self) File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\context.py", line 322, in _Popen return Popen(process_obj) File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__ reduction.dump(process_obj, to_child) File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\reduction.py", line 60, in dump ForkingPickler(file, protocol).dump(obj) AttributeError: Can't pickle local object 'set_mp.<locals>.init_worker'

I also tried to set the number of processes to 1, and that did not help either. I wonder if you have any suggestions on how to alleviate this.
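For reference, freezing everything up to the bottleneck in Keras typically looks like the sketch below (illustrative only; the cutoff layer name is a placeholder, not the repo's actual bottleneck layer name):

```python
def freeze_until(model, cutoff_name):
    """Freeze every layer up to and including the layer named cutoff_name."""
    trainable = False
    for layer in model.layers:
        layer.trainable = trainable
        if layer.name == cutoff_name:
            trainable = True   # layers after the cutoff stay trainable
    return model

# hypothetical usage: model = freeze_until(model, 'bottleneck')
```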

Thanks again,

Anton.

WeidiXie commented 5 years ago

Hi, Anton

Can you check this ? https://github.com/WeidiXie/VGG-Speaker-Recognition/blob/d6ae3ab9d4af1612572125c63f848aa35d3fee25/src/generator.py#L42

Change the loading function from data_generation_mp() to data_generation().

Best, Weidi

AntonBiryukovUofC commented 5 years ago

Would that leave me with only one thread executing the generator pipeline?

This seems to have worked (although I'm not entirely sure yet, since I ran into another problem): https://stackoverflow.com/questions/8804830/python-multiprocessing-picklingerror-cant-pickle-type-function

I replaced multiprocessing as mp with pathos.multiprocessing as mp, and Pool with ProcessPool.
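For anyone hitting the same error, a rough sketch of that swap (illustrative only, not the repo's code; pathos serialises with dill, so locally defined worker functions no longer break the Windows "spawn" pickler):

```python
import pathos.multiprocessing as mp  # dill-based, unlike the stdlib pickler

def run():
    # a locally defined function: exactly what the stdlib pickler refuses
    # to ship to spawned workers on Windows
    def square(x):
        return x * x

    pool = mp.ProcessPool(nodes=4)   # roughly equivalent to Pool(processes=4)
    print(pool.map(square, range(8)))
    pool.close()
    pool.join()

if __name__ == "__main__":
    run()
```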

WeidiXie commented 5 years ago

Hmm, multiprocessing in Python is always a problem... You may consider using the multiprocessing support built into Keras: simply use __data_generator() directly, and pass the multiprocessing arguments to fit_generator().
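For reference, a minimal sketch of that built-in Keras route (the Sequence class, batch size, and spectrogram shape here are placeholders, not from this repo): wrap the per-batch loading in a keras.utils.Sequence and let fit_generator spawn the worker processes.

```python
import numpy as np
from keras.utils import Sequence

class SpeakerBatches(Sequence):
    """Illustrative Sequence: each __getitem__ builds one batch in-process."""
    def __init__(self, utterance_list, batch_size=64, spec_shape=(257, 250, 1)):
        self.utterance_list = utterance_list
        self.batch_size = batch_size
        self.spec_shape = spec_shape

    def __len__(self):
        return max(1, len(self.utterance_list) // self.batch_size)

    def __getitem__(self, idx):
        # here you would load the audio and compute spectrograms; dummy
        # arrays keep the sketch self-contained
        x = np.zeros((self.batch_size,) + self.spec_shape, dtype=np.float32)
        y = np.zeros((self.batch_size,), dtype=np.int32)
        return x, y

# model.fit_generator(SpeakerBatches(train_list), epochs=8,
#                     workers=4, use_multiprocessing=True)
```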

Best, Weidi

WeidiXie commented 5 years ago

Are you using an SSD for the data? In my case, with multiple GPUs, utilization always stays above 90%.

However, if loading the data is too slow, the culprit is librosa: my labmate told me he could speed up training by about 3x by doing the FFT in TensorFlow.
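A rough sketch of what a TensorFlow-side FFT could look like (not the actual code referred to above; frame_length/frame_step/fft_length mirror 25 ms / 10 ms settings at 16 kHz and are assumptions here):

```python
import tensorflow as tf

def tf_magnitude_spectrogram(wav_batch):
    """wav_batch: float32 tensor of shape (batch, num_samples)."""
    stft = tf.signal.stft(wav_batch,
                          frame_length=400,  # 25 ms window at 16 kHz
                          frame_step=160,    # 10 ms hop at 16 kHz
                          fft_length=512)
    return tf.abs(stft)                      # keep the magnitude only
```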

Best, Weidi

mmxuan18 commented 5 years ago

I use two GPUs and GPU utilization is only around 20%, so maybe the data processing takes too much time. I noticed that when generating an input, the code first computes the STFT and magphase on the whole wav and then randomly selects a spec_len window. Why not first select a spec_len segment of the wav and then compute the STFT and magphase?

Would that make any difference to the final 2D input? From my statistics of the VoxCeleb2 wav lengths, many wavs are 8 s or longer.

WeidiXie commented 5 years ago

I don't think there is any difference; it's a windowed FFT anyway.

mmxuan18 commented 5 years ago

If so, I think selecting the spec_len segment of the wav first and then computing the STFT and magphase would save a lot of time.
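Roughly what that change could look like (an illustration, not a patch against the repo; n_fft, win_length, and hop_length are assumed values, and the crop length is only approximately spec_len frames):

```python
import numpy as np
import librosa

def random_crop_then_spectrogram(wav, spec_len=250, n_fft=512,
                                 win_length=400, hop_length=160):
    # take a random chunk just long enough for ~spec_len frames,
    # instead of transforming the whole utterance
    need = (spec_len - 1) * hop_length + win_length
    if len(wav) > need:
        start = np.random.randint(0, len(wav) - need)
        wav = wav[start:start + need]
    mag, _ = librosa.magphase(librosa.stft(wav, n_fft=n_fft,
                                           win_length=win_length,
                                           hop_length=hop_length))
    return mag
```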

WeidiXie commented 5 years ago

That could be a good approach. Do you want to make the changes? I can then merge them into the repo.

Best, Weidi

mmxuan18 commented 5 years ago

I will verify the idea first and then make a pull request. Right now I am hitting the same problem as in https://github.com/WeidiXie/VGG-Speaker-Recognition/issues/1#issue-413890951. Was your final model trained on the wav format?

WeidiXie commented 5 years ago

Yes, we trained with wav. It might be good to ask @wuqiangch, because he eventually got it working.

Best, Weidi

mmxuan18 commented 5 years ago

Did you end up using wav in 16 kHz, 16-bit format?