Closed AntonBiryukovUofC closed 5 years ago
Hi Anton,
Can you check this? https://github.com/WeidiXie/VGG-Speaker-Recognition/blob/d6ae3ab9d4af1612572125c63f848aa35d3fee25/src/generator.py#L42
Change the loading function from `data_generation_mp()` to `data_generation()`.
Best, Weidi
Would that leave me with only one thread executing the generator pipeline?
This seems to have worked (although I'm not entirely sure yet, since I ran into another problem): https://stackoverflow.com/questions/8804830/python-multiprocessing-picklingerror-cant-pickle-type-function
I replaced `import multiprocessing as mp` with `import pathos.multiprocessing as mp`, and `Pool` with `ProcessPool`.
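For anyone else hitting this PicklingError: pathos serializes with dill rather than pickle, so it can ship local functions to workers. Assuming a recent pathos version (where `ProcessPool` is exported from `pathos.multiprocessing`), the swap amounts to:

```diff
-import multiprocessing as mp
+import pathos.multiprocessing as mp
```

…and constructing the pool with `mp.ProcessPool` instead of `mp.Pool`. The rest of the calling code stays the same.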
Hmm, multiprocessing in Python is always a problem... You may consider using the built-in multiprocessing in Keras: simply use the plain `__data_generation()`, and pass the multiprocessing arguments to `fit_generator()`.
Best, Weidi
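What Weidi suggests could look like the following sketch (assuming the Keras 2 `fit_generator` API; `model`, `datagen`, `steps`, and `epochs` are placeholders, not names from the repo):

```python
# Sketch: let Keras manage loader processes instead of a manual mp.Pool.
# `datagen` is any keras.utils.Sequence (or generator) yielding (x, y)
# batches produced by the plain __data_generation() path.
model.fit_generator(datagen,
                    steps_per_epoch=steps,
                    epochs=epochs,
                    workers=4,                 # number of loader processes
                    use_multiprocessing=True)  # processes instead of threads
```

A `keras.utils.Sequence` is the safer choice here, since Keras warns that plain generators cannot be safely shared across worker processes.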
Are you using an SSD hard drive for the data? In my case, using multiple GPUs, utilization always reaches over 90%.
However, if data loading is too slow, it's librosa. My labmate told me he could accelerate the training process by 3x using the FFT in TensorFlow.
Best, Weidi
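The TensorFlow route Weidi mentions means computing the spectrogram inside the graph rather than with librosa on the CPU. A minimal sketch; the 400/160/512 parameters mirror a 25 ms window and 10 ms hop at 16 kHz and are assumptions, not the repo's actual settings:

```python
import tensorflow as tf

def spectrogram(wav):
    # wav: float32 tensor of shape [samples].
    # tf.signal.stft frames the signal and applies a windowed FFT on device,
    # avoiding the librosa round-trip through NumPy on the CPU.
    stft = tf.signal.stft(wav, frame_length=400, frame_step=160,
                          fft_length=512)
    return tf.abs(stft)  # magnitude spectrogram, shape [frames, 257]
```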
I use two GPUs and GPU utilization is only 20%, so the data processing probably takes too much time. I noticed that when generating an input, the code first computes the STFT and magphase on the whole wav and then randomly selects spec_len frames. Why not first select a spec_len-long segment of the wav and then compute the STFT and magphase?
Would that make any difference to the final 2-D input?
From my statistics on the VoxCeleb2 wav lengths, many wavs are as long as 8 s or more.
I don't think that makes any difference; it's a windowed FFT anyway.
If so, I think first selecting a spec_len segment of the wav and then computing the STFT and magphase will save a lot of time.
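The equivalence Weidi describes (cropping before or after the STFT gives the same frames) can be checked numerically. A minimal sketch, with scipy standing in for librosa; the 400-sample window and 160-sample hop are assumed values, not the repo's actual settings:

```python
import numpy as np
from scipy.signal import stft

nperseg, hop = 400, 160                    # 25 ms window / 10 ms hop @ 16 kHz
rng = np.random.default_rng(0)
wav = rng.standard_normal(16000)           # 1 s of fake audio

# STFT of the whole wav (boundary=None disables edge padding so frames
# align exactly with sample positions 0, hop, 2*hop, ...).
_, _, full = stft(wav, nperseg=nperseg, noverlap=nperseg - hop,
                  boundary=None, padded=False)

# Crop first on a hop boundary, then transform only the segment.
start = 10 * hop
seg = wav[start:start + 4000]
_, _, cropped = stft(seg, nperseg=nperseg, noverlap=nperseg - hop,
                     boundary=None, padded=False)

# Interior frames agree exactly, so cropping before the STFT changes nothing.
print(np.allclose(cropped, full[:, 10:10 + cropped.shape[1]]))  # True
```

Only edge frames can differ when boundary padding is enabled, which is irrelevant once a random interior crop of spec_len frames is taken.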
That could be a good way to do it. Do you want to make the changes? I can then merge them into the repo.
Best, Weidi
I will check the idea first and then make a pull request. For now, I've hit the same problem as this issue: https://github.com/WeidiXie/VGG-Speaker-Recognition/issues/1#issue-413890951. Did you train the final model on wav-format audio?
Yes, we trained with wav. It might be good to ask @wuqiangch, because he eventually got it working.
Best, Weidi
Did you finally use wav in 16 kHz, 16-bit format?
Hello @WeidiXie,
Thanks for this awesome work, and for sharing it with the open-source community! I am trying to adapt the code for training on VoxCeleb1 (just because it is a smaller dataset, I decided to play with it first). I have prepared the file lists, plugged in your weights, frozen the first layers up to the bottleneck, and tried to run `main.py`. However, I get an annoying error before training starts that is likely related to the fact you're using multiprocessing to speed up data generation:

```
File "D:\Repos\VGG-Speaker-Recognition\tool\toolkits.py", line 45, in set_mp
  pool = mp.Pool(processes=processes, initializer=init_worker)
File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\context.py", line 119, in Pool
  context=self.get_context())
File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\pool.py", line 175, in __init__
  self._repopulate_pool()
File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\pool.py", line 236, in _repopulate_pool
  self._wrap_exception)
File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\pool.py", line 255, in _repopulate_pool_static
  w.start()
File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\process.py", line 105, in start
  self._popen = self._Popen(self)
File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\context.py", line 322, in _Popen
  return Popen(process_obj)
File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
  reduction.dump(process_obj, to_child)
File "C:\Users\abiryukov\AppData\Local\Continuum\anaconda3\envs\pyBK\lib\multiprocessing\reduction.py", line 60, in dump
  ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'set_mp.<locals>.init_worker'
```
I also tried setting the number of processes to 1, but that did not help either. Do you have any suggestions on how to alleviate this?
Thanks again,
Anton.
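For reference, the `AttributeError` above occurs because `init_worker` is defined inside `set_mp` in `tool/toolkits.py`: Windows' "spawn" start method must pickle the initializer, and local functions are not picklable. A sketch of the fix (the repo's actual `init_worker` may differ; ignoring SIGINT in workers is a common choice shown here as an assumption):

```python
import multiprocessing as mp
import signal

# Module-level functions are picklable, so they survive Windows' "spawn"
# start method; a function defined inside set_mp cannot be serialized by
# the ForkingPickler, producing the AttributeError above.
def init_worker():
    # Let the parent process handle Ctrl-C instead of the workers.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def square(x):
    return x * x

def set_mp(processes=2):
    return mp.Pool(processes=processes, initializer=init_worker)

if __name__ == "__main__":
    with set_mp() as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```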