run problem with docker?

Xiyor commented 6 years ago

Hi, I am interested in audio style transfer. I set up a Docker container (running a tensorflow-gpu image) on a host with 32 GB of memory and a 1080 Ti. Following your tutorial, I ran the command [python train1.py default] (default is a name I picked arbitrarily). It runs without stopping, stuck forever at epoch 1. What is wrong with my setup? Looking forward to your answer. Thanks.

andabi commented 6 years ago

@Xiyor I think the queue runner is not working properly. To debug, set queue=False in train1.py, then run it and see what message comes up.

Xiyor commented 6 years ago

Thank you for your reply. The original cause of the problem was that I had not added the TIMIT dataset under the datasets directory. After adding it, I found the filenames strange: there are two different extensions, phn.txt and phn. So I modified the code in data_load.py. However, with queue=False the process still runs without stopping. I am confused about what is happening.

boussaffawalid commented 6 years ago

I have the same issue on Linux and Mac. Any updates?

Xiyor commented 6 years ago

@boussaffawalid I have not managed to run it successfully. I have decided to study TensorFlow first.

andabi commented 6 years ago

@Xiyor @boussaffawalid Please check the paths (data_path and so on) in hparam.py again. If the paths are set incorrectly, the problem you described can occur.
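
One way to check a path is to expand the glob pattern by hand and count the matches. The following is a minimal sketch, assuming the data is laid out as timit/TIMIT/TRAIN/*/*/*.wav under the base path (the pattern reported later in this thread). The function name count_matches and the example base path are hypothetical and not part of the repository.

```python
import glob
import os


def count_matches(data_path_base, pattern="timit/TIMIT/TRAIN/*/*/*.wav"):
    """Return the sorted list of files matching pattern under data_path_base."""
    full_pattern = os.path.join(data_path_base, pattern)
    return sorted(glob.glob(full_pattern))


if __name__ == "__main__":
    base = "./datasets"  # hypothetical; use the value from your own hparams file
    files = count_matches(base)
    print("pattern matched %d files under %s" % (len(files), base))
    if not files:
        print("no files matched: check the base path and the extension case")
```

If this prints 0, the loader probably has nothing to read, which would be consistent with the symptoms reported here.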

Xiyor commented 6 years ago

@andabi @boussaffawalid andabi is right. I checked the TIMIT dataset: some wav files have no matching phn file, and some phn files have no matching wav file. I wrote a script to find these outliers and then ran train1.py, and it ran successfully. Thanks.
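
A check of this kind could look roughly like the sketch below. It is not the script Xiyor used. It assumes each utterance is a pair of files sharing a base name, with .wav and .phn extensions compared case-insensitively. The name find_unpaired is made up for illustration. Files named like x.phn.txt are ignored by this sketch, because their extension is .txt.

```python
import os


def find_unpaired(root, wav_ext=".wav", phn_ext=".phn"):
    """Walk root and return (wavs_without_phn, phns_without_wav).

    Both results are sorted lists of paths without the extension.
    """
    wavs, phns = set(), set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            base, ext = os.path.splitext(name)
            key = os.path.join(dirpath, base)
            if ext.lower() == wav_ext:
                wavs.add(key)
            elif ext.lower() == phn_ext:
                phns.add(key)
    # Set differences give the files that lack a partner.
    return sorted(wavs - phns), sorted(phns - wavs)
```

Calling find_unpaired on the TIMIT root and printing both lists shows which files to remove or repair before training.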

boussaffawalid commented 6 years ago

I updated the data_path and now I have another issue. The log is below. I tested this with Python 3 on Mac and Windows.

Traceback (most recent call last):
  File "train1.py", line 91, in <module>
    train(logdir=logdir)
  File "train1.py", line 65, in train
    summ, gs = sess.run([summ_op, global_step])
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
  [[Node: batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, batch/n)]]

Xiyor commented 6 years ago

@boussaffawalid Did you set queue=False? That helps debug errors in data preprocessing: when preprocessing fails, the queue is closed. I am not certain that this is your problem, but you can try it. In my case, my TIMIT dataset had some outliers, and I had similar problems.

Xiyor commented 6 years ago

@andabi I still have a problem after switching to the tensorflow-gpu Docker container. The same code runs in the tensorflow-cpu container but fails in the GPU container: it hangs forever. With queue=False the problem still occurs. I then set num_thread=2 and added some logging, and found that a thread reaches the step librosa.load(wav_file, sr=sr) and goes no further; it hangs there. I cannot figure out the problem. Could you help me, please? Thanks.
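
For a hang like this, Python's standard faulthandler module can dump the stack of every thread once a call has run too long, which shows where it is stuck without extra log statements. This is a sketch only, assuming Python 3.3 or later. The helper name run_with_hang_trace and the log file name are hypothetical. It only reports the stuck frames; it does not interrupt the call.

```python
import faulthandler


def run_with_hang_trace(func, *args, timeout=30.0, log_path="hang_trace.log", **kwargs):
    """Call func(*args, **kwargs) under a watchdog.

    If the call is still running after `timeout` seconds, the stacks of
    all threads are written to log_path. The call itself is not stopped.
    """
    with open(log_path, "w") as log_file:
        # The file must stay open until the dump happens or is cancelled.
        faulthandler.dump_traceback_later(timeout, file=log_file)
        try:
            return func(*args, **kwargs)
        finally:
            faulthandler.cancel_dump_traceback_later()
```

With librosa installed, wrapping the suspect call as run_with_hang_trace(librosa.load, wav_file, sr=sr, timeout=60.0) would leave the stuck frames in the log file if the load never returns. Whether that reveals the cause in the GPU container is not something this thread confirms.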

HudsonHuang commented 6 years ago

@Xiyor May I ask how you set your dataset path? I set my paths in hparams.py like this:
data_path_base = '/home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/datasets'
logdir_path = '/home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/logdir'

When I run it, the number of wav files found is 0. So I printed the data path the script searches, and got: /home/lab-huang.zhongyi/workspace/deep-voice-conversion-master/datasets/timit/TIMIT/TRAIN/*/*/*.wav

And my path is:

Is it a correct path? What was yours? Thank you so much.

Xiyor commented 6 years ago

@HudsonHuang
Hello. I think your path is right. But your TIMIT dataset looks unusual: your wav files have the extension WAV, not wav. You can change the extension so that the code's 'wav' pattern matches your files and try again. Good luck. Thanks.

SriramS32 commented 6 years ago

@Xiyor if you are hanging on a librosa.load() maybe you need an audio backend (make sure ffmpeg is installed).

boussaffawalid commented 6 years ago

I updated the data path and added a few log messages to make sure that the wav files are loaded correctly. Now it crashes on the first epoch with the error below. Any suggestions?

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
  [[Node: batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, batch/n)]]

During handling of the above exception, another exception occurred:

Xiyor commented 6 years ago

@boussaffawalid First, the code seems to be written in Python 3. Also, setting queue=False is a proper way to verify that the preprocessing has no bugs.

Xiyor commented 6 years ago

@SriramS32 Thank you for the suggestion. ffmpeg is already installed, so I guess that is not the problem.

boussaffawalid commented 6 years ago

@Xiyor I'm using Python 3. I tried queue=False and got another error. Could it be because something is wrong with the database I'm using? I'm using this database: http://academictorrents.com/details/34e2b78745138186976cbc27939b1b34d18bd5b3

Traceback (most recent call last):
  File "train1.py", line 91, in <module>
    train(logdir=logdir)
  File "train1.py", line 61, in train
    mfcc, ppg = get_batch(model.mode, model.batch_size)
  File "C:\dev\deep-voice-conversion-master\data_load.py", line 276, in get_batch
    target_wavs = sample(wav_files, batch_size)
  File "C:\Users\boussafaw\AppData\Local\Programs\Python\Python36\lib\random.py", line 317, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

Xiyor commented 6 years ago

@boussaffawalid Sorry, I gave you wrong information: the code is written in Python 2. The error message indicates that len(wav_files) is less than batch_size; you could print some logs to confirm. Simply setting queue=False seems to make the code fail, so you need to understand the process and modify some code. Or you can ask @andabi.
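
random.sample raises this ValueError whenever the requested sample size exceeds the length of the list, so an empty wav_files list triggers it for any positive batch_size. A small guard makes the cause explicit. The helper name sample_batch is hypothetical; this is a sketch, not the repository's code.

```python
import random


def sample_batch(wav_files, batch_size):
    """Pick batch_size distinct files, or fail with a descriptive message."""
    if len(wav_files) < batch_size:
        raise ValueError(
            "found %d wav files but batch_size is %d; check the dataset "
            "path and the file extensions" % (len(wav_files), batch_size)
        )
    return random.sample(wav_files, batch_size)
```

Printing len(wav_files) just before the sampling call gives the same information with less code. The guard only turns it into a clearer error.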

zuoshaobo commented 6 years ago

How can I download the TIMIT dataset?

pmsinner commented 6 years ago

@zuoshaobo TIMIT is not free; the full version costs $250. You can get it at https://catalog.ldc.upenn.edu/ldc93s1 or you can borrow it from a friend...

jswilson commented 6 years ago

@boussaffawalid Just in case, and to help anyone who tries this in the future...did you update both the TRAIN and TEST folders in the TIMIT dataset? I only corrected the TRAIN folder, but then I got the error you see because I didn't fix the .WAV to .wav in the TEST folder...hope that helps you, or someone!
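
Renaming the extensions in both folders can be done in one pass over the dataset root. The sketch below assumes the only mismatch is the case of the .wav extension. The function name lowercase_wav_extensions is hypothetical. On a case-sensitive file system an existing lowercase file with the same name would be overwritten, so it is safer to run this on a copy of the dataset.

```python
import os


def lowercase_wav_extensions(root):
    """Rename every file under root whose extension is .WAV (in any
    case mix) to use a lowercase .wav; return how many were renamed."""
    renamed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            base, ext = os.path.splitext(name)
            if ext.lower() == ".wav" and ext != ".wav":
                os.rename(
                    os.path.join(dirpath, name),
                    os.path.join(dirpath, base + ".wav"),
                )
                renamed += 1
    return renamed
```

Passing the TIMIT directory that contains both TRAIN and TEST covers the two folders in a single call.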

boussaffawalid commented 6 years ago

@jswilson @zuoshaobo @Xiyor I made some changes that fix the issues discussed above: fixing paths, upgrading to Python 3, and taking parameters from the command line. I also added a MEGA link for downloading the database.

In case anyone is interested please check this fork: https://github.com/boussaffawalid/deep-voice-conversion

Hjwjames commented 6 years ago

@boussaffawalid Thank you for your code. I get the error raise ValueError("Sample larger than population or is negative"). What does it mean? Should I change batch_size or something else?

These are my errors. I look forward to your answer.

Traceback (most recent call last):
  File "", line 1, in <module>
  File "G:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "G:\anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "G:/code/python/myfile/ASR/deep-voice-conversion-master/train1.py", line 117, in <module>
    train(logdir=logdir, hparams = hp)
  File "G:/code/python/myfile/ASR/deep-voice-conversion-master/train1.py", line 77, in train
    eval1.eval(logdir=logdir, hparams=hparams)
  File "G:\code\python\myfile\ASR\deep-voice-conversion-master\eval1.py", line 48, in eval
    mfcc, ppg = get_batch(model.mode, model.batch_size)
  File "G:\code\python\myfile\ASR\deep-voice-conversion-master\data_load.py", line 203, in get_batch
    target_wavs = sample(wav_files, batch_size)
  File "G:\anaconda\lib\random.py", line 317, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

tiankong-hut commented 6 years ago

@boussaffawalid I have the same problem. Did you resolve it? Thank you.

2018-05-17 15:57:18.150601: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-17 15:57:18.150617: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-17 15:57:18.150620: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-05-17 15:57:18.150623: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2018-05-17 15:57:18.150625: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
0%| | 0/11 [00:00<?, ?b/s]
Exception KeyError: KeyError(<weakref at 0x7f1226913260; to 'tqdm' at 0x7f1226afb0d0>,) in <bound method tqdm.__del__ of 0%| | 0/11 [00:01<?, ?b/s]> ignored
Traceback (most recent call last):
  File "/home/human-machine/Speech/deep-voice-conversion-master/train1.py", line 90, in <module>
    train(logdir='./logdir/default/train1', queue=True)
  File "/home/human-machine/Speech/deep-voice-conversion-master/train1.py", line 57, in train
    sess.run(train_op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)
  [[Node: batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/padding_fifo_queue, batch/n)]]

tiankong-hut commented 6 years ago

I resolved my "OutOfRangeError: PaddingFIFOQueue '_1_batch/padding_fifo_queue' is closed and has insufficient elements (requested 32, current size 0)" problem: just install ffmpeg (a complete, cross-platform solution to record, convert and stream audio and video).
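
Before starting a long run, it can help to confirm that an ffmpeg executable is visible to Python at all. This is a minimal sketch using the standard library's shutil.which (Python 3.3 or later). The function name ffmpeg_available is hypothetical. Finding the binary on PATH does not by itself prove that the audio loader will use it.

```python
import shutil


def ffmpeg_available():
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found on PATH")
```

Inside a Docker container the check has to run in the container itself, since the host's ffmpeg is not visible there.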

andabi / deep-voice-conversion

run problem with docker? #7