flyuuo9 opened this issue 6 years ago
Excuse me, have you solved this problem? I have the same problem:
Traceback (most recent call last):
File "C:\Users\blcdec\ApplicationInstallPlace\Anaconda3\envs\tensorflow\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()
File "C:\Users\blcdec\ApplicationInstallPlace\Anaconda3\envs\tensorflow\lib\site-packages\tensorpack\dataflow\parallel.py", line 162, in run
for dp in self.ds.get_data():
File "C:\Users\blcdec\ApplicationInstallPlace\Anaconda3\envs\tensorflow\lib\site-packages\tensorpack\dataflow\common.py", line 116, in get_data
for data in self.ds.get_data():
File "C:\Users\blcdec\project\deep-voice-conversion-master\deep-voice-conversion-master\data_load.py", line 35, in get_data
yield get_mfccs_and_phones(wav_file=wav_file)
File "C:\Users\blcdec\project\deep-voice-conversion-master\deep-voice-conversion-master\data_load.py", line 72, in get_mfccs_and_phones
wav = read_wav(wav_file, sr=hp.default.sr)
KeyError: 'default'
@bhui I'm having the same problem on Windows 10.
ValueError: Op type not registered 'NcclAllReduce' in binary running on DESK. Make sure the Op and Kernel are registered in the binary running in this process. Note that if you are loading a saved graph which used ops from tf.contrib, accessing (e.g.) `tf.contrib.resampler` should be done before importing the graph, as contrib ops are lazily registered when the module is first accessed. while building NodeDef 'AllReduceGrads/NcclAllReduce'
I've now tried at least 8 different repos while trying to learn voice cloning, and none of them has good enough documentation for me to get it working. I'm super inspired by all of the examples but haven't had much luck.
Same problem here. @bhui Have you solved this problem? Or could anyone please provide a solution? Thanks a lot!
I have been unable to find a solution, but after thorough troubleshooting I found the cause. The project relies on NCCL, which is not supported on Windows. I don't know enough Python or TensorFlow (I'm new to both) to edit the code and exclude the calls to NCCL, so my "solution" was to dual-boot Linux (Ubuntu) on my system.
Unfortunately, it seems that unless NVIDIA releases NCCL for Windows or major changes are made to this project's code, it can only be run on a Linux system that supports NCCL.
I'm running on Windows with a single GPU. You should migrate all the code that uses hparam.py to hparams.py; I changed all of it. In most of the code you just have to change default to Default. Some properties are missing from Default and TrainX in hparams.py, so copy and paste them from hparam.py and replace the : with =.
The NcclAllReduce error may be caused by missing wav files or an incorrect dataset path, so verify the path in hparams.py. The other cause of the NcclAllReduce error is using more than one GPU on Windows.
Here is my hparams.py; I hope it helps: hparams.zip
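The KeyError: 'default' in the first traceback is consistent with a config object that is a dict with attribute access and is filled from YAML only at runtime. The traceback shows get_data() running inside a multiprocessing worker, and on Windows workers are started with the spawn method, which re-imports modules instead of copying the parent's memory, so a config loaded in the parent can be empty in the worker. A class-based hparams.py sidesteps this because its values are set at import time. A minimal sketch of both patterns; `Dotdict` and the `Default.sr` value are illustrative assumptions, not code copied from either repository.

```python
class Dotdict(dict):
    """Dict with attribute access: d.key means d['key'], so a missing key raises KeyError."""
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__


class Default:
    """hparams.py style: the values exist as soon as the module is imported (illustrative subset)."""
    sr = 16000


if __name__ == "__main__":
    # A config that was never filled from YAML, e.g. inside a spawned worker process.
    hp = Dotdict()
    try:
        print(hp.default.sr)
    except KeyError as err:
        print("KeyError:", err)  # same message as the traceback above

    # Once loaded, attribute access works; the class-based version needs no loading step.
    hp = Dotdict(default=Dotdict(sr=16000))
    print(hp.default.sr, Default.sr)
```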
Well, I tried it on Windows 7 with your hparams and it seems to work, but now I'm getting a bunch of encoding errors, and it seems it's not finding the dataset properly (it's also saving to the wrong logdir). Did you have this problem? If so, how did you fix it?
Yes, as you said, this error is related to the dataset path. I'm new to Python and TF, so I created a little project that uses glob.glob to try loading the wav files, and I discovered that I had to delete the / at the start of the path. Here is an example of my path: pythontest.zip
For the logdir, I had to stop using the case name; I'm now fixing that so it uses the case name from the command-line parameters.
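A quick way to check a dataset pattern, similar in spirit to the glob.glob test described above, is to count matches for a few variants of the path. On Windows, a path that starts with / is resolved from the root of the current drive (for example C:\), not from the project folder, which is why removing the leading / makes the path relative to the working directory. This is a sketch; `count_matches`, `describe`, and the TIMIT-style patterns are hypothetical.

```python
import glob
import os


def count_matches(pattern):
    """Return how many files match a glob pattern such as 'datasets/TIMIT/TRAIN/*/*/*.WAV'."""
    return len(glob.glob(pattern))


def describe(pattern):
    """Show the absolute location a pattern resolves to and how many files it matches."""
    return "{} -> {} ({} files)".format(pattern, os.path.abspath(pattern), count_matches(pattern))


if __name__ == "__main__":
    # Hypothetical patterns: compare a leading "/" against a relative path.
    for p in ["/datasets/TIMIT/TRAIN/*/*/*.WAV", "datasets/TIMIT/TRAIN/*/*/*.WAV"]:
        print(describe(p))
```

If the first pattern reports 0 files and resolves to something like C:\datasets\..., the leading slash is the problem.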
@CIDFarwin did you get output with Python 3.5 on Windows?
I'm still getting the same errors. I've used your test script and it finds the files, and I'm using the same data path, so I don't know what's going on. I'm wondering if it is some sort of encoding problem, but I don't know why nobody else seems to have that problem (and it works on my Linux build).
Did you install FFmpeg (https://www.ffmpeg.org/)? Let me know. I'm using the latest version, but it looks like Linux is generating different arrays somewhere 😞
@CIDFarwin I cloned the master branch again and was getting this error. Try replacing line 81 of data_load.py with:
phn_file = wav_file.replace("WAV", "PHN").replace("wav", "PHN")
I hope that fixes it.
Well, it's finally working with Python 3.6 on Windows.
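The replacement above maps an audio path to its TIMIT phoneme-label path by plain string substitution, which also rewrites any "WAV" or "wav" that happens to appear in a folder name. A sketch of a narrower variant that touches only the file extension, assuming the labels sit next to the audio with a .PHN extension and that converted copies may be named like SA1.WAV.wav; `phn_path_for` is a hypothetical helper, not part of the repo.

```python
import os


def phn_path_for(wav_file):
    """Return the .PHN label path for a TIMIT audio file, changing only the extension(s)."""
    root, _ext = os.path.splitext(wav_file)  # 'SA1.WAV.wav' -> ('SA1.WAV', '.wav')
    if root.upper().endswith(".WAV"):        # converted copies carry a second '.WAV' suffix
        root = root[:-len(".WAV")]
    return root + ".PHN"


if __name__ == "__main__":
    for path in ["TRAIN/DR1/FCJF0/SA1.WAV", "TRAIN/DR1/FCJF0/SA1.WAV.wav", "wavs/DR1/SA1.wav"]:
        print(path, "->", phn_path_for(path))
```

Windows ignores case in file names, but on Linux the real extension of the label files (.PHN vs .phn) must match.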
Huh,
I'm sure I tried that already, but that seems to have fixed it. I'll let it run for a bit and let you know how my output looks.
Thanks a bunch!
@carlfm01 I'm very encouraged but still very confused about your comment about hparams.py. https://github.com/andabi/deep-voice-conversion/issues/52#issuecomment-420484728
I'm on Windows 10.
I see these files (among others, of course):
I see that you shared a file called hparams.py, but I'm not sure where to save it.
If you wouldn't mind, I'd love if you could clarify each step that you wrote here:
you should migrate all the code that uses hparam.py, I changed all the code to use hparams.py, in most of the code you just have to change from default to Default, there is missing properties in Default and TrainX in hparams.py so, copy and paste the properties from hparam.py and replace the : for =
Here is my guess about what you were saying:
hparams.py file to your project folder.
cc @CIDFarwin
Thanks!
@ryancwalsh Take a look at https://github.com/carlfm01/deep-voice-conversion. I also changed the lambda in convert.py to make it work on Python 3.x. Let me know if there's anything you still don't understand.
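The fork's exact change to convert.py is not shown here, but one common way code that passes a lambda to map() breaks on Python 3 is that map() returns a lazy iterator instead of a list, so np.array(map(...)) yields a 0-d object array rather than stacking the results. A minimal sketch of that pitfall and a list-comprehension fix; the function names and toy arrays are hypothetical.

```python
import numpy as np


def transpose_all_py2_style(specs):
    # On Python 3 this wraps the map object itself: a 0-d array of dtype object.
    return np.array(map(lambda spec: spec.T, specs))


def transpose_all(specs):
    # Materialize the results first so NumPy can stack them into one array.
    return np.array([spec.T for spec in specs])


if __name__ == "__main__":
    specs = [np.ones((2, 3)), np.zeros((2, 3))]
    print(transpose_all_py2_style(specs).shape)  # ()
    print(transpose_all(specs).shape)            # (2, 3, 2)
```

Wrapping the call as np.array(list(map(...))) works equally well; the list comprehension just avoids the lambda.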
I can confirm that it works on Windows. I have output, but it's not very good.
I have another problem now: it seems TensorFlow is only running on my CPU and not my GPU, and training is taking a very long time (much longer than on my Linux build). I know the scripts are finding my GPU because I see the line "Created TensorFlow device" with my GPU listed, with compute capability 5.2.
Hi @CIDFarwin, you can check GPU usage with nvidia-smi (in C:\Program Files\NVIDIA Corporation\NVSMI). Check the usage with this tool and let me know. Also check that you set gpu to 1 in the params. I have the same problem on Linux: on my Windows machine the GPU is always at 40%-99% usage, but on Linux it's always at 0%-15%. On your Linux build, do you have to use allow_soft_placement? For me on Linux, just creating the session takes 5 minutes or more (same hardware). I've tried a lot of config changes but no luck yet.
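To log utilization from a script instead of watching it by eye, nvidia-smi can print machine-readable CSV through its --query-gpu and --format options. A minimal sketch, assuming nvidia-smi is on PATH or in the NVSMI folder mentioned above; `parse_gpu_usage` and `query_gpu_usage` are hypothetical helpers, written to run on Python 3.5/3.6 as used in this thread.

```python
import shutil
import subprocess

QUERY = ["--query-gpu=utilization.gpu,memory.used", "--format=csv,noheader,nounits"]


def parse_gpu_usage(csv_text):
    """Turn nvidia-smi CSV rows like '45, 8788' into (utilization %, memory used MiB) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        rows.append((int(util), int(mem)))
    return rows


def query_gpu_usage():
    """Run nvidia-smi once and return one (utilization %, memory used MiB) tuple per GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        raise FileNotFoundError(
            "nvidia-smi not found on PATH (on Windows, try C:\\Program Files\\NVIDIA Corporation\\NVSMI)")
    result = subprocess.run([exe] + QUERY, stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_gpu_usage(result.stdout)
```

Calling query_gpu_usage() once per second from a second console while training runs shows whether utilization really stays near zero.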
@carlfm01
@ryancwalsh Take a look https://github.com/carlfm01/deep-voice-conversion, also I changed the code of the lambda in convert to make it work on python 3.+, let me know if you still don't undertand something.
I ran train1.py using your code, but train2.py always raises MemoryError. My machine has 16 GB of memory, and memory usage stays below 90% in the monitor view.
[1114 00:02:55 @base.py:227] Creating the session ...
2018-11-14 00:02:55.852710: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-11-14 00:02:56.056995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2018-11-14 00:02:56.063291: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-14 00:02:56.971796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-14 00:02:56.982418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-11-14 00:02:56.985385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-11-14 00:02:56.988684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8788 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
[1114 00:02:59 @base.py:233] Initializing the session ...
[1114 00:02:59 @sessinit.py:117] Restoring checkpoint from cases/None/train1\model-100 ...
[1114 00:02:59 @base.py:240] Graph Finalized.
[1114 00:02:59 @concurrency.py:37] Starting EnqueueThread QueueInput/input_queue ...
[1114 00:02:59 @graph.py:73] Running Op sync_variables/sync_variables_from_main_tower ...
[1114 00:03:01 @base.py:272] Start Epoch 1 ...
0%| |0/100[00:00<?,?it/s]
2018-11-14 00:03:10.004190: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 2.00G (2147483648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
[1114 00:03:10 @input_source.py:168] ERR Exception in EnqueueThread QueueInput/input_queue:
Traceback (most recent call last):
File "G:\Anaconda\envs\tensorflow-gpu\lib\site-packages\tensorpack\input_source\input_source.py", line 158, in run
dp = next(self._itr)
File "G:\Anaconda\envs\tensorflow-gpu\lib\site-packages\tensorpack\dataflow\common.py", line 355, in __iter__
for dp in self.ds:
File "G:\Anaconda\envs\tensorflow-gpu\lib\site-packages\tensorpack\dataflow\parallel.py", line 199, in __iter__
dp = self.queue.get()
File "G:\Anaconda\envs\tensorflow-gpu\lib\multiprocessing\queues.py", line 94, in get
res = self._recv_bytes()
File "G:\Anaconda\envs\tensorflow-gpu\lib\multiprocessing\connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "G:\Anaconda\envs\tensorflow-gpu\lib\multiprocessing\connection.py", line 318, in _recv_bytes
return self._get_more_data(ov, maxsize)
File "G:\Anaconda\envs\tensorflow-gpu\lib\multiprocessing\connection.py", line 340, in _get_more_data
ov, err = _winapi.ReadFile(self._handle, left, overlapped=True)
MemoryError
2018-11-14 00:03:10.166847: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.80G (1932735232 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2018-11-14 00:03:10.296870: E tensorflow/stream_executor/cuda/cuda_driver.cc:806] failed to allocate 1.62G (1739461632 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
[1114 00:03:10 @common.py:147] ERR Cannot batch data. Perhaps they are of inconsistent shape?
Traceback (most recent call last):
File "G:\Anaconda\envs\tensorflow-gpu\lib\site-packages\tensorpack\dataflow\common.py", line 145, in _aggregate_batch
np.asarray([x[k] for x in data_holder], dtype=tp))
File "C:\Users\Mloong\AppData\Roaming\Python\Python36\site-packages\numpy\core\numeric.py", line 501, in asarray
return array(a, dtype, copy=False, order=order)
MemoryError
[1114 00:03:10 @common.py:150] ERR Shape of all arrays to be batched: [(334, 90),
(334, 90),
(334, 90),
Hi @bhui, it looks like the error is related to an incorrect format or corrupted files. Make sure your training data for the second network is 16000 Hz, mono, WAV format. Also try different versions of NumPy; I recommend using conda environments :).
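The format requirements above can be checked before training with the standard-library wave module. A minimal sketch, assuming the expected format is 16000 Hz, mono, 16-bit PCM as stated in this thread; `wav_format_problems` and `report` are hypothetical helpers, not part of the repo.

```python
import glob
import os
import wave


def wav_format_problems(path, rate=16000, channels=1, sample_width=2):
    """List how a WAV file differs from the expected format; an empty list means it matches."""
    problems = []
    try:
        with wave.open(path, "rb") as f:
            if f.getframerate() != rate:
                problems.append("sample rate {} Hz, expected {} Hz".format(f.getframerate(), rate))
            if f.getnchannels() != channels:
                problems.append("{} channels, expected {}".format(f.getnchannels(), channels))
            if f.getsampwidth() != sample_width:
                problems.append("{}-bit samples, expected {}-bit".format(8 * f.getsampwidth(), 8 * sample_width))
            if f.getnframes() == 0:
                problems.append("no audio frames")
    except (wave.Error, EOFError) as err:
        # Raised for files that are not RIFF/WAVE, are truncated, or use an encoding wave cannot read.
        problems.append("not a readable PCM WAV file: {}".format(err))
    return problems


def report(folder):
    """Print every *.wav file under folder (recursively) that does not match the expected format."""
    for path in sorted(glob.glob(os.path.join(folder, "**", "*.wav"), recursive=True)):
        problems = wav_format_problems(path)
        if problems:
            print(path, "->", "; ".join(problems))
```

For example, report("datasets/arctic/bdl") lists the files that would need to be re-exported or removed.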
@carlfm01 Thank you!
It was finally solved by upgrading the memory from 16 GB to 32 GB.
But I have a new question: I ran python convert.py, and no wav file was generated.
How can I view the results of convert.py?
I checked convert.py, and the output should be generated in cases/None/train2.
But I can't find the output files.
To see the results, you have to use TensorBoard. Go to the "cases/None" directory and run tensorboard --logdir=train2 on the command line. Then open the URL that the console prints; there's an Audio tab where you should see the generated audio.
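If you would rather have files on disk than listen in TensorBoard, a converted waveform held as a NumPy array can be written with the standard-library wave module. A sketch, assuming a mono float signal in [-1, 1] at 16000 Hz; `save_wav` is a hypothetical helper, and where to call it inside convert.py is left open.

```python
import wave

import numpy as np


def save_wav(path, audio, sr=16000):
    """Write a mono float waveform (values in [-1, 1]) to a 16-bit PCM WAV file."""
    samples = np.clip(np.asarray(audio, dtype=np.float64), -1.0, 1.0)
    pcm = (samples * 32767).astype("<i2")  # little-endian int16, as 16-bit WAV expects
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sr)
        f.writeframes(pcm.tobytes())
```

If the array holds a batch of utterances, call it once per item with a different file name.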
@carlfm01 Thanks! You helped me a lot. I'm a beginner, just getting into TensorFlow.
@carlfm01 I have the same problem and still have no idea. Is there something wrong with data_load.py?
It looks like something is wrong with line 33 of data_load.py.
@wuzhiyu666 Maybe it's an incorrect path. Share the path of one of your train1 wav files, and also the path that you are using in the code.
@carlfm01 The wav files are in C:\Users\SANDSTORM\Desktop\deep-voice-conversion-master\deep-voice-conversion-master\datasets\arctic\bdl
and the code is in C:\Users\SANDSTORM\Desktop\deep-voice-conversion-master\deep-voice-conversion-master
Is that what you mean? I'm a newcomer.
@wuzhiyu666 Sorry, I meant the TIMIT data; the path you posted is for the second net.
Sorry, I haven't downloaded the TIMIT data yet. I'll download it and then let you know. Thank you very much!
@carlfm01 I have downloaded the TIMIT data but don't know how to connect it to the code. Should I add the path in params.py? I'm on Win10 with Python 3.6.6, using the current code at https://github.com/carlfm01/deep-voice-conversion
@wuzhiyu666 You should put all your TIMIT data in the path shown in the link, or change that path to point to your TIMIT data.
@carlfm01 In this line, right? I'll try it.
@carlfm01 I solved the previous problem by adding the path, but train1 ran to epoch 29 and then got the error shown in the second picture. Is there something wrong with the checkpoint file?
@carlfm01 Hi, I'm a newcomer too.
I'm using the code you published at https://github.com/carlfm01/deep-voice-conversion.
I ran the program with the command train1.py case.
I have downloaded the TIMIT data and changed the path too.
It shows this error message:
"usage: train1.py [-h] [-case CASE] [-ckpt CKPT] [-gpu GPU]
train1.py: error: unrecognized arguments: case"
Can you tell me how to solve it? Thank you so much.
@juihsuanlee Hi, try for example train1.py -case timit -gpu 0
I think I ran into errors, so I decided to store and load under the case name; I can't remember the details at the moment. Try the command and let me know.
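The usage line in the error shows that train1.py takes the case name as an option (-case CASE) rather than as a bare positional argument, so a plain "case" is rejected as unrecognized. A sketch with argparse that produces the same usage line; only the option names come from that message, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Parser with single-dash long options, matching the usage line printed by train1.py."""
    parser = argparse.ArgumentParser(prog="train1.py")
    parser.add_argument("-case", help="experiment case name (assumed meaning)")
    parser.add_argument("-ckpt", help="checkpoint to load (assumed meaning)")
    parser.add_argument("-gpu", help="GPU id(s) to use (assumed meaning)")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    print(parser.format_usage().strip())
    args = parser.parse_args(["-case", "timit", "-gpu", "0"])
    print(args.case, args.gpu)
```

With this kind of parser, parse_args(["case"]) exits with "unrecognized arguments: case", matching the error above.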
@carlfm01 Hi, thanks for your reply. I tried the command, and it shows another error message: "IndexError: Cannot choose from an empty sequence". I had the same problem when I used the code from https://github.com/andabi/deep-voice-conversion, and I have no idea how to solve it.
This is an issue related to the path; can you check it again? For example, on Windows I have to add ./ to the path (I don't know about Linux). Or maybe you are just misspelling a word in the path.
@carlfm01 Hi, thanks again for your reply. As I understand it, if I only run train1.py, I should set "data_path" in the Train1 class, and "data_path" is the path to the TIMIT .wav files. I have downloaded and unzipped the data, so I set data_path = './datasets/data/lisa/data/timit/raw/TIMIT/TRAIN/ / / *.wav'. I'm not sure whether this is right.
In the upstream YAML file, he used the path without the dot; see https://github.com/andabi/deep-voice-conversion/blob/c922900bfc4d7e16b659d5b23837c1b724f78945/hparams/default.yaml#L32
I also noticed that there are spaces toward the end of your path.
It should be '/datasets/data/lisa/data/timit/raw/TIMIT/TRAIN///*.wav' (never mind, it's GitHub's formatting).
@carlfm01 Hi, after I changed the path to 'datasets/data/lisa/data/timit/raw/TIMIT/TRAIN///*.WAV', it works, but now it shows another error message:
raise ValueError("The while_v2 module is not set. Did you forget to "
ValueError: The while_v2 module is not set. Did you forget to import tensorflow.python.ops.while_v2?
I'll keep searching the internet Orz
Did you install tensorflow-gpu?
Yes, I used Anaconda and installed tensorflow_gpu version 1.10.0 (picture below).
@carlfm01 Hi, I'm wondering whether using Anaconda as the virtual environment tool conflicts with your code, or whether I should use Docker to run the program. Thank you so much!
Hi, I didn't use Anaconda; I used Python 3.6 on a VM without any environment activated.
@carlfm01 I have trained NET1 and NET2 successfully, but I got an error in convert.py. Here is the .wav file that should be converted, and the path in the params.py code (line 105).
Could this be causing the error? I'm so grateful to you for helping me!
Hi, it looks like ffmpeg isn't installed. Download ffmpeg and add it to your environment variables.
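To check whether ffmpeg is actually visible to Python after editing the environment variables (only consoles opened after the change see the new PATH), shutil.which can search the same PATH the process sees. A sketch; `find_ffmpeg` and `add_to_path` are hypothetical helpers, and any folder you pass is an assumption about where ffmpeg was unpacked. Some audio libraries look ffmpeg up once when they are imported, so adjusting PATH before those imports is the safer order.

```python
import os
import shutil


def find_ffmpeg(extra_dirs=()):
    """Return the full path of ffmpeg, searching extra_dirs first and then PATH; None if absent."""
    search_path = os.pathsep.join(list(extra_dirs) + [os.environ.get("PATH", "")])
    return shutil.which("ffmpeg", path=search_path)


def add_to_path(directory):
    """Prepend directory to PATH for this process; subprocesses started later inherit it."""
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")
```

For example, print(find_ffmpeg([r"C:\ffmpeg\bin"])), where C:\ffmpeg\bin is a hypothetical install folder. If that prints a path but the audio library still cannot find ffmpeg, calling add_to_path with that folder at the top of the script is worth a try.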
@carlfm01 Sorry, I have done what you said, but it still doesn't work. I don't know whether I did it correctly.
Do you mean this warning?
RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
I also tried this in audio.py, but it didn't help... Thank you very much!
Can you verify that the input audio uses the same format as the training data? By default that's 16000 Hz, mono, 16-bit.
@carlfm01 Thank you! The sample is from here, in bdl. Is that right?
My environment is Win10 + Anaconda2 + Python 3.5, and it's my first time using TensorFlow. The log below suggests something went wrong when parsing hparams/default.yaml. I even tried changing the line endings in default.yaml from LF to Windows CRLF. Could someone help me?
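If the YAML failure is a UnicodeDecodeError, one possible cause (an assumption, since the log is not included here) is that open() without an encoding argument decodes with the Windows locale code page rather than UTF-8. Line endings are unlikely to matter: Python's text mode translates CRLF to \n by default. A sketch that reads the file explicitly as UTF-8; `read_config_text` is a hypothetical helper, and the resulting string can then be passed to a YAML parser such as PyYAML's yaml.safe_load.

```python
def read_config_text(path):
    """Read a config file as UTF-8, regardless of the Windows locale code page."""
    # 'utf-8-sig' also drops the byte-order mark that Notepad may add to UTF-8 files;
    # text mode with the default newline=None turns both CRLF and LF into '\n'.
    with open(path, encoding="utf-8-sig") as f:
        return f.read()
```

For example, yaml.safe_load(read_config_text("hparams/default.yaml")), assuming PyYAML is installed.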