Closed serrunlee closed 3 years ago
I need help with this too. I hit the problem when running this command:
set CUDA_VISIBLE_DEVICES=0 & python examples/tacotron2/train_tacotron2.py --train-dir ./dump_ljspeech/train/ --dev-dir ./dump_ljspeech/valid/ --outdir ./examples/tacotron2/exp/train.tacotron2.v1/ --config ./examples/tacotron2/conf/tacotron2.v1.yaml --use-norm 1 --mixed_precision 0 --resume ""
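The command above chains `set ... &` in cmd.exe, which may keep the space before `&` as part of the variable's value. A minimal sketch of an alternative, assuming you are willing to launch the run from Python: it builds the same argument list and an environment with `CUDA_VISIBLE_DEVICES` set only for the child process. The helper name `build_training_launch` is hypothetical; the script path and flags are copied from the command above.

```python
import os
import sys


def build_training_launch(gpu_index="0"):
    """Return (argv, env) for the training run quoted above, without starting it."""
    argv = [
        sys.executable,
        "examples/tacotron2/train_tacotron2.py",
        "--train-dir", "./dump_ljspeech/train/",
        "--dev-dir", "./dump_ljspeech/valid/",
        "--outdir", "./examples/tacotron2/exp/train.tacotron2.v1/",
        "--config", "./examples/tacotron2/conf/tacotron2.v1.yaml",
        "--use-norm", "1",
        "--mixed_precision", "0",
        "--resume", "",
    ]
    # Copy the current environment and add the GPU selection for the child only.
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu_index
    return argv, env
```

From the repository root, the pair can be passed to `subprocess.run(argv, env=env, check=True)`. This only changes how the variable is set; it is not known to affect the error reported here.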
My environment: Windows 10, Python 3.8, CUDA 10.1, cuDNN 7.6.5.
The output is:
2021-05-04 22:58:09.219113: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-05-04 22:58:18.244786: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-05-04 22:58:18.721343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 950M computeCapability: 5.0 coreClock: 0.928GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 74.65GiB/s
2021-05-04 22:58:18.721454: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-05-04 22:58:18.762007: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-05-04 22:58:18.796979: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-05-04 22:58:18.805429: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-05-04 22:58:18.853740: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-05-04 22:58:18.870273: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-05-04 22:58:18.942517: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-05-04 22:58:18.942926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-04 22:58:29.851597: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-04 22:58:29.865307: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1df08a6a1e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-05-04 22:58:29.865819: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-05-04 22:58:30.185937: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: pciBusID: 0000:01:00.0 name: NVIDIA GeForce GTX 950M computeCapability: 5.0 coreClock: 0.928GHz coreCount: 5 deviceMemorySize: 4.00GiB deviceMemoryBandwidth: 74.65GiB/s
2021-05-04 22:58:30.186472: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-05-04 22:58:30.193665: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-05-04 22:58:30.194283: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-05-04 22:58:30.195426: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-05-04 22:58:30.196202: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-05-04 22:58:30.198237: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-05-04 22:58:30.199124: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-05-04 22:58:30.201657: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-04 22:58:30.338847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-04 22:58:30.338976: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-05-04 22:58:30.341995: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-05-04 22:58:30.345054: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3120 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 950M, pci bus id: 0000:01:00.0, compute capability: 5.0)
2021-05-04 22:58:30.351056: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1df07c772f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-05-04 22:58:30.351207: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce GTX 950M, Compute Capability 5.0
2021-05-04 22:58:30,614 (train_tacotron2:421) INFO: hop_size = 256
2021-05-04 22:58:30,614 (train_tacotron2:421) INFO: format = npy
2021-05-04 22:58:30,617 (train_tacotron2:421) INFO: model_type = tacotron2
2021-05-04 22:58:30,619 (train_tacotron2:421) INFO: tacotron2_params = {'dataset': 'ljspeech', 'embedding_hidden_size': 512, 'initializer_range': 0.02, 'embedding_dropout_prob': 0.1, 'n_speakers': 1, 'n_conv_encoder': 5, 'encoder_conv_filters': 512, 'encoder_conv_kernel_sizes': 5, 'encoder_conv_activation': 'relu', 'encoder_conv_dropout_rate': 0.5, 'encoder_lstm_units': 256, 'n_prenet_layers': 2, 'prenet_units': 256, 'prenet_activation': 'relu', 'prenet_dropout_rate': 0.5, 'n_lstm_decoder': 1, 'reduction_factor': 1, 'decoder_lstm_units': 1024, 'attention_dim': 128, 'attention_filters': 32, 'attention_kernel': 31, 'n_mels': 80, 'n_conv_postnet': 5, 'postnet_conv_filters': 512, 'postnet_conv_kernel_sizes': 5, 'postnet_dropout_rate': 0.1, 'attention_type': 'lsa'}
2021-05-04 22:58:30,620 (train_tacotron2:421) INFO: batch_size = 19
2021-05-04 22:58:30,622 (train_tacotron2:421) INFO: remove_short_samples = True
2021-05-04 22:58:30,622 (train_tacotron2:421) INFO: allow_cache = True
2021-05-04 22:58:30,623 (train_tacotron2:421) INFO: mel_length_threshold = 32
2021-05-04 22:58:30,624 (train_tacotron2:421) INFO: is_shuffle = True
2021-05-04 22:58:30,625 (train_tacotron2:421) INFO: use_fixed_shapes = True
2021-05-04 22:58:30,625 (train_tacotron2:421) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 1e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2021-05-04 22:58:30,626 (train_tacotron2:421) INFO: gradient_accumulation_steps = 1
2021-05-04 22:58:30,627 (train_tacotron2:421) INFO: var_train_expr = None
2021-05-04 22:58:30,628 (train_tacotron2:421) INFO: train_max_steps = 1
2021-05-04 22:58:30,629 (train_tacotron2:421) INFO: save_interval_steps = 2000
2021-05-04 22:58:30,630 (train_tacotron2:421) INFO: eval_interval_steps = 500
2021-05-04 22:58:30,631 (train_tacotron2:421) INFO: log_interval_steps = 200
2021-05-04 22:58:30,631 (train_tacotron2:421) INFO: start_schedule_teacher_forcing = 200001
2021-05-04 22:58:30,632 (train_tacotron2:421) INFO: start_ratio_value = 0.5
2021-05-04 22:58:30,632 (train_tacotron2:421) INFO: schedule_decay_steps = 50000
2021-05-04 22:58:30,633 (train_tacotron2:421) INFO: end_ratio_value = 0.0
2021-05-04 22:58:30,633 (train_tacotron2:421) INFO: num_save_intermediate_results = 1
2021-05-04 22:58:30,634 (train_tacotron2:421) INFO: train_dir = ./dump_ljspeech/train/
2021-05-04 22:58:30,635 (train_tacotron2:421) INFO: dev_dir = ./dump_ljspeech/valid/
2021-05-04 22:58:30,641 (train_tacotron2:421) INFO: use_norm = True
2021-05-04 22:58:30,642 (train_tacotron2:421) INFO: outdir = ./examples/tacotron2/exp/train.tacotron2.v1/
2021-05-04 22:58:30,642 (train_tacotron2:421) INFO: config = ./examples/tacotron2/conf/tacotron2.v1.yaml
2021-05-04 22:58:30,643 (train_tacotron2:421) INFO: resume =
2021-05-04 22:58:30,644 (train_tacotron2:421) INFO: verbose = 1
2021-05-04 22:58:30,644 (train_tacotron2:421) INFO: mixed_precision = False
2021-05-04 22:58:30,645 (train_tacotron2:421) INFO: pretrained =
2021-05-04 22:58:30,645 (train_tacotron2:421) INFO: version = 0.0
2021-05-04 22:58:30,646 (train_tacotron2:421) INFO: max_mel_length = 857
2021-05-04 22:58:30,646 (train_tacotron2:421) INFO: max_char_length = 169
Traceback (most recent call last):
  File "examples/tacotron2/train_tacotron2.py", line 513, in <module>
    main()
  File "examples/tacotron2/train_tacotron2.py", line 448, in main
    trainer = Tacotron2Trainer(
  File "examples/tacotron2/train_tacotron2.py", line 72, in __init__
    self.init_train_eval_metrics(self.list_metrics_name)
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow_tts\trainers\base_trainer.py", line 714, in init_train_eval_metrics
    super().init_train_eval_metrics(list_metrics_name)
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow_tts\trainers\base_trainer.py", line 49, in init_train_eval_metrics
    {name: tf.keras.metrics.Mean(name="train_" + name, dtype=tf.float32)}
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\keras\metrics.py", line 482, in __init__
    super(Mean, self).__init__(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\keras\metrics.py", line 329, in __init__
    self.total = self.add_weight(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\keras\metrics.py", line 300, in add_weight
    return super(Metric, self).add_weight(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\keras\engine\base_layer.py", line 597, in add_weight
    variable = self._add_variable_with_custom_getter(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\training\tracking\base.py", line 745, in _add_variable_with_custom_getter
    new_variable = getter(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\keras\engine\base_layer_utils.py", line 133, in make_variable
    return tf_variables.VariableV1(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\variables.py", line 260, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\variables.py", line 206, in _variable_v1_call
    return previous_getter(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\variables.py", line 67, in getter
    return captured_getter(captured_previous, **kwargs)
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\distribute\distribute_lib.py", line 2024, in creator_with_resource_vars
    created = self._create_variable(next_creator, **kwargs)
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\distribute\one_device_strategy.py", line 266, in _create_variable
    return next_creator(**kwargs)
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\variables.py", line 199, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2583, in default_variable_creator
    return resource_variable_ops.ResourceVariable(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1507, in __init__
    self._init_from_args(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1661, in _init_from_args
    handle = eager_safe_variable_handle(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 242, in eager_safe_variable_handle
    return _variable_handle_from_shape_and_dtype(
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 174, in _variable_handle_from_shape_and_dtype
    gen_logging_ops._assert(  # pylint: disable=protected-access
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py", line 49, in _assert
    _ops.raise_from_not_ok_status(e, name)
  File "C:\ai_app\anaconda3\envs\tf2gpu38\lib\site-packages\tensorflow\python\framework\ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse
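The traceback ends inside `tf.keras.metrics.Mean(...)` while a one-device distribution strategy is creating the metric's variable during trainer construction. A minimal sketch for checking whether plain TensorFlow shows the same failure on this machine, without TensorFlowTTS involved. The function name `check_metric_creation`, the metric name `train_probe`, and the device string `/gpu:0` are illustrative assumptions, not values taken from the trainer.

```python
import importlib.util


def check_metric_creation(device="/gpu:0"):
    """Try to create a Mean metric under a OneDeviceStrategy scope.

    Returns "ok", "failed: <ExceptionName>", or "tensorflow not installed".
    """
    if importlib.util.find_spec("tensorflow") is None:
        return "tensorflow not installed"
    try:
        import tensorflow as tf

        # Same kind of call as the failing frame in base_trainer.py,
        # but with an assumed device string and a made-up metric name.
        strategy = tf.distribute.OneDeviceStrategy(device=device)
        with strategy.scope():
            metric = tf.keras.metrics.Mean(name="train_probe", dtype=tf.float32)
        metric.update_state(1.0)
        return "ok"
    except Exception as exc:  # report any failure instead of crashing
        return f"failed: {type(exc).__name__}"
```

Running `print(check_metric_creation())` in the same conda environment is one way to narrow things down. If it also reports a failure, that would suggest the problem sits in the TensorFlow/CUDA setup rather than in the training script, though this sketch cannot confirm the cause.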
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.