Closed lalimili6 closed 3 years ago
@lalimili6 can you share your command line?
@dathudeptrai TensorFlow version: 2.3.1, OS: Ubuntu 20.04.1, GPU:
Tue Nov 17 13:19:59 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 1030 Off | 00000000:01:00.0 Off | N/A |
| 29% 36C P8 N/A / 30W | 164MiB / 2001MiB | 25% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1192 G /usr/lib/xorg/Xorg 21MiB |
| 0 1890 G /usr/lib/xorg/Xorg 60MiB |
| 0 2105 G /usr/bin/gnome-shell 73MiB |
+-----------------------------------------------------------------------------+
My command is:
javad@javad-HP-ProDesk-600-G2-SFF:~/Documents/TensorFlowTTS$ CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py --train-dir ./dump_ljspeech/train/ --dev-dir ./dump_ljspeech/valid/ --outdir ./examples/tacotron2/exp/train.tacotron2.v1 --config ./examples/tacotron2/conf/tacotron2.v1.yaml --use-norm 1 --mixed_precision 0 --resume ""
Output:
javad@javad-HP-ProDesk-600-G2-SFF:~/Documents/TensorFlowTTS$ CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py --train-dir ./dump_ljspeech/train/ --dev-dir ./dump_ljspeech/valid/ --outdir ./examples/tacotron2/exp/train.tacotron2.v1 --config ./examples/tacotron2/conf/tacotron2.v1.yaml --use-norm 1 --mixed_precision 0 --resume ""
2020-11-17 16:07:46.666191: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-17 16:07:47.614335: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-11-17 16:07:47.634757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:47.635090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GT 1030 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 3 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 44.76GiB/s
2020-11-17 16:07:47.635129: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-17 16:07:47.637108: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-17 16:07:47.638634: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-17 16:07:47.638882: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-17 16:07:47.640629: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-17 16:07:47.641587: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-17 16:07:47.645347: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-17 16:07:47.645474: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:47.645758: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:47.645956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-17 16:07:48.831453: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-11-17 16:07:49.304366: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3399905000 Hz
2020-11-17 16:07:49.304873: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6064220 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-17 16:07:49.304908: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-11-17 16:07:49.919417: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:49.919929: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5f53670 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-17 16:07:49.919963: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GT 1030, Compute Capability 6.1
2020-11-17 16:07:49.920207: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:49.920540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GT 1030 computeCapability: 6.1
coreClock: 1.468GHz coreCount: 3 deviceMemorySize: 1.95GiB deviceMemoryBandwidth: 44.76GiB/s
2020-11-17 16:07:49.920581: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-17 16:07:49.920615: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-11-17 16:07:49.920641: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-11-17 16:07:49.920666: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-11-17 16:07:49.920692: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-11-17 16:07:49.920728: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-11-17 16:07:49.920754: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-17 16:07:49.920835: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:49.921189: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:07:49.921479: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-11-17 16:07:49.941003: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-11-17 16:08:08.227068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-11-17 16:08:08.227117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-11-17 16:08:08.227138: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2020-11-17 16:08:08.310725: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:08:08.311191: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 16:08:08.311546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1435 MB memory) -> physical GPU (device: 0, name: GeForce GT 1030, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-11-17 16:15:32,661 (train_tacotron2:400) INFO: hop_size = 256
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: format = npy
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: model_type = tacotron2
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: tacotron2_params = {'dataset': 'ljspeech', 'embedding_hidden_size': 512, 'initializer_range': 0.02, 'embedding_dropout_prob': 0.1, 'n_speakers': 1, 'n_conv_encoder': 5, 'encoder_conv_filters': 512, 'encoder_conv_kernel_sizes': 5, 'encoder_conv_activation': 'relu', 'encoder_conv_dropout_rate': 0.5, 'encoder_lstm_units': 256, 'n_prenet_layers': 2, 'prenet_units': 256, 'prenet_activation': 'relu', 'prenet_dropout_rate': 0.5, 'n_lstm_decoder': 1, 'reduction_factor': 1, 'decoder_lstm_units': 1024, 'attention_dim': 128, 'attention_filters': 32, 'attention_kernel': 31, 'n_mels': 80, 'n_conv_postnet': 5, 'postnet_conv_filters': 512, 'postnet_conv_kernel_sizes': 5, 'postnet_dropout_rate': 0.1, 'attention_type': 'lsa'}
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: batch_size = 2
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: remove_short_samples = True
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: allow_cache = True
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: mel_length_threshold = 32
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: is_shuffle = True
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: use_fixed_shapes = False
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: optimizer_params = {'initial_learning_rate': 0.001, 'end_learning_rate': 1e-05, 'decay_steps': 150000, 'warmup_proportion': 0.02, 'weight_decay': 0.001}
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: var_train_expr = None
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: train_max_steps = 200000
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: save_interval_steps = 1
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: eval_interval_steps = 600
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: log_interval_steps = 200
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: start_schedule_teacher_forcing = 200001
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: start_ratio_value = 0.5
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: schedule_decay_steps = 50000
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: end_ratio_value = 0.0
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: num_save_intermediate_results = 1
2020-11-17 16:15:32,662 (train_tacotron2:400) INFO: train_dir = ./dump_ljspeech/train/
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: dev_dir = ./dump_ljspeech/valid/
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: use_norm = True
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: outdir = ./examples/tacotron2/exp/train.tacotron2.v1
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: config = ./examples/tacotron2/conf/tacotron2.v1.yaml
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: resume =
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: verbose = 1
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: mixed_precision = False
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: pretrained =
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: version = 0.0
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: max_mel_length = 1289
2020-11-17 16:15:32,663 (train_tacotron2:400) INFO: max_char_length = 168
2020-11-17 16:16:05.624531: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-11-17 16:16:29.137582: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
Model: "tacotron2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
encoder (TFTacotronEncoder) multiple 8167936
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple 18246402
_________________________________________________________________
post_net (TFTacotronPostnet) multiple 5460480
_________________________________________________________________
residual_projection (Dense) multiple 41040
=================================================================
Total params: 31,915,858
Trainable params: 31,905,618
Non-trainable params: 10,240
_________________________________________________________________
[train]: 0%| | 0/200000 [00:00<?, ?it/s]2020-11-17 16:16:43.873594: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input: cond/branch_executed/_8
[train]: 0%| | 1/200000 [00:27<1531:15:58, 27.56s/it]Traceback (most recent call last):
File "examples/tacotron2/train_tacotron2.py", line 488, in <module>
main()
File "examples/tacotron2/train_tacotron2.py", line 476, in main
trainer.fit(
File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 870, in fit
self.run()
File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 101, in run
self._train_epoch()
File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 128, in _train_epoch
self._check_save_interval()
File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 169, in _check_save_interval
self.save_checkpoint()
File "/home/javad/.local/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 825, in save_checkpoint
self._model.save_weights(self.saved_path + "model-{}.h5".format(self.steps))
File "/home/javad/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2085, in save_weights
hdf5_format.save_weights_to_hdf5_group(f, self.layers)
File "/home/javad/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 640, in save_weights_to_hdf5_group
param_dset = g.create_dataset(name, val.shape, dtype=val.dtype)
File "/home/javad/.local/lib/python3.8/site-packages/h5py/_hl/group.py", line 143, in create_dataset
if '/' in name:
TypeError: a bytes-like object is required, not 'str'
[train]: 0%| | 1/200000 [00:28<1586:52:17, 28.56s/it]
I set save_interval_steps: 1 so that the model is saved as soon as possible. Best regards
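The last frame of the traceback is the membership test `'/' in name` inside h5py's `create_dataset`. A minimal sketch in plain Python, assuming `name` reached that line as a bytes object (the error message suggests this, but the log does not show the value), produces the same message. The weight name `kernel:0` and both helper functions are hypothetical and only illustrate the check:

```python
def contains_slash(name):
    """Mimic the membership test from the last traceback frame."""
    return "/" in name


def try_name(name):
    """Run the test and return either its result or the error text."""
    try:
        return contains_slash(name)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


if __name__ == "__main__":
    # A str name works: searching for a str inside a str is allowed.
    print(try_name("kernel:0"))
    # A bytes name fails: Python cannot search for a str inside bytes.
    print(try_name(b"kernel:0"))
```

If the assumption holds, the failure comes from a str/bytes mismatch between the caller in Keras and the installed h5py, not from the training data or the GPU.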
Here is the related issue:
https://github.com/h5py/h5py/issues/1732
I installed h5py==2.10.0 and that fixed it.
Best regards
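For anyone hitting the same traceback, a small check of the installed h5py version may help before a long training run. This is only a sketch: the helper names are made up for illustration, and it assumes that the 2.x series is what works with this setup, based solely on the fix reported above (pinning with something like `pip install h5py==2.10.0`).

```python
from importlib import metadata  # available from Python 3.8


def major_version(version):
    """Return the leading integer of a version string such as '2.10.0'."""
    return int(version.split(".")[0])


def is_h5py_2x(version):
    """True if the version belongs to the 2.x series used in the fix above."""
    return major_version(version) == 2


def installed_h5py_version():
    """Return the installed h5py version string, or None if it is missing."""
    try:
        return metadata.version("h5py")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_h5py_version()
    if found is None:
        print("h5py is not installed")
    elif is_h5py_2x(found):
        print("h5py {} is in the 2.x series".format(found))
    else:
        print("h5py {} found; the fix above used h5py==2.10.0".format(found))
```

Running it once before `train_tacotron2.py` is cheaper than waiting for the first checkpoint save to fail.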
Hi, I got this error. How can I fix it? Best regards