TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-Art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German; easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

Cannot train MFA-aligned FastSpeech2 with gradient accumulator: ValueError: None values not supported. #389

Closed: ZDisket closed this issue 4 years ago

ZDisket commented 4 years ago

I tried training FastSpeech2 on LJSpeech resampled to 24 kHz with gradient_accumulation_steps: 1 and batch size 128 with mixed precision on a Tesla T4 (14 GB of VRAM) and got this:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Traceback (most recent call last):
  File "/content/TensorflowTTS/ttsexamples/fastspeech2/train_fastspeech2.py", line 436, in <module>
    main()
  File "/content/TensorflowTTS/ttsexamples/fastspeech2/train_fastspeech2.py", line 428, in main
    resume=args.resume,
  File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 1002, in fit
    self.run()
  File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 103, in run
    self._train_epoch()
  File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 125, in _train_epoch
    self._train_step(batch)
  File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 780, in _train_step
    self.one_step_forward(batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 823, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 697, in _initialize
    *args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2855, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 3075, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 600, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py", line 973, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py:788 _one_step_forward  *
        per_replica_losses = self._strategy.run(
    /content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py:835 _one_step_forward_per_replica  *
        self._optimizer.apply_gradients(
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:380 apply_gradients  **
        args=(grads_and_vars, name, experimental_aggregate_gradients))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2715 merge_call
        return self._merge_call(merge_fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2722 _merge_call
        return merge_fn(self._strategy, *args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:410 _apply_gradients_cross_replica  **
        do_not_apply_fn)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/smart_cond.py:59 smart_cond
        name=name)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py:507 new_func
        return func(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/control_flow_ops.py:1180 cond
        return cond_v2.cond_v2(pred, true_fn, false_fn, name)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/cond_v2.py:85 cond_v2
        op_return_value=pred)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py:986 func_graph_from_py_func
        func_outputs = python_func(*func_args, **func_kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:396 apply_fn
        args=(grads, wrapped_vars, name, experimental_aggregate_gradients))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/one_device_strategy.py:367 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:420 _apply_gradients
        experimental_aggregate_gradients=experimental_aggregate_gradients)
    /content/TensorflowTTS/tensorflow_tts/optimizers/adamweightdecay.py:124 apply_gradients
        (grads, _) = tf.clip_by_global_norm(grads, clip_norm=clip_norm)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/clip_ops.py:352 clip_by_global_norm
        constant_op.constant(1.0, dtype=use_norm.dtype) / clip_norm)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1124 binary_op_wrapper
        return func(x, y, name=name)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1296 truediv
        return _truediv_python3(x, y, name)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1222 _truediv_python3
        y = ops.convert_to_tensor(y, dtype_hint=x.dtype.base_dtype, name="y")
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:1499 convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:338 _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:264 constant
        allow_broadcast=True)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/constant_op.py:282 _constant_impl
        allow_broadcast=allow_broadcast))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/tensor_util.py:444 make_tensor_proto
        raise ValueError("None values not supported.")

    ValueError: None values not supported.

[train]:   0% 0/150000 [01:14<?, ?it/s]
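
The bottom of the trace shows adamweightdecay.py passing clip_norm into tf.clip_by_global_norm, where dividing by it fails, so the clip_norm apparently arrived as None. A minimal sketch that reproduces the same ValueError outside the trainer, assuming that is indeed how the None got there:

```python
import tensorflow as tf

grads = [tf.constant([1.0, 2.0])]
try:
    # clip_norm=None triggers the same convert_to_tensor(None) failure
    # seen at the end of the traceback above
    tf.clip_by_global_norm(grads, clip_norm=None)
except ValueError as e:
    print(e)  # -> None values not supported.
```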

Any ideas?

dathudeptrai commented 4 years ago

@ZDisket if you use gradient_accumulation_steps: 1, the training behavior is the same as in the old version, so the bug shouldn't be caused by the gradient accumulator.

dathudeptrai commented 4 years ago

here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L836) and here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L859), try replacing the arguments with:

zip(gradients, self._trainable_variables), 1.0
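
For context, a sketch of what the edited call would look like; only the argument list comes from this comment, the surrounding trainer code is assumed. Per the traceback, the second positional argument is the clip_norm that adamweightdecay.apply_gradients forwards to tf.clip_by_global_norm, so passing 1.0 replaces the None that crashed:

```python
# Hypothetical sketch of the suggested edit in base_trainer.py (L836/L859)
self._optimizer.apply_gradients(
    zip(gradients, self._trainable_variables), 1.0  # clip_norm=1.0 instead of None
)
```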
ZDisket commented 4 years ago

@dathudeptrai

if you use gradient_accumulation_steps: 1, the training behavior is the same as in the old version, so the bug shouldn't be caused by the gradient accumulator.

Then what's the correct value? Also, I'll try that solution.

ZDisket commented 4 years ago

@dathudeptrai

here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L836) and here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L859), try replacing the arguments with:

zip(gradients, self._trainable_variables), 1.0

Now I'm getting OOM, so I guess that works.

dathudeptrai commented 4 years ago

@dathudeptrai

here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L836) and here (https://github.com/TensorSpeech/TensorFlowTTS/blob/master/tensorflow_tts/trainers/base_trainer.py#L859), try replacing the arguments with: zip(gradients, self._trainable_variables), 1.0

Now I'm getting OOM, so I guess that works.

So try setting batch_size: 16 and gradient_accumulation_steps: 8 :D Then you can train with an effective batch size of 128.
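
For readers unfamiliar with the trick, a minimal sketch of what gradient accumulation does; this is illustrative only, not the repo's implementation. It also shows why accum_steps=1 reduces to a plain train step, as noted above:

```python
import tensorflow as tf

def train_step_with_accumulation(model, optimizer, loss_fn, batches, accum_steps=8):
    # Average gradients over accum_steps small batches, then apply once,
    # so batch_size 16 x 8 steps behaves like one batch of 128.
    accumulated = [tf.zeros_like(v) for v in model.trainable_variables]
    for x, y in batches:  # expects accum_steps batches of (x, y)
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True)) / accum_steps
        grads = tape.gradient(loss, model.trainable_variables)
        accumulated = [a + g for a, g in zip(accumulated, grads)]
    optimizer.apply_gradients(zip(accumulated, model.trainable_variables))
```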

ZDisket commented 4 years ago

@dathudeptrai Another error shortly into training:

[train]:   0% 0/150000 [00:00<?, ?it/s]2020-11-25 03:57:00.978225: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
2020-11-25 03:57:00.983575: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
2020-11-25 03:57:10.944365: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 2036 of 12445
2020-11-25 03:57:20.939860: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 4161 of 12445
2020-11-25 03:57:30.938864: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 6315 of 12445
2020-11-25 03:57:40.941579: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 8393 of 12445
2020-11-25 03:57:50.942838: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 10446 of 12445
2020-11-25 03:58:00.675132: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:221] Shuffle buffer filled.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/indexed_slices.py:433: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2020-11-25 03:58:31.339181: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 2354/30897 nodes to float16 precision using 224 cast(s) to float16 (excluding Const and Variable casts)
2020-11-25 03:58:39.482413: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 0/23290 nodes to float16 precision using 0 cast(s) to float16 (excluding Const and Variable casts)
[train]:   0% 1/150000 [01:48<4517:25:13, 108.42s/it]2020-11-25 03:58:51.647473: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 1172/11808 nodes to float16 precision using 113 cast(s) to float16 (excluding Const and Variable casts)
2020-11-25 03:58:54.483251: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1924] Converted 0/9872 nodes to float16 precision using 0 cast(s) to float16 (excluding Const and Variable casts)
[train]:   0% 97/150000 [06:43<116:08:00,  2.79s/it]2020-11-25 04:03:45.665215: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at constant_op.cc:185 : Invalid argument: Dimension -2147483648 must be >= 0
Traceback (most recent call last):
  File "/content/TensorflowTTS/ttsexamples/fastspeech2/train_fastspeech2.py", line 436, in <module>
    main()
  File "/content/TensorflowTTS/ttsexamples/fastspeech2/train_fastspeech2.py", line 428, in main
    resume=args.resume,
  File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 1002, in fit
    self.run()
  File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 103, in run
    self._train_epoch()
  File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 125, in _train_epoch
    self._train_step(batch)
  File "/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py", line 780, in _train_step
    self.one_step_forward(batch)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py", line 807, in _call
    return self._stateless_fn(*args, **kwds)  # pylint: disable=not-callable
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument:  Dimension -2147483648 must be >= 0
     [[{{node while/body/_1/while/tf_fast_speech2_1/length_regulator/zeros_1}}]]
     [[Func/while/body/_1/output_control_node/_2498/_503]]
  (1) Invalid argument:  Dimension -2147483648 must be >= 0
     [[{{node while/body/_1/while/tf_fast_speech2_1/length_regulator/zeros_1}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference__one_step_forward_46981]

Function call stack:
_one_step_forward -> _one_step_forward

[train]:   0% 97/150000 [06:45<174:05:03,  4.18s/it]
dathudeptrai commented 4 years ago

@ZDisket did you pull the newest code on master? The bug seems to come from the data loader.

Zegalryang commented 4 years ago

@dathudeptrai Hi, I have the same problem. What should I do?


[train]: 0%| | 0/200000 [00:00<?, ?it/s]2020-11-25 05:12:33.624339: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
2020-11-25 05:12:33.636275: I tensorflow/core/grappler/optimizers/auto_mixed_precision.cc:1345] No whitelist ops found, nothing to do
2020-11-25 05:12:43.546755: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 796 of 12209
2020-11-25 05:12:53.534750: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 1609 of 12209
2020-11-25 05:13:03.578700: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 2442 of 12209
2020-11-25 05:13:13.575282: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 3256 of 12209
2020-11-25 05:13:23.532050: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 4079 of 12209
2020-11-25 05:13:33.536464: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 4887 of 12209
2020-11-25 05:13:43.601250: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 5668 of 12209
2020-11-25 05:13:53.540969: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 6495 of 12209
2020-11-25 05:14:03.559303: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 7341 of 12209
2020-11-25 05:14:13.577042: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 8170 of 12209
2020-11-25 05:14:23.530453: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 9012 of 12209
2020-11-25 05:14:33.614216: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 9779 of 12209
2020-11-25 05:14:43.539679: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 10597 of 12209
2020-11-25 05:14:53.548933: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:172] Filling up shuffle buffer (this may take a while): 11419 of 12209
2020-11-25 05:15:03.032730: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:221] Shuffle buffer filled.
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/indexed_slices.py:432: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  warnings.warn(
Traceback (most recent call last):
  File "examples/fastspeech2/train_fastspeech2.py", line 417, in <module>
    main()
  File "examples/fastspeech2/train_fastspeech2.py", line 405, in main
    trainer.fit(
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 1002, in fit
    self.run()
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 103, in run
    self._train_epoch()
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 125, in _train_epoch
    self._train_step(batch)
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py", line 780, in _train_step
    self.one_step_forward(batch)
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 823, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 696, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2855, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3213, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3065, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 986, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 600, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 973, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

/root/anaconda3/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py:788 _one_step_forward  *
    per_replica_losses = self._strategy.run(
/root/anaconda3/lib/python3.8/site-packages/tensorflow_tts/trainers/base_trainer.py:835 _one_step_forward_per_replica  *
    self._optimizer.apply_gradients(
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:378 apply_gradients  **
    return distribution_strategy_context.get_replica_context().merge_call(
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2715 merge_call
    return self._merge_call(merge_fn, args, kwargs)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2722 _merge_call
    return merge_fn(self._strategy, *args, **kwargs)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:408 _apply_gradients_cross_replica  **
    maybe_apply_op = smart_cond.smart_cond(should_apply_grads,
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/smart_cond.py:58 smart_cond
    return control_flow_ops.cond(pred, true_fn=true_fn, false_fn=false_fn,
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
    return target(*args, **kwargs)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/util/deprecation.py:507 new_func
    return func(*args, **kwargs)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py:1180 cond
    return cond_v2.cond_v2(pred, true_fn, false_fn, name)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/cond_v2.py:79 cond_v2
    true_graph = func_graph_module.func_graph_from_py_func(
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py:986 func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:394 apply_fn
    return distribution.extended.call_for_each_replica(
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/distribute/one_device_strategy.py:367 _call_for_each_replica
    return fn(*args, **kwargs)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/keras/mixed_precision/experimental/loss_scale_optimizer.py:418 _apply_gradients
    return self._optimizer.apply_gradients(
/root/anaconda3/lib/python3.8/site-packages/tensorflow_tts/optimizers/adamweightdecay.py:124 apply_gradients
    (grads, _) = tf.clip_by_global_norm(grads, clip_norm=clip_norm)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
    return target(*args, **kwargs)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/clip_ops.py:352 clip_by_global_norm
    constant_op.constant(1.0, dtype=use_norm.dtype) / clip_norm)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py:1124 binary_op_wrapper
    return func(x, y, name=name)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
    return target(*args, **kwargs)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py:1296 truediv
    return _truediv_python3(x, y, name)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/ops/math_ops.py:1222 _truediv_python3
    y = ops.convert_to_tensor(y, dtype_hint=x.dtype.base_dtype, name="y")
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:1499 convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:338 _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:263 constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py:280 _constant_impl
    tensor_util.make_tensor_proto(
/root/anaconda3/lib/python3.8/site-packages/tensorflow/python/framework/tensor_util.py:444 make_tensor_proto
    raise ValueError("None values not supported.")

ValueError: None values not supported.

[train]: 0%| | 0/200000 [02:40<?, ?it/s]

dathudeptrai commented 4 years ago

@Zegalryang pull the newest code :)).

Zegalryang commented 4 years ago

@dathudeptrai I already pulled the newest code. Commit: ea72bab6cef40dff68b0b619ecf0d7c9cce3e3f0

dathudeptrai commented 4 years ago

@dathudeptrai I already pulled the newest code. Commit: ea72bab

Pull again and run pip install -e .

The newest code fixes your problem.

ZDisket commented 4 years ago

@dathudeptrai

did you pull the newest code on master? The bug seems to come from the data loader.

Now it works well.

Zegalryang commented 4 years ago

@dathudeptrai it works!! thanks!!

ZDisket commented 4 years ago

@dathudeptrai Training with the gradient accumulator for an effective batch_size of 128 is slow, about 2.7 s/it, on a GPU that would normally get 2.9 it/s. Is this normal?

dathudeptrai commented 4 years ago

@dathudeptrai Training with the gradient accumulator for an effective batch_size of 128 is slow, about 2.7 s/it, on a GPU that would normally get 2.9 it/s. Is this normal?

Normally we train with batch_size 16, so you can get about 3 it/s, but now you are training with an effective batch size of 128, so 2.7 s/it is normal.
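
A quick back-of-the-envelope check with the figures quoted in this thread (a sketch, not a benchmark):

```python
# Per-sample throughput is roughly unchanged; each optimizer step just
# covers 8x more data, so iterations are correspondingly slower.
batch16 = 16 * 2.9    # ~46 samples/s at batch_size 16 and ~2.9 it/s
batch128 = 128 / 2.7  # ~47 samples/s at effective batch 128 and ~2.7 s/it
print(round(batch16), round(batch128))  # 46 47
```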