georgesterpu / avsr-tf1

Audio-Visual Speech Recognition using Sequence to Sequence Models
GNU General Public License v3.0
82 stars, 28 forks

run_audiovisual.py #27

Open TanYuChen1 opened 1 year ago

TanYuChen1 commented 1 year ago

Hi, thank you for open-sourcing your code. I tried to train your model on my own dataset but ran into a problem when running run_audiovisual.py. Both extract_faces.py and write_records_tcd.py ran without any issues. The error message is as follows:

Traceback (most recent call last):
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: Decoder/decoder/my_dense/bias_0-grad
         [[{{node Decoder/decoder/my_dense/bias_0-grad}}]]
  (1) Invalid argument: Nan in summary histogram for: Decoder/decoder/my_dense/bias_0-grad
         [[{{node Decoder/decoder/my_dense/bias_0-grad}}]]
         [[gradients/Decoder/decoder/while/BasicDecoderStep/decoder/attention_wrapper/Select_grad/Select/StackPopV2/_198]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_audiovisual.py", line 64, in <module>
    main()
  File "run_audiovisual.py", line 59, in main
    logfile=logfile,
  File "/home/exp/test/avsr-tf1-yjq/avsr/experiment.py", line 111, in run_experiment
    try_restore_latest_checkpoint=True
  File "/home/exp/test/avsr-tf1-yjq/avsr/avsr.py", line 274, in train
    ], **self.sess_opts)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Nan in summary histogram for: Decoder/decoder/my_dense/bias_0-grad
         [[node Decoder/decoder/my_dense/bias_0-grad (defined at /home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Invalid argument: Nan in summary histogram for: Decoder/decoder/my_dense/bias_0-grad
         [[node Decoder/decoder/my_dense/bias_0-grad (defined at /home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
         [[gradients/Decoder/decoder/while/BasicDecoderStep/decoder/attention_wrapper/Select_grad/Select/StackPopV2/_198]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'Decoder/decoder/my_dense/bias_0-grad':
  File "run_audiovisual.py", line 64, in <module>
    main()
  File "run_audiovisual.py", line 59, in main
    logfile=logfile,
  File "/home/exp/test/avsr-tf1-yjq/avsr/experiment.py", line 106, in run_experiment
    **kwargs
  File "/home/exp/test/avsr-tf1-yjq/avsr/avsr.py", line 216, in __init__
    self._create_models()
  File "/home/exp/test/avsr-tf1-yjq/avsr/avsr.py", line 531, in _create_models
    batch_size=self._hparams.batch_size[0])
  File "/home/exp/test/avsr-tf1-yjq/avsr/avsr.py", line 574, in _make_model
    hparams=self._hparams
  File "/home/exp/test/avsr-tf1-yjq/avsr/seq2seq.py", line 26, in __init__
    self._init_optimiser()
  File "/home/exp/test/avsr-tf1-yjq/avsr/seq2seq.py", line 231, in _init_optimiser
    summary = tf.summary.histogram("%s-grad" % variable.name, value)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/summary/summary.py", line 179, in histogram
    tag=tag, values=values, name=scope)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 329, in histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/exp/anaconda3/envs/avsr-tf1/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

Could this be caused by my data having different dimensions? Thanks a lot.

georgesterpu commented 1 year ago

Hi @TanYuChen1, thanks for creating this issue. Your error message suggests that NaNs appeared in one of the gradient tensors logged as a histogram summary, namely Decoder/decoder/my_dense/bias_0-grad.
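One thing that may be worth ruling out on a custom dataset is non-finite or mis-shaped input features, since these are a common (though by no means the only) source of NaN gradients. The following is a minimal, framework-free sketch of such a check. It is not part of this repository: the function name `find_bad_examples`, the dictionary layout, and the toy data are all assumptions made for illustration, and the features would first have to be decoded from your own records into plain lists of floats.

```python
import math


def find_bad_examples(examples, expected_dim):
    """Return (name, reason) pairs for examples that look suspicious.

    `examples` is a hypothetical mapping from an example name to a list of
    frames, where each frame is a list of floats (one feature vector).
    Only the first problem found in each example is reported.
    """
    problems = []
    for name, frames in examples.items():
        if len(frames) == 0:
            problems.append((name, "empty sequence"))
            continue
        for t, frame in enumerate(frames):
            if len(frame) != expected_dim:
                problems.append(
                    (name, "frame %d has %d features, expected %d"
                     % (t, len(frame), expected_dim)))
                break
            if not all(math.isfinite(x) for x in frame):
                problems.append((name, "frame %d contains NaN or Inf" % t))
                break
    return problems


# Toy data standing in for decoded features (made up for illustration).
examples = {
    "ok": [[0.1, 0.2, 0.3], [0.0, -0.5, 1.0]],
    "nan": [[0.1, float("nan"), 0.3]],
    "short": [[0.1, 0.2]],
    "empty": [],
}

for name, reason in find_bad_examples(examples, expected_dim=3):
    print(name, "->", reason)
```

If every example passes a check like this, the inputs are less likely to be the culprit, and the NaNs more likely arise during training itself (for example from a learning rate that is too high for the new data).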

Please note that I am no longer maintaining this repository. If I had the chance to work on AVSR again, I would probably start by porting everything here to PyTorch (+ e.g. Lightning) or Keras, in order to leverage the latest advances in ML frameworks.