cmusphinx / g2p-seq2seq

G2P with Tensorflow

Bus error on Raspberry PI #163

Closed. embie27 closed this issue 5 years ago.

embie27 commented 5 years ago

I'm running g2p-seq2seq on my RPi 3. Everything worked well until yesterday, when a bus error started occurring. I've got the newest version of g2p-seq2seq, TensorFlow v1.9 (I also tried v1.10 and v1.11), and tensor2tensor v1.6.6. When running, I get the following output:

/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: compiletime version 3.4 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.5
  return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: builtins.type size changed, may indicate binary incompatibility. Expected 432, got 412
  return f(*args, **kwds)
INFO:tensorflow:Importing user module g2p_seq2seq from path /usr/local/lib/python3.5/dist-packages/g2p_seq2seq-6.2.2a0-py3.5.egg
INFO:tensorflow:Overriding hparams in transformer_base with num_heads=4,batch_size=4096,num_hidden_layers=3,max_length=30,length_bucket_step=1.5,hidden_size=256,eval_drop_long_sequences=1,filter_size=512,min_length_bucket=6
WARNING:tensorflow:From /usr/local/lib/python3.5/dist-packages/tensor2tensor/utils/trainer_lib.py:165: RunConfig.__init__ (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
INFO:tensorflow:schedule=continuous_train_and_eval
INFO:tensorflow:worker_gpu=1
INFO:tensorflow:sync=False
WARNING:tensorflow:Schedule=continuous_train_and_eval. Assuming that training is running on a single machine.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:ps_devices: ['gpu:0']
INFO:tensorflow:Using config: {'_num_ps_replicas': 0, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1
}
, '_master': '', 'data_parallelism': <tensor2tensor.utils.expert_utils.Parallelism object at 0x65931c10>, '_save_summary_steps': 100, '_train_distribute': None, 'use_tpu': False, '_device_fn': None, '_num_worker_replicas': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x70a64ff0>, '_task_type': None, '_model_dir': '/home/pi/Francis_installed/assets/model-de', '_task_id': 0, '_is_chief': True, '_keep_checkpoint_max': 1, '_log_step_count_steps': 100, '_environment': 'local', '_evaluation_master': '', '_save_checkpoints_steps': 5, '_keep_checkpoint_every_n_hours': 1, 't2t_device_info': {'num_async_replicas': 1}, '_session_config': gpu_options {
  per_process_gpu_memory_fraction: 0.95
}
allow_soft_placement: true
graph_options {
  optimizer_options {
  }
}
, '_tf_random_seed': None, '_save_checkpoints_secs': None}
WARNING:tensorflow:Estimator's model_fn (<function T2TModel.make_estimator_model_fn.<locals>.wrapping_model_fn at 0x658e9588>) includes params argument, but params are not passed to Estimator.
WARNING:tensorflow:Input graph does not use tf.data.Dataset or contain a QueueRunner. That means predict yields forever. This is probably a mistake.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Setting T2TModel mode to 'infer'
INFO:tensorflow:Setting hparams.symbol_dropout to 0.0
INFO:tensorflow:Setting hparams.dropout to 0.0
INFO:tensorflow:Setting hparams.layer_prepostprocess_dropout to 0.0
INFO:tensorflow:Setting hparams.attention_dropout to 0.0
INFO:tensorflow:Setting hparams.relu_dropout to 0.0
INFO:tensorflow:Greedy Decoding
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /home/pi/Francis_installed/assets/model-de/model.ckpt-9658
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
WARNING:tensorflow:Input graph does not use tf.data.Dataset or contain a QueueRunner. That means predict yields forever. This is probably a mistake.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Setting T2TModel mode to 'infer'
INFO:tensorflow:Setting hparams.symbol_dropout to 0.0
INFO:tensorflow:Setting hparams.dropout to 0.0
INFO:tensorflow:Setting hparams.layer_prepostprocess_dropout to 0.0
INFO:tensorflow:Setting hparams.attention_dropout to 0.0
INFO:tensorflow:Setting hparams.relu_dropout to 0.0
INFO:tensorflow:Greedy Decoding
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /home/pi/Francis_installed/assets/model-de/model.ckpt-9658
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
Bus error

My dmesg shows this output:

[  840.879414] Unhandled fault: alignment exception (0x001) at 0x606fe53c
[  840.889388] pgd = b9650000
[  840.895274] [606fe53c] *pgd=33c8b835, *pte=12bd475f, *ppte=12bd4c7f

I also tried reinstalling Raspbian, but it didn't help.
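For reference, the two RuntimeWarnings at the top (compiletime 3.4 vs. runtime 3.5, plus the binary-incompatibility notice) suggest the TensorFlow wheel may not have been built for this exact interpreter. A minimal check of what is actually being imported, nothing g2p-seq2seq-specific:

import platform
import struct
import sys

import tensorflow as tf

print("Python:", sys.version.split()[0])                 # runtime interpreter (3.5 here)
print("Pointer size:", struct.calcsize("P") * 8, "bit")  # 32 on stock Raspbian
print("Machine:", platform.machine())                    # e.g. 'armv7l' on a Pi 3
print("TensorFlow:", tf.__version__)                     # version of the wheel in use
print("Loaded from:", tf.__file__)                       # which install got imported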

nshmyrev commented 5 years ago

It runs out of memory; you do not have enough memory to run this.

See also https://github.com/tensorflow/tensorflow/issues/21926
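If you want to double-check that, a small watcher run in a second terminal while g2p-seq2seq is decoding will show whether available memory and swap actually collapse right before the crash. A rough sketch, assuming psutil is installed (pip install psutil):

import time

import psutil

while True:
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()
    # Print free RAM, overall usage, and swap consumption once a second
    # (the interval is arbitrary).
    print("available: %4d MiB  used: %4.1f%%  swap used: %4d MiB"
          % (mem.available // 2**20, mem.percent, swap.used // 2**20))
    time.sleep(1)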

embie27 commented 5 years ago

I've got plenty of memory left when the error occurs. I also don't get any warnings that my memory is running full.

woreom commented 4 years ago

@nshmyrev I'm having a similar problem. Did you manage to fix this?

nshmyrev commented 4 years ago

@woreom it is an issue with TensorFlow: https://github.com/tensorflow/tensorflow/issues/21926
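The dmesg trace above ("Unhandled fault: alignment exception") matches that report: the 32-bit ARM build trips over an unaligned memory access. As a possible experiment (not a fix), the kernel can be asked to fix up unaligned accesses itself via /proc/cpu/alignment. This needs root, only exists on 32-bit ARM kernels, slows things down, and does not catch every kind of unaligned access, so treat the sketch below as a hedged workaround rather than a solution:

# Read the current alignment-trap counters and mode (32-bit ARM kernels only).
with open("/proc/cpu/alignment") as f:
    print(f.read())

# Mode bits: 1 = warn, 2 = fixup, 4 = send SIGBUS. Writing "2" asks the
# kernel to silently fix up unaligned accesses (use "3" to also log them).
# Requires root, e.g. run under sudo python3.
with open("/proc/cpu/alignment", "w") as f:
    f.write("2\n")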