[BUG] Frequent AI error in the logs

umbynos commented 4 years ago

Describe the bug
I frequently see errors related to AI training in the logs.

To Reproduce
Steps to reproduce the behavior: turn on the pwnagotchi and wait for a while.

Expected behavior
No errors in the logs.

Logs

[2020-07-05 17:29:36,113] [WARNING] 11 epochs with no activity -> bored
[2020-07-05 17:30:18,520] [INFO] [ai] setting new policy:
[2020-07-05 17:30:18,534] [INFO] [ai] ! min_rssi: -60 -> -50
[2020-07-05 17:30:18,539] [INFO] [ai] ! ap_ttl: 297 -> 513
[2020-07-05 17:30:18,547] [INFO] [ai] ! sta_ttl: 278 -> 131
[2020-07-05 17:30:18,550] [INFO] [ai] ! recon_time: 46 -> 32
[2020-07-05 17:30:18,553] [INFO] [ai] ! max_inactive_scale: 6 -> 7
[2020-07-05 17:30:18,563] [INFO] [ai] ! recon_inactive_multiplier: 1 -> 3
[2020-07-05 17:30:18,566] [INFO] [ai] ! hop_recon_time: 28 -> 34
[2020-07-05 17:30:18,569] [INFO] [ai] ! min_recon_time: 17 -> 28
[2020-07-05 17:30:18,574] [INFO] [ai] ! max_interactions: 9 -> 23
[2020-07-05 17:30:18,579] [INFO] [ai] ! max_misses_for_recon: 7 -> 6
[2020-07-05 17:30:18,589] [INFO] [ai] ! excited_num_epochs: 22 -> 24
[2020-07-05 17:30:18,592] [INFO] [ai] ! bored_num_epochs: 11 -> 24
[2020-07-05 17:30:18,595] [INFO] [ai] ! sad_num_epochs: 16 -> 15
[2020-07-05 17:30:18,603] [INFO] [ai] ! channels: [1, 4, 5, 6, 7, 8, 9, 11] -> [1, 10]
[2020-07-05 17:30:26,877] [INFO] [epoch 61] duration=00:00:50 slept_for=00:00:46 blind=0 sad=0 bored=0 inactive=12 active=0 peers=0 tot_bond=0.00 avg_bond=0.00 hops=1 missed=0 deauths=0 assocs=0 handshakes=0 cpu=100% mem=80% temperature=58C reward=-0.037995391705069124
[2020-07-05 17:30:26,891] [INFO] [ai] saving model to /root/brain.nn ...
[2020-07-05 17:30:28,575] [INFO] [ai] saving /root/brain.json
[2020-07-05 17:32:10,317] [INFO] [epoch 62] duration=00:01:43 slept_for=00:01:36 blind=0 sad=0 bored=0 inactive=13 active=0 peers=0 tot_bond=0.00 avg_bond=0.00 hops=0 missed=0 deauths=0 assocs=0 handshakes=0 cpu=100% mem=80% temperature=58C reward=-0.04126984126984127
[2020-07-05 17:33:54,381] [INFO] [epoch 63] duration=00:01:44 slept_for=00:01:36 blind=0 sad=0 bored=0 inactive=14 active=0 peers=0 tot_bond=0.00 avg_bond=0.00 hops=0 missed=0 deauths=0 assocs=0 handshakes=0 cpu=100% mem=80% temperature=58C reward=-0.043750000000000004
[2020-07-05 17:35:35,790] [INFO] [epoch 64] duration=00:01:41 slept_for=00:01:36 blind=0 sad=1 bored=0 inactive=15 active=0 peers=0 tot_bond=0.00 avg_bond=0.00 hops=0 missed=0 deauths=0 assocs=0 handshakes=0 cpu=100% mem=80% temperature=58C reward=-0.046153846153846156
[2020-07-05 17:35:35,822] [WARNING] 15 epochs with no activity -> sad
[2020-07-05 17:35:39,480] [INFO] [update] checking for updates ...
[2020-07-05 17:35:49,103] [INFO] [update] done
[2020-07-05 17:36:30,747] [ERROR] [ai] error while training (Cast string to float is not supported
     [[node RMSProp/update_model/vf/w/Cast (defined at /lib/python3/dist-packages/tensorflow_core/python/framework/ops.py:1692) ]]

Original stack trace for 'RMSProp/update_model/vf/w/Cast':
  File "/local/lib/python3.7/dist-packages/pwnagotchi/ai/train.py", line 162, in _ai_worker
    self._model = ai.load(self._config, self, self._epoch)
  File "/local/lib/python3.7/dist-packages/pwnagotchi/ai/__init__.py", line 42, in load
    a2c = A2C(MlpLstmPolicy, env, **config['params'])
  File "/local/lib/python3.7/dist-packages/stable_baselines/a2c/a2c.py", line 86, in __init__
    self.setup_model()
  File "/local/lib/python3.7/dist-packages/stable_baselines/a2c/a2c.py", line 161, in setup_model
    self.apply_backprop = trainer.apply_gradients(grads)
  File "/lib/python3/dist-packages/tensorflow_core/python/training/optimizer.py", line 614, in apply_gradients
    update_ops.append(processor.update_op(self, grad))
  File "/lib/python3/dist-packages/tensorflow_core/python/training/optimizer.py", line 119, in update_op
    update_op = optimizer._apply_dense(g, self._v)  # pylint: disable=protected-access
  File "/lib/python3/dist-packages/tensorflow_core/python/training/rmsprop.py", line 164, in _apply_dense
    math_ops.cast(self._epsilon_tensor, var.dtype.base_dtype),
  File "/lib/python3/dist-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/lib/python3/dist-packages/tensorflow_core/python/ops/math_ops.py", line 692, in cast
    x = gen_math_ops.cast(x, base_type, name=name)
  File "/lib/python3/dist-packages/tensorflow_core/python/ops/gen_math_ops.py", line 2191, in cast
    "Cast", x=x, DstT=DstT, Truncate=Truncate, name=name)
  File "/lib/python3/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 793, in _apply_op_helper
    op_def=op_def)
  File "/lib/python3/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/lib/python3/dist-packages/tensorflow_core/python/framework/ops.py", line 3299, in create_op
    op_def=op_def)
  File "/lib/python3/dist-packages/tensorflow_core/python/framework/ops.py", line 1692, in __init__
    self._traceback = tf_stack.extract_stack()
)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/tensorflow_core/python/client/session.py", line 1348, in _do_call
    return fn(*args)
  File "/usr/lib/python3/dist-packages/tensorflow_core/python/client/session.py", line 1333, in _run_fn
    target_list, run_metadata)
  File "/usr/lib/python3/dist-packages/tensorflow_core/python/client/session.py", line 1421, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnimplementedError: Cast string to float is not supported
     [[{{node RMSProp/update_model/vf/w/Cast}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/pwnagotchi/ai/train.py", line 177, in _ai_worker
    self._model.learn(total_timesteps=epochs_per_episode, callback=self.on_ai_training_step)
  File "/usr/local/lib/python3.7/dist-packages/stable_baselines/a2c/a2c.py", line 242, in learn
    self.num_timesteps // self.n_batch, writer)
  File "/usr/local/lib/python3.7/dist-packages/stable_baselines/a2c/a2c.py", line 210, in _train_step
    [self.summary, self.pg_loss, self.vf_loss, self.entropy, self.apply_backprop], td_map)
  File "/usr/lib/python3/dist-packages/tensorflow_core/python/client/session.py", line 941, in run
    run_metadata_ptr)
  File "/usr/lib/python3/dist-packages/tensorflow_core/python/client/session.py", line 1164, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/lib/python3/dist-packages/tensorflow_core/python/client/session.py", line 1342, in _do_run
    run_metadata)
  File "/usr/lib/python3/dist-packages/tensorflow_core/python/client/session.py", line 1362, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnimplementedError: Cast string to float is not supported
     [[node RMSProp/update_model/vf/w/Cast (defined at /lib/python3/dist-packages/tensorflow_core/python/framework/ops.py:1692) ]]

Original stack trace for 'RMSProp/update_model/vf/w/Cast':
  File "/local/lib/python3.7/dist-packages/pwnagotchi/ai/train.py", line 162, in _ai_worker
    self._model = ai.load(self._config, self, self._epoch)
  File "/local/lib/python3.7/dist-packages/pwnagotchi/ai/__init__.py", line 42, in load
    a2c = A2C(MlpLstmPolicy, env, **config['params'])
  File "/local/lib/python3.7/dist-packages/stable_baselines/a2c/a2c.py", line 86, in __init__
    self.setup_model()
  File "/local/lib/python3.7/dist-packages/stable_baselines/a2c/a2c.py", line 161, in setup_model
    self.apply_backprop = trainer.apply_gradients(grads)
  File "/lib/python3/dist-packages/tensorflow_core/python/training/optimizer.py", line 614, in apply_gradients
    update_ops.append(processor.update_op(self, grad))
  File "/lib/python3/dist-packages/tensorflow_core/python/training/optimizer.py", line 119, in update_op
    update_op = optimizer._apply_dense(g, self._v)  # pylint: disable=protected-access
  File "/lib/python3/dist-packages/tensorflow_core/python/training/rmsprop.py", line 164, in _apply_dense
    math_ops.cast(self._epsilon_tensor, var.dtype.base_dtype),
  File "/lib/python3/dist-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/lib/python3/dist-packages/tensorflow_core/python/ops/math_ops.py", line 692, in cast
    x = gen_math_ops.cast(x, base_type, name=name)
  File "/lib/python3/dist-packages/tensorflow_core/python/ops/gen_math_ops.py", line 2191, in cast
    "Cast", x=x, DstT=DstT, Truncate=Truncate, name=name)
  File "/lib/python3/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 793, in _apply_op_helper
    op_def=op_def)
  File "/lib/python3/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/lib/python3/dist-packages/tensorflow_core/python/framework/ops.py", line 3299, in create_op
    op_def=op_def)
  File "/lib/python3/dist-packages/tensorflow_core/python/framework/ops.py", line 1692, in __init__
    self._traceback = tf_stack.extract_stack()

[2020-07-05 17:36:30,949] [INFO] [ai] setting new policy:
[2020-07-05 17:36:30,973] [INFO] [ai] ! min_rssi: -50 -> -164
[2020-07-05 17:36:30,983] [INFO] [ai] ! ap_ttl: 513 -> 498
[2020-07-05 17:36:30,988] [INFO] [ai] ! sta_ttl: 131 -> 69
[2020-07-05 17:36:31,001] [INFO] [ai] ! recon_time: 32 -> 16
[2020-07-05 17:36:31,007] [INFO] [ai] ! max_inactive_scale: 7 -> 3
[2020-07-05 17:36:31,023] [INFO] [ai] ! recon_inactive_multiplier: 3 -> 2
[2020-07-05 17:36:31,029] [INFO] [ai] ! hop_recon_time: 34 -> 16
[2020-07-05 17:36:31,051] [INFO] [ai] ! min_recon_time: 28 -> 18
[2020-07-05 17:36:31,060] [INFO] [ai] ! max_interactions: 23 -> 11
[2020-07-05 17:36:31,063] [INFO] [ai] ! max_misses_for_recon: 6 -> 8
[2020-07-05 17:36:31,081] [INFO] [ai] ! excited_num_epochs: 24 -> 21
[2020-07-05 17:36:31,084] [INFO] [ai] ! bored_num_epochs: 24 -> 26
[2020-07-05 17:36:31,094] [INFO] [ai] ! sad_num_epochs: 15 -> 22
[2020-07-05 17:36:31,097] [INFO] [ai] ! channels: [1, 10] -> [1, 2, 4, 5, 6, 9, 10]
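
One reading of the trace, not confirmed in this thread: the failing Cast node is created at rmsprop.py line 164 from self._epsilon_tensor, and ai.load passes config['params'] straight into A2C(MlpLstmPolicy, env, **config['params']). A hyperparameter such as epsilon reaching TensorFlow as a Python string (for example "1e-05" instead of 1e-05) would be consistent with "Cast string to float is not supported". A minimal sketch for checking such a params dict, assuming it has already been loaded from the config; find_string_params and coerce_numeric_params are hypothetical helpers, not part of pwnagotchi:

```python
from typing import Any, Dict, List


def find_string_params(params: Dict[str, Any]) -> List[str]:
    """Return the sorted names of hyperparameters whose values are strings.

    A Python str becomes a tf.string tensor, and casting that to float
    fails with "Cast string to float is not supported".
    """
    return sorted(name for name, value in params.items() if isinstance(value, str))


def coerce_numeric_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params with numeric-looking strings converted to float.

    Strings that do not parse as numbers are left as-is for manual review.
    The input dict is not modified.
    """
    fixed = dict(params)
    for name in find_string_params(params):
        try:
            fixed[name] = float(params[name])
        except ValueError:
            pass  # not a number; leave it for manual review
    return fixed


if __name__ == "__main__":
    # Hypothetical values for illustration only, not the user's real config.
    params = {"gamma": 0.99, "alpha": 0.99, "epsilon": "1e-05", "n_steps": 1}
    print(find_string_params(params))                # ['epsilon']
    print(coerce_numeric_params(params)["epsilon"])  # 1e-05
```

Printing type(value) for each entry of config['params'] on the device would confirm or rule out this hypothesis before changing anything.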

Environment

Pwnagotchi version: 1.5.3 (1.4.3 had the same problem)
OS version: Raspbian Lite (I downloaded and flashed the latest release from the repo)
Type of hardware: Raspberry Pi Zero W with the recommended display

Additional context
I've seen issue https://github.com/evilsocket/pwnagotchi/issues/837 and it seems to be exactly my problem. My epochs_per_episode is set to 50 (the default, I think). When I updated from 1.4.3 to 1.5.3, I made a backup as explained in the docs. Could the brain have been corrupted somehow?
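
To test the corruption theory, one quick check is whether the saved /root/brain.json (the path shown in the log) still parses as JSON. A minimal sketch with a hypothetical helper, json_error; it says nothing about /root/brain.nn, and a file that parses can still contain bad values:

```python
import json
from typing import Optional


def json_error(path: str) -> Optional[str]:
    """Return None if the file at `path` parses as JSON, else a short error message."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            json.load(fh)
    except FileNotFoundError:
        return "file not found"
    except OSError as exc:
        return f"cannot read: {exc}"  # e.g. permission denied when not run as root
    except ValueError as exc:  # covers json.JSONDecodeError and UnicodeDecodeError
        return f"invalid JSON: {exc}"
    return None


if __name__ == "__main__":
    # /root/brain.json is where the log says pwnagotchi saves it; run as root.
    problem = json_error("/root/brain.json")
    print("brain.json looks OK" if problem is None else f"brain.json: {problem}")
```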

BlackFrog1 commented 4 years ago

Looking over your log, the initial error is coming from the TensorFlow Python library. Can you please post the output of the following command: pip list

Thanks

umbynos commented 4 years ago

pip list returns -bash: pip: command not found, while pip3 list returns:

Package              Version  
-------------------- ---------
absl-py              0.9.0    
ansible              2.7.7    
apache-libcloud      2.4.0    
asn1crypto           0.24.0   
astor                0.8.1    
bcrypt               3.1.6    
certifi              2018.8.24
chardet              3.0.4    
click                7.1.1    
cloudpickle          1.2.2    
cryptography         2.6.1    
cycler               0.10.0   
dbus-python          1.2.12   
entrypoints          0.3      
file-read-backwards  2.0.0    
Flask                1.0.2    
Flask-Cors           3.0.7    
Flask-WTF            0.14.3   
future               0.18.2   
gast                 0.2.2    
google-pasta         0.2.0    
grpcio               1.28.1   
gym                  0.14.0   
h5py                 2.10.0   
httplib2             0.11.3   
idna                 2.6      
inky                 0.0.5    
itsdangerous         1.1.0    
Jinja2               2.10     
jmespath             0.9.4    
joblib               0.14.1   
Keras-Applications   1.0.8    
Keras-Preprocessing  1.1.0    
keyring              17.1.1   
keyrings.alt         3.1.1    
kiwisolver           1.2.0    
lockfile             0.12.2   
Markdown             3.2.1    
MarkupSafe           1.1.0    
matplotlib           3.2.1    
mpi4py               2.0.0    
netaddr              0.7.19   
ntlm-auth            1.1.0    
numpy                1.17.2   
oauthlib             3.1.0    
olefile              0.46     
opencv-python        3.4.3.18 
pandas               1.0.3    
paramiko             2.4.2    
Pillow               5.4.1    
pip                  18.1     
protobuf             3.11.3   
pwnagotchi           1.5.3    
pyasn1               0.4.2    
pycrypto             2.6.1    
pycryptodome         3.9.4    
pyglet               1.3.2    
PyGObject            3.30.4   
pykerberos           1.1.14   
PyNaCl               1.3.0    
pyparsing            2.4.7    
PySocks              1.7.1    
python-apt           1.8.4.1  
python-dateutil      2.8.1    
pytz                 2019.3   
pywinrm              0.3.0    
pyxdg                0.25     
PyYAML               5.3.1    
requests             2.21.0   
requests-kerberos    0.11.0   
requests-ntlm        1.1.0    
requests-oauthlib    1.3.0    
RPi.GPIO             0.7.0    
scapy                2.4.3    
scipy                1.3.1    
SecretStorage        2.3.1    
setuptools           40.8.0   
simplejson           3.16.0   
six                  1.12.0   
smbus2               0.3.0    
spidev               3.4      
ssh-import-id        5.7      
stable-baselines     2.7.0    
tensorboard          1.13.1   
tensorflow           1.13.1   
tensorflow-estimator 1.14.0   
termcolor            1.1.0    
toml                 0.10.0   
tweepy               3.7.0    
urllib3              1.24.1   
websockets           8.1      
Werkzeug             1.0.1    
wheel                0.32.3   
wrapt                1.12.1   
WTForms              2.2.1    
xmltodict            0.11.0
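
The traceback loads TensorFlow from /usr/lib/python3/dist-packages/tensorflow_core, while pip3 list reports tensorflow 1.13.1, so it may help to confirm which copy the interpreter actually imports. A minimal sketch, assuming it is run with the same python3 that runs pwnagotchi; module_info is a hypothetical helper, and it really imports each module, so TensorFlow can take a while to load on a Pi Zero W:

```python
import importlib
import importlib.util
from typing import Optional, Tuple


def module_info(name: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (file the module is imported from, its __version__), or None.

    importlib.util.find_spec reports the copy this interpreter would import;
    __version__ is read after importing and may be absent for some packages.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    module = importlib.import_module(name)
    return spec.origin, getattr(module, "__version__", None)


if __name__ == "__main__":
    for mod in ("tensorflow", "stable_baselines", "numpy"):
        info = module_info(mod)
        print(mod, "->", info if info else "not importable")
```

The sketch avoids importlib.metadata because that module only exists from Python 3.8, and the device runs Python 3.7.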

evilsocket / pwnagotchi

[BUG] Frequent AI error in the logs #895