PathmindAI / nativerl

Train reinforcement learning agents using AnyLogic or Python-based simulations
Apache License 2.0
19 stars 4 forks

Unsupported operand type error when training anylogic model #492

Closed slinlee closed 2 years ago

slinlee commented 2 years ago

On dev.devpathmind.com I get this error when training the AGV AnyLogic model with reward terms.

  File "/app/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 648, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/app/conda/lib/python3.7/site-packages/ray/rllib/env/base_env.py", line 421, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/app/work/pathmind_training/environments.py", line 271, in step
    self.term_contributions_dict[str(i)] += reward_array
TypeError: unsupported operand type(s) for +=: 'nativerl.Array' and 'nativerl.Array'

slinlee commented 2 years ago

@maxpumperla @brettskymind got any ideas for this?

maxpumperla commented 2 years ago

@slinlee I think this error (and this one https://github.com/SkymindIO/nativerl/issues/491) might be related to things we changed in the pynativerl interface recently. You'll notice that nativerl.Array is really just a numpy array under the hood, which does support the += operator, but it seems to get confused here. Note how this is similar to the list error we had yesterday. And issue https://github.com/SkymindIO/nativerl/issues/491 is also related to creating arrays "too early".

I think my push to make the interface more consistent has not been tested enough: https://github.com/SkymindIO/nativerl/commit/6b7ca8936c4f0ea2865d5149bd54da233faa5815#diff-4fdbc4b4d5f81fb90b1db025a9aa122aa7cfd1efc79e118f779996893cb8d435

I'd rather keep these changes and override the += operator etc., but we could also roll back the above changes (some things would have to change in environments.py again, too). Should be easy enough, but let me know if you need help of any kind.

maxpumperla commented 2 years ago

can you reproduce this locally? @slinlee

slinlee commented 2 years ago

@maxpumperla amazingly, I think #491 is a separate issue. Yesterday I made an update to the conda environment that is set up for every training run. That broke the training for mouse and cheese. When I reverted the conda env change, they trained again.

Even with the revert, this bug still exists, but it seems limited to cases where reward terms are in use. Training AGV with a reward function works well.

slinlee commented 2 years ago

@maxpumperla is overloading the += still the way you want to go? I'm open to that.

maxpumperla commented 2 years ago

@slinlee ah, so this is a problem on an AL model then? I was confused as to why Python wouldn't be able to do that. I checked and this type of op does work as expected in pynativerl:

from pathmind_training.pynativerl import *
import numpy as np
x = Array(np.asarray([1, 2, 3]))
x += x

which gives me array([2, 4, 6]). I have no clue how pybind deals with operator overloading, but maybe we can try a simple c = a + b first. If that fails, I assume we'll have to add an add method to the original C++ interface and implement it correctly. A simple workaround is to get the values like this:

https://github.com/SkymindIO/nativerl/blob/dev/nativerl/src/main/resources/ai/skymind/nativerl/nativerl.h#L48

then add the other data in question and create an Array again. That's likely very wasteful, so let's aim to extend the Array interface instead.
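
A minimal sketch of that workaround, under stated assumptions: OpaqueArray is a hypothetical stand-in for nativerl.Array (not the real class), values() is an assumed name for the accessor behind the link above, and add_arrays is an illustrative helper. It copies both arrays out, adds them in NumPy, and wraps the sum in a fresh array, which is exactly the extra copying described as wasteful.

```python
import numpy as np


class OpaqueArray:
    """Hypothetical stand-in for nativerl.Array: holds floats, defines no operators."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype=np.float32)

    def values(self):
        # Assumed accessor name: returns the contents as a plain list of floats.
        return self._data.tolist()


def add_arrays(a, b):
    """Copy both arrays out, add them in NumPy, and wrap the sum in a new array."""
    left = np.asarray(a.values(), dtype=np.float32)
    right = np.asarray(b.values(), dtype=np.float32)
    return OpaqueArray(left + right)


# Mirrors the failing line in environments.py, without using += on the arrays.
term_contribution = OpaqueArray([1.0, 2.0, 3.0])
reward_array = OpaqueArray([0.5, 0.5, 0.5])
term_contribution = add_arrays(term_contribution, reward_array)
print(term_contribution.values())
```

Every call allocates two temporary NumPy arrays and one new wrapper, so this is only a stopgap until the Array interface itself supports addition.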

maxpumperla commented 2 years ago

In any case, I'm kind of glad that this wasn't really an issue on the Python side after all...

slinlee commented 2 years ago

@saudet - Do you have any insights on this bug?

slinlee commented 2 years ago

I just ran a quick test with converting a += b to a = a + b and it fails the same way:

  File "/app/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 96, in next
    batches = [self.get_data()]
  File "/app/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 223, in get_data
    item = next(self.rollout_provider)
  File "/app/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 648, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/app/conda/lib/python3.7/site-packages/ray/rllib/env/base_env.py", line 421, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/app/work/pathmind_training/environments.py", line 272, in step
    reward_array + self.term_contributions_dict[str(i)]
TypeError: unsupported operand type(s) for +: 'nativerl.Array' and 'nativerl.Array'

saudet commented 2 years ago

This is all stuff that was added in the Python module, so yes, if we want to start doing complex operations on these types, we'll need to start adding more features there, possibly like this: https://pybind11.readthedocs.io/en/stable/advanced/classes.html#operator-overloading
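
For illustration, here is a Python-only sketch of what that would mean on the Python side. The pybind11 mechanism linked above exposes C++ operators as Python special methods such as __add__ and __iadd__; the classes below are hypothetical stand-ins rather than the real nativerl.Array, and the actual fix would live in the C++ binding. Python raises the TypeError from the tracebacks when neither operand's type defines the relevant method, and since += falls back to + when __iadd__ is missing, both forms fail the same way.

```python
import numpy as np


class BareArray:
    """Hypothetical stand-in for a bound type that defines no arithmetic operators."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float32)


class AddableArray(BareArray):
    """Same stand-in, plus the special methods Python looks up for + and +=."""

    def __add__(self, other):
        if not isinstance(other, BareArray):
            return NotImplemented
        # Out-of-place addition: returns a new object, operands are untouched.
        return AddableArray(self.data + other.data)

    def __iadd__(self, other):
        if not isinstance(other, BareArray):
            return NotImplemented
        # In-place addition: mutates this object's buffer and returns self.
        self.data += other.data
        return self


def describe_add(a, b):
    """Return the sum's values, or the TypeError message if + is unsupported."""
    try:
        return (a + b).data.tolist()
    except TypeError as err:
        return str(err)


print(describe_add(BareArray([1, 2, 3]), BareArray([1, 2, 3])))
print(describe_add(AddableArray([1, 2, 3]), AddableArray([1, 2, 3])))
```

Defining __iadd__ separately from __add__ lets += update the existing buffer instead of allocating a new array on every step.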