Closed · slinlee closed this issue 2 years ago
@maxpumperla @brettskymind got any ideas for this?
@slinlee I think this error (and this one https://github.com/SkymindIO/nativerl/issues/491) might be related to things we changed in the pynativerl interface recently. You'll notice that `nativerl.Array` is really just a numpy array under the hood, which does support the `+=` operator, but it seems to get confused here. Note how this is similar to the list error we had yesterday. And issue https://github.com/SkymindIO/nativerl/issues/491 is also related to creating arrays "too early".
I think my push to make the interface more consistent has not been tested enough: https://github.com/SkymindIO/nativerl/commit/6b7ca8936c4f0ea2865d5149bd54da233faa5815#diff-4fdbc4b4d5f81fb90b1db025a9aa122aa7cfd1efc79e118f779996893cb8d435
I'd rather keep these changes and instead override the `+=` operator etc., but we could also roll back the above changes (some things would have to change in `environments.py` again, too). It should be easy enough, but let me know if you need help of any kind.
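Overriding the operator on the Python side could look something like the sketch below. To be clear, `MyArray` and its `data` attribute are hypothetical stand-ins for `nativerl.Array`, not the real interface:

```python
import numpy as np

# Hypothetical stand-in for nativerl.Array; the class name and the
# `data` attribute are illustrative only, not the real binding.
class MyArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __add__(self, other):
        # a + b: return a new wrapper around the elementwise sum.
        other_data = other.data if isinstance(other, MyArray) else np.asarray(other)
        return MyArray(self.data + other_data)

    def __iadd__(self, other):
        # a += b: mutate the underlying buffer in place and return self.
        other_data = other.data if isinstance(other, MyArray) else np.asarray(other)
        self.data += other_data
        return self

a = MyArray([1, 2, 3])
b = MyArray([10, 20, 30])
c = a + b   # new object, b is untouched
a += b      # updates a in place
```

Defining `__iadd__` alongside `__add__` keeps `+=` in-place (no extra allocation), which matters if these arrays sit in the training hot path.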
Can you reproduce this locally, @slinlee?
@maxpumperla amazingly, I think #491 is a separate issue. Yesterday I made an update to the conda environment that is set up for every training, and that broke the training for mouse and cheese. When I reverted the conda env change, those train again.
Even with the revert, this bug still exists, but it seems limited to when reward terms are in use. Training AGV with a reward function works well.
@maxpumperla is overloading the `+=` operator still the way you want to go? I'm open to that.
@slinlee ah, so this is a problem on an AnyLogic model then? I was confused as to why Python wouldn't be able to do that. I checked, and this type of op does work as expected in `pynativerl`:
```python
from pathmind_training.pynativerl import *
import numpy as np

x = Array(np.asarray([1, 2, 3]))
x += x
```
which gives me `array([2, 4, 6])`. I have no clue how pybind deals with operator overloading, but maybe we can try a simple `c = a + b` instead first. If that fails, I assume we'll simply have to add an `add` method to the original C++ interface and implement it correctly. A simple workaround is to get the underlying values out, add the other data in question, and create an `Array` again. That's likely very wasteful, so let's aim to extend the `Array` interface instead.
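The unwrap-add-rewrap workaround could look roughly like this. Since I don't know the exact accessor `nativerl.Array` exposes, the `unwrap`/`wrap` callables below are placeholders for whatever the real API provides:

```python
import numpy as np

def add_arrays(a, b, unwrap, wrap):
    # Generic unwrap -> add -> rewrap workaround.
    # `unwrap` extracts numpy values from the wrapper and `wrap` builds
    # a fresh wrapper; both are assumptions standing in for the real
    # nativerl.Array API.
    return wrap(unwrap(a) + unwrap(b))

# Demonstration with plain lists standing in for nativerl.Array:
result = add_arrays([1, 2, 3], [4, 5, 6], unwrap=np.asarray, wrap=list)
```

The extra copy on every addition is exactly the wastefulness mentioned above, which is why extending the `Array` interface itself is preferable.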
In any case, I'm kind of glad that this wasn't really an issue on the Python side after all...
@saudet - Do you have any insights on this bug?
I just ran a quick test converting `a += b` to `a = a + b`, and it fails the same way:
```
  File "/app/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 96, in next
    batches = [self.get_data()]
  File "/app/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 223, in get_data
    item = next(self.rollout_provider)
  File "/app/conda/lib/python3.7/site-packages/ray/rllib/evaluation/sampler.py", line 648, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/app/conda/lib/python3.7/site-packages/ray/rllib/env/base_env.py", line 421, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/app/work/pathmind_training/environments.py", line 272, in step
    reward_array + self.term_contributions_dict[str(i)]
TypeError: unsupported operand type(s) for +: 'nativerl.Array' and 'nativerl.Array'
```
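For what it's worth, that traceback is Python's generic complaint when neither operand type defines `__add__`/`__radd__`. A minimal class (purely illustrative, unrelated to the actual pybind binding) reproduces it, and also shows why rewriting `a += b` as `a = a + b` can't help: when `__iadd__` is missing, Python already falls back to `__add__` for `+=`:

```python
# Purely illustrative: a class with no __add__/__radd__/__iadd__,
# mirroring the current nativerl.Array binding's lack of operator support.
class Opaque:
    pass

a, b = Opaque(), Opaque()

try:
    a + b
    plain_msg = None
except TypeError as e:
    plain_msg = str(e)  # "unsupported operand type(s) for +: ..."

try:
    a += b
    inplace_msg = None
except TypeError as e:
    inplace_msg = str(e)  # same failure via the += fallback path
```

So both spellings hit the same missing `__add__` slot; the fix really does have to happen on the binding itself.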
This is all stuff that was added in the Python module, so yes if we want to start doing complex operations on these types, we'll need to start adding more features there, possibly like this: https://pybind11.readthedocs.io/en/stable/advanced/classes.html#operator-overloading
On dev.devpathmind.com I get this error when trying to train the AGV AnyLogic model when using reward terms.