Closed ejunprung closed 2 years ago
Hmm, I think I saw something like this earlier that I fixed. Let me give it a try.
OK, I debugged this. Since we don't extract the observations, the policy server only exposes `/predict_raw`, but we're hardcoding the path to `/predict`.
For now I'll adjust it so that the user enters the whole URL, including `/predict`.
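A sketch of that workaround (the helper name and its exact behavior are assumptions, not the actual client code): keep whatever path the user supplied, and only fall back to the hardcoded `/predict` when none is given.

```python
from urllib.parse import urlsplit, urlunsplit

def resolve_predict_url(user_url: str) -> str:
    """Keep a user-supplied path such as /predict or /predict_raw;
    fall back to the hardcoded /predict only when no path is given.

    Hypothetical helper sketching the workaround discussed above.
    """
    parts = urlsplit(user_url)
    path = parts.path if parts.path not in ("", "/") else "/predict"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

With this, `resolve_predict_url("http://localhost:8000")` yields `http://localhost:8000/predict`, while a URL that already ends in `/predict_raw` is left alone.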
Alex just added a PR to the webapp that will extract the observations though, so they'll be available for the policy server.
Never mind, I can't just use `/predict_raw` because the observations are currently passed as a JSON dict. Switching it to an array would be a counter-productive hack.
I'll test Alex's PR on dev first.
Testing the PR on dev you'll see that there are named observations now. Once I have a policy server live I'll test @ejunprung's original issue.
@slinlee @ejunprung we can also test this locally, at least to fix this bug. Note that, in any case, this can be "solved" by changing how the simulation in question reads out actions. But our goal should be to be consistent between `Local` and `Server` policies.
Btw, it'd be super helpful to have a test policy server for mouse and cheese up and running at all times, so that we can integrate it into unit tests, and more importantly to demo the API. That'd be quite cool for the README as well. It'd be essentially those 3 lines plus imports to showcase how running simulations works!
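As a hedged sketch of what that always-on demo could look like (all names here — `MockServer`, `get_actions`, the URL — are illustrative, based only on this thread, not a confirmed public API):

```python
# Minimal mock of the proposed "three lines plus imports" demo.
# MockServer stands in for a live mouse-and-cheese policy server;
# the class/method names follow this thread, not a published API.

class MockServer:
    def __init__(self, url: str):
        self.url = url  # e.g. "http://localhost:8000/predict"

    def get_actions(self, simulation):
        # A real Server policy would POST the simulation's named
        # observations to self.url; here we return a canned action
        # shaped like the Local policy's response: {agent_id: action}.
        return {0: [1]}

# The three-line usage the comment describes:
policy = MockServer(url="http://localhost:8000/predict")
simulation = object()  # placeholder for a real Simulation instance
actions = policy.get_actions(simulation)
```

A real version of this, pointed at a permanently running mouse-and-cheese server, could double as both a unit-test fixture and the README example.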
@ejunprung I've tested the same thing like so:

```python
In [8]: action = policy.get_actions(simulation)

In [9]: action[0]
Out[9]: array(None, dtype=object)
```

which means that we get a `None`-valued response from the Server. That's indeed not a valid action.
The good news is that when I run a `Local` policy against the same simulation, I get the same response type:

```python
In [13]: local_policy = Local(model_file="./examples/mouse_model")

In [14]: local_policy.get_actions(simulation)
Out[14]: {0: array([1])}
```
So, ultimately this is a non-issue. We "just" need to fix the corresponding policy server.
OK, I think I found it:

```
Out[38]: b'{"detail":[{"loc":["body","mouse_row_distance"],"msg":"field required","type":"value_error.missing"},{"loc":["body","mouse_col_distance"],"msg":"field required","type":"value_error.missing"}]}'
```
We have a discrepancy between the features described in the `schema.yaml` of the policy server, which e.g. has `mouse_row_distance`, but in the local simulation here it's called `distance_to_cheese_row`.
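To make the mismatch concrete, here is a stdlib-only sketch in the spirit of the server's validation (the field names come from the error above; `validate` itself is a hypothetical stand-in for the pydantic/FastAPI machinery):

```python
# Fields the policy server's schema.yaml declares (per the error above).
SCHEMA_FIELDS = ["mouse_row_distance", "mouse_col_distance"]

def validate(payload: dict) -> list:
    """Return FastAPI-style 'field required' errors for missing fields.

    Hypothetical stand-in for the server's pydantic validation.
    """
    return [
        {"loc": ["body", name], "msg": "field required",
         "type": "value_error.missing"}
        for name in SCHEMA_FIELDS if name not in payload
    ]

# The local simulation sends differently named observations...
local_obs = {"distance_to_cheese_row": 0.8, "distance_to_cheese_col": 0.8}
errors = validate(local_obs)
# ...so both schema fields come back as missing, matching the error body.
```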
Closing this issue now, as it's a fluke. :D But I've opened #11 to make it easier to spot this kind of issue... after all we have all this fancy validation in the policy server, so we might as well use it.
@maxpumperla - https://github.com/SkymindIO/nativerl/blob/dev/nativerl/python/tests/test_policy_serving.py#L11-L18 This policy server was trained on the AnyLogic model, which is why the observations are different from this `mouse_env_pathmind`:
I'll have a mouse_and_cheese policy server up and running that is trained from a Pathmind model, but right now they fail in training because it's expecting `get_metrics()`.
Btw, nothing is stopping us from providing `get_metrics`. Just because we've removed it from the interface doesn't mean your sim can't still have it.
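A quick sketch of that idea (class and function names are illustrative, not the actual nativerl/Pathmind interface): the sim keeps an optional `get_metrics()`, and callers probe for it instead of requiring it.

```python
class MouseAndCheeseSim:
    """Toy simulation that still offers the optional get_metrics() hook."""
    def __init__(self):
        self.found_cheese = 0

    def get_metrics(self):
        # No longer part of the required interface, but nothing stops
        # a simulation from providing it anyway.
        return {"found_cheese": self.found_cheese}

def collect_metrics(sim):
    """Use get_metrics() when the simulation provides it, else fall back."""
    fn = getattr(sim, "get_metrics", None)
    return fn() if callable(fn) else {}
```

Training code written this way works both with sims that define the hook and with sims that dropped it.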
It works now that we have (1) observations integrated in the webapp and (2) the updated `environment.py` in the webapp:
```
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------------------+--------+
| Episode | Step | observations_0                                                                                                                           | actions_0 | rewards_0           | done_0 |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------------------+--------+
| 0       | 0    | {'mouse_row': 0.0, 'mouse_col': 0.0, 'distance_to_cheese_row': 0.8, 'distance_to_cheese_col': 0.8, 'cheese_row': 0.8, 'cheese_col': 0.8} | [0]       | {'found_cheese': 0} | False  |
| 0       | 1    | {'mouse_row': 0.2, 'mouse_col': 0.0, 'distance_to_cheese_row': 0.6, 'distance_to_cheese_col': 0.8, 'cheese_row': 0.8, 'cheese_col': 0.8} | [1]       | {'found_cheese': 0} | False  |
| 0       | 2    | {'mouse_row': 0.2, 'mouse_col': 0.2, 'distance_to_cheese_row': 0.6, 'distance_to_cheese_col': 0.6, 'cheese_row': 0.8, 'cheese_col': 0.8} | [0]       | {'found_cheese': 0} | False  |
| 0       | 3    | {'mouse_row': 0.4, 'mouse_col': 0.2, 'distance_to_cheese_row': 0.4, 'distance_to_cheese_col': 0.6, 'cheese_row': 0.8, 'cheese_col': 0.8} | [1]       | {'found_cheese': 0} | False  |
| 0       | 4    | {'mouse_row': 0.4, 'mouse_col': 0.4, 'distance_to_cheese_row': 0.4, 'distance_to_cheese_col': 0.4, 'cheese_row': 0.8, 'cheese_col': 0.8} | [0]       | {'found_cheese': 0} | False  |
| 0       | 5    | {'mouse_row': 0.6, 'mouse_col': 0.4, 'distance_to_cheese_row': 0.2, 'distance_to_cheese_col': 0.4, 'cheese_row': 0.8, 'cheese_col': 0.8} | [1]       | {'found_cheese': 0} | False  |
| 0       | 6    | {'mouse_row': 0.6, 'mouse_col': 0.6, 'distance_to_cheese_row': 0.2, 'distance_to_cheese_col': 0.2, 'cheese_row': 0.8, 'cheese_col': 0.8} | [1]       | {'found_cheese': 0} | False  |
| 0       | 7    | {'mouse_row': 0.6, 'mouse_col': 0.8, 'distance_to_cheese_row': 0.2, 'distance_to_cheese_col': 0.0, 'cheese_row': 0.8, 'cheese_col': 0.8} | [0]       | {'found_cheese': 1} | True   |
+---------+------+------------------------------------------------------------------------------------------------------------------------------------------+-----------+---------------------+--------+

Process finished with exit code 0
```
I'm seeing the below when testing the policy server. I'm guessing it's because the policy server outputs an integer whereas `simulation.py` expects a 2-D array or something like that.
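If the mismatch really is just scalar-vs-array, a client-side shim could coerce the server's response into the 1-D shape a `Local` policy returns (hypothetical helper based on this thread's guess about the bug; the proper fix may belong in the policy server itself):

```python
import numpy as np

def normalize_action(raw):
    """Coerce a server response (possibly a bare int) into a 1-D
    integer array like the Local policy's array([1]).

    Hypothetical shim; assumes discrete integer actions.
    """
    return np.atleast_1d(np.asarray(raw, dtype=np.int64))
```

With this, `normalize_action(1)` and `normalize_action([1])` both produce `array([1])`, so downstream code sees one consistent shape regardless of policy type.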