google-deepmind / envlogger

A tool for recording RL trajectories.
Apache License 2.0

tfds backend fails when env is reset with no steps #9

Closed buckleytoby closed 2 weeks ago

buckleytoby commented 8 months ago

If you take the examples/tfds_random_agent_catch.py sample and add a break right after the env.reset() call, before any actions are taken, the tfds backend fails with this error message:

Exception has occurred: ValueError
Failed to encode example:
In <Dataset> with name "steps":
In <Tensor> with name "reward":
Dtype object do not match float64
{'steps': [{'action': array(0, dtype=object), 'discount': array(0, dtype=object), 'is_first': True, 'is_last': False, 'is_terminal': False, 'observation': array([[0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0.]], dtype=float32), 'reward': array(0, dtype=object), 'timestamp': 1708228227.9312475}]}

kenjitoyama commented 3 months ago

Hi @buckleytoby, sorry for the massive delay, I somehow missed this bug.

@sabelaraga, do you know what's wrong here? Is this expected?

Daniel

sabelaraga commented 3 weeks ago

From the error, I would say that the reward is initialized wrongly with an int instead of a float. Are you still experiencing the error? Thanks!
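
For anyone hitting the same message, here is a minimal NumPy-only sketch of the mismatch the error reports. It does not use or reproduce envlogger's or tfds's own code: the dump above shows `reward` as `array(0, dtype=object)` while the feature spec expects `float64`, and the sketch only illustrates that these dtypes differ and that an explicit cast removes the difference. The helper name `coerce_to_spec` is hypothetical and exists only for this illustration.

```python
import numpy as np

# The reward exactly as it appears in the failing step of the error dump.
reward = np.array(0, dtype=object)

# The dtype the error message says the "reward" feature expects.
expected_dtype = np.float64


def coerce_to_spec(value, spec_dtype):
    """Hypothetical helper: cast a logged field to the dtype declared in the spec."""
    return np.asarray(value, dtype=spec_dtype)


# The object-dtype array does not match float64, which is what the encoder complains about.
dtypes_match_before = reward.dtype == expected_dtype

# After an explicit cast the dtype matches the spec.
fixed_reward = coerce_to_spec(reward, expected_dtype)
dtypes_match_after = fixed_reward.dtype == expected_dtype

print(reward.dtype, "->", fixed_reward.dtype)
```

Whether such a cast belongs in the environment wrapper, the step-conversion function, or the backend itself is not settled in this thread; the sketch only shows the dtype check that fails.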

kenjitoyama commented 2 weeks ago

Hi! Since there was no response from OP, I'll close this issue for now.