StanfordVL / OmniGibson

OmniGibson: a platform for accelerating Embodied AI research built upon NVIDIA's Omniverse engine. Join our Discord for support: https://discord.gg/bccR5vGFEx
https://behavior.stanford.edu/omnigibson/
MIT License

Exception when running learning.navigation_policy_demo, self._potential_fcn(env) returns None #715

Open mattmazzola opened 2 months ago

mattmazzola commented 2 months ago

When attempting to run the Reinforcement Learning demo described in the docs here: https://behavior.stanford.edu/omnigibson/getting_started/examples.html#reinforcement-learning-demo

python -m omnigibson.examples.learning.navigation_policy_demo

Exception

TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'


Source

https://github.com/StanfordVL/OmniGibson/blob/cc0316a0574018a3cb2956fcbff3be75c07cdf0f/omnigibson/reward_functions/potential_reward.py#L44-L45

From what I can tell, `self._potential_fcn` is set to a method in `omnigibson/tasks/point_navigation_task.py`, so there is likely some condition of the environment that causes it to return `None`:

https://github.com/StanfordVL/OmniGibson/blob/cc0316a0574018a3cb2956fcbff3be75c07cdf0f/omnigibson/tasks/point_navigation_task.py#L281-L298
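One possible defensive fix would be to guard against a `None` potential before computing the reward delta. The sketch below is a minimal stand-in, not the actual OmniGibson class: only the attribute names (`_potential`, `_potential_fcn`, `_r_potential`) are taken from the traceback and the linked source, and the zero-reward fallback is an assumption about reasonable behavior.

```python
class PotentialReward:
    """Minimal stand-in for omnigibson's PotentialReward, sketching a
    None-guard. Attribute names come from the traceback; the fallback
    behavior (zero reward, keep old potential) is hypothetical."""

    def __init__(self, potential_fcn, r_potential=1.0):
        self._potential_fcn = potential_fcn
        self._r_potential = r_potential
        self._potential = 0.0

    def _step(self, task, env, action):
        new_potential = self._potential_fcn(env)
        if new_potential is None:
            # The potential could not be computed this step (e.g. no
            # geodesic path was found): return zero reward and keep the
            # previous potential instead of crashing with a TypeError.
            return 0.0, {}
        # These two lines mirror the ones linked above that raise the error.
        reward = (self._potential - new_potential) * self._r_potential
        self._potential = new_potential
        return reward, {}
```

With a guard like this, the episode at worst loses one step of shaping reward rather than aborting the whole training run.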

Full Stack Trace

[INFO] [omnigibson.utils.sim_utils] Landed object robot0 successfully!
Traceback (most recent call last):
  File "D:\repos\OmniGibson\omnigibson\envs\env_base.py", line 525, in step
    reward, done, info = self.task.step(self, action)
  File "D:\repos\OmniGibson\omnigibson\tasks\point_navigation_task.py", line 447, in step
    reward, done, info = super().step(env=env, action=action)
  File "D:\repos\OmniGibson\omnigibson\tasks\task_base.py", line 311, in step
    reward, reward_info = self._step_reward(env=env, action=action)
  File "D:\repos\OmniGibson\omnigibson\tasks\task_base.py", line 240, in _step_reward
    reward, reward_info = reward_function.step(self, env, action)
  File "D:\repos\OmniGibson\omnigibson\reward_functions\reward_function_base.py", line 50, in step
    self._reward, self._info = self._step(task=task, env=env, action=action)
  File "D:\repos\OmniGibson\omnigibson\reward_functions\potential_reward.py", line 45, in _step
    reward = (self._potential - new_potential) * self._r_potential
TypeError: unsupported operand type(s) for -: 'float' and 'NoneType'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "c:\Users\mattm\.vscode\extensions\ms-python.debugpy-2024.4.0-win32-x64\bundled\libs\debugpy\__main__.py", line 39, in <module>
    cli.main()
  File "c:\Users\mattm\.vscode\extensions\ms-python.debugpy-2024.4.0-win32-x64\bundled\libs\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\mattm\.vscode\extensions\ms-python.debugpy-2024.4.0-win32-x64\bundled\libs\debugpy/..\debugpy\server\cli.py", line 317, in run_module
    run_module_as_main(options.target, alter_argv=True)
  File "c:\Users\mattm\.vscode\extensions\ms-python.debugpy-2024.4.0-win32-x64\bundled\libs\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 238, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\Users\mattm\.vscode\extensions\ms-python.debugpy-2024.4.0-win32-x64\bundled\libs\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "D:\repos\OmniGibson\omnigibson\examples\learning\navigation_policy_demo.py", line 187, in <module>
    main()
  File "D:\repos\OmniGibson\omnigibson\examples\learning\navigation_policy_demo.py", line 179, in main
    model.learn(
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\ppo\ppo.py", line 315, in learn
    return super().learn(
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 300, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 201, in collect_rollouts
    if not callback.on_step():
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\common\callbacks.py", line 114, in on_step
    return self._on_step()
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\common\callbacks.py", line 219, in _on_step
    continue_training = callback.on_step() and continue_training
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\common\callbacks.py", line 114, in on_step
    return self._on_step()
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\common\callbacks.py", line 460, in _on_step
    episode_rewards, episode_lengths = evaluate_policy(
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\common\evaluation.py", line 94, in evaluate_policy
    new_observations, rewards, dones, infos = env.step(actions)
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 206, in step
    return self.step_wait()
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py", line 58, in step_wait
    obs, self.buf_rews[env_idx], terminated, truncated, self.buf_infos[env_idx] = self.envs[env_idx].step(
  File "c:\Users\mattm\AppData\Local\miniconda3\envs\omnigibson\lib\site-packages\shimmy\openai_gym_compatibility.py", line 251, in step
    obs, reward, done, info = self.gym_env.step(action)
  File "D:\repos\OmniGibson\omnigibson\envs\env_base.py", line 539, in step
    raise ValueError(f"Failed to execute environment step {self._current_step} in episode {self._current_episode}")
ValueError: Failed to execute environment step 256 in episode 6
mattmazzola commented 2 months ago

The issue seems to occur only for the `geodesic` reward type.
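That would be consistent with a shortest-path query failing: a geodesic distance is undefined when no path exists on the traversability map, whereas a straight-line (`l2`) distance is always computable. The sketch below illustrates that distinction; `shortest_path_fn` is a hypothetical stand-in for the scene's path query, not an OmniGibson API, and the `l2` fallback is an assumed workaround rather than upstream behavior.

```python
import math

def potential(robot_xy, goal_xy, shortest_path_fn, reward_type="geodesic"):
    """Sketch of the two reward_type branches. shortest_path_fn is a
    hypothetical path-length query that may return None when no path
    is found -- the failure mode this issue appears to hit."""
    if reward_type == "geodesic":
        geodesic = shortest_path_fn(robot_xy, goal_xy)
        if geodesic is not None:
            return geodesic
        # No path found: fall back to straight-line distance so the
        # potential stays well-defined instead of becoming None.
    return math.dist(robot_xy, goal_xy)
```

Until the root cause is fixed, switching the task's `reward_type` to `l2` in the demo config may also avoid the crash.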
