facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.
https://aihabitat.org/
MIT License
1.89k stars 475 forks

allow cpu usage #171

Closed: grahamannett closed this issue 2 years ago

grahamannett commented 5 years ago

While some of the library is configured to allow CPU usage, it seems that not all parts are.

When trying to use habitat_baselines (e.g., the `map_location=self.device` checkpoint loading), I don't have a GPU to use but am using habitat-api with another RL library. Could there be an argument or some other way to allow people without GPUs to still eval the models? For instance, https://github.com/facebookresearch/habitat-api/blob/master/habitat_baselines/rl/ppo/ppo_trainer.py#L339 makes it so you have to hardcode "cpu" (there are a few other places as well).

erikwijmans commented 5 years ago

It should work if the instances of `torch.device("cuda", ppo_cfg.pth_gpu_id)` (or similar) are changed to `torch.device("cuda", ppo_cfg.pth_gpu_id) if torch.cuda.is_available() else torch.device("cpu")`.
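
A minimal, runnable sketch of that fallback (a plain variable stands in for `ppo_cfg.pth_gpu_id` here):

```python
import torch

# Stand-in for ppo_cfg.pth_gpu_id from the habitat_baselines config.
pth_gpu_id = 0

# Fall back to CPU when CUDA is unavailable, per the suggestion above.
device = (
    torch.device("cuda", pth_gpu_id)
    if torch.cuda.is_available()
    else torch.device("cpu")
)

print(device)  # "cuda:0" on a GPU machine, "cpu" otherwise
```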

grahamannett commented 5 years ago

@erikwijmans there are some other issues in PPOTrainer as well, I think, related to using a config that doesn't match up with the values actually used in the code, and other hard-coded values as well.

erikwijmans commented 5 years ago

@JasonJiazhiZhang Can you take a look at what is pointed out above?

JasonJiazhiZhang commented 5 years ago

> @erikwijmans there are some other issues in PPOTrainer as well, I think, related to using a config that doesn't match up with the values actually used in the code, and other hard-coded values as well.

Hi @grahamannett, currently the PPOTrainer tries to load the saved config from checkpoints. As the config system has been changed recently, older checkpoints are expected to raise a KeyError when being merged here. For now, replacing the linked code block with this should work:

```python
config.merge_from_list(eval_cmd_opts)
```

The config system is still undergoing some changes; I will add the backward-compatibility check along with those changes. Let me know if you still have any issues loading configs.
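
Purely as an illustration (this is not the actual habitat-api code), such a backward-compatibility check might look something like this, assuming the yacs-style config that habitat-api uses:

```python
from yacs.config import CfgNode as CN

# Toy configs: `ckpt_config` stands in for a config saved by an older
# checkpoint whose schema no longer matches the current one.
config = CN({"NUM_PROCESSES": 4})
ckpt_config = CN({"OLD_KEY": 1})
eval_cmd_opts = ["NUM_PROCESSES", "1"]

try:
    config.merge_from_other_cfg(ckpt_config)
except KeyError:
    # Old checkpoint with an incompatible schema: skip its saved config
    # and rely on the current config plus command-line overrides.
    pass
config.merge_from_list(eval_cmd_opts)
print(config.NUM_PROCESSES)  # 1
```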

grahamannett commented 5 years ago

@JasonJiazhiZhang I think the issue I saw wasn't related to checkpoints, but I'll look again.

Also, regarding the PPO algorithm: I am still learning this stuff and trying to follow the PPO in baselines, but I think there may be something wrong with the implementation compared to the one from https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail. Is it pretty certain that it is correctly implemented in this baseline?

JasonJiazhiZhang commented 5 years ago

@grahamannett The PPO algorithm in baselines is for pointgoal navigation and should be correct. Let me know if you need help once you've looked again and identified the specific issue :)

grahamannett commented 5 years ago

@JasonJiazhiZhang @erikwijmans I'm possibly wrong, but it seems like there is something wrong with the rollout storage/minibatches/processes compared to the implementation it says it's adapted from. For instance, in https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/main.py#L102 they compute a `num_updates` that incorporates the total number of environment steps, `num_steps`, and `num_processes`, whereas the habitat-api implementation https://github.com/facebookresearch/habitat-api/blob/master/habitat_baselines/rl/ppo/ppo_trainer.py#L173 takes `num_updates` straight from the config and doesn't incorporate that information, yet still uses aspects of those params within the for loops.
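
For concreteness, a sketch of the difference (the numbers are made up; see the links above for the real code):

```python
num_env_steps = 10_000_000  # total environment-step budget
num_steps = 128             # steps collected per rollout
num_processes = 4           # parallel environments

# ikostrikov/pytorch-a2c-ppo-acktr-gail derives the update count from the
# step budget:
num_updates = num_env_steps // num_steps // num_processes
print(num_updates)  # 19531

# habitat-api instead reads num_updates directly from the config:
# for update in range(ppo_cfg.num_updates): ...
```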

Am I wrong on this? The implementation in habitat-api is a bit more complex than the other, so it's very possible I am wrong here. If I'm right about that, then there are a few other things with the PPO/storage that I think are implemented incorrectly. I am using habitat-api with this library https://github.com/danaugrs/huskarl/ and am trying to implement PPO in that, and was trying to find a way to compare results but am unsure how.

For the other issue though, related to checkpoints, I don't think that is what I meant. I'm referring more to things like (as well as a few other areas) https://github.com/facebookresearch/habitat-api/blob/master/habitat_baselines/rl/ppo/ppo_trainer.py#L67 and https://github.com/facebookresearch/habitat-api/blob/master/habitat_baselines/rl/ppo/ppo_trainer.py#L148, where if you are using `PPO.hidden_size` from the config and the Policy hidden_size doesn't match the RolloutStorage hidden_size, it will cause an issue.

Let me know if I am wrong about that, because if I'm right I think there is a bigger issue. I'm still a bit confused and trying to understand.
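
To illustrate what I mean, a toy sketch (not the actual habitat-api classes) of why the two hidden sizes have to agree:

```python
import torch

# The policy's recurrent state and the rollout storage buffer share the
# same dimension, so hard-coding 512 in one place breaks as soon as the
# config uses a different hidden_size.
class ToyPolicy(torch.nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.rnn = torch.nn.GRU(input_size=32, hidden_size=hidden_size)

class ToyRolloutStorage:
    def __init__(self, num_steps: int, num_envs: int, hidden_size: int):
        # One recurrent hidden state per step per environment.
        self.hidden_states = torch.zeros(num_steps + 1, num_envs, hidden_size)

hidden_size = 256  # would come from ppo_cfg.hidden_size, not a literal 512
policy = ToyPolicy(hidden_size)
rollouts = ToyRolloutStorage(num_steps=128, num_envs=4, hidden_size=hidden_size)
assert rollouts.hidden_states.shape[-1] == policy.rnn.hidden_size
```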

JasonJiazhiZhang commented 5 years ago

@grahamannett Regarding hidden size: it should be `ppo_cfg.hidden_size` instead of 512. I'm working on a config-system update PR that should fix this. Thanks for catching this. You can open a separate issue if you'd like :)

grahamannett commented 5 years ago

I think ideally the RolloutStorage would be based on the Policy, so that you can modify/extend the base policy, but maybe that's what you meant.

Also, any idea on the train loop with `for update in range(ppo_cfg.num_updates):`? Are the numbers from https://arxiv.org/pdf/1904.01201.pdf based on this implementation?

erikwijmans commented 5 years ago

The `for update in range(ppo_cfg.num_updates):` is correct for the way we've been doing things. Specifying how long to run as a number of updates or as a number of frames are both fine ways to do things.

For the paper, we did use `for update in range(ppo_cfg.num_updates):` to train, but then converted the update number to a number of steps via `update * num_envs * num_steps`.
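
For example (hypothetical numbers, not the paper's actual settings):

```python
# Converting an update count into environment steps, as described above.
num_envs = 4
num_steps = 128
update = 10_000

env_steps = update * num_envs * num_steps
print(env_steps)  # 5120000 environment steps after 10k updates
```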

grahamannett commented 5 years ago

@erikwijmans Okay, thanks. I was under the impression that it was updating more frequently than it should, which made me unsure whether results comparing a baseline for PPO would be a fair comparison against another type of agent. But I'll take your word for it.

erikwijmans commented 5 years ago

The `for update in range(ppo_cfg.num_updates):` defines the number of updates to the model (how many times you collect a rollout and then update the model based on that rollout), not how often the model is updated. How often the model is updated is determined by `num_steps`, like here: https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail/blob/master/main.py#L111
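
A toy sketch of that distinction (made-up numbers):

```python
# num_updates sets how many rollout/update cycles happen in total, while
# num_steps sets how many environment steps are collected before each
# PPO update.
num_updates = 3
num_steps = 4
total_env_steps = 0

for update in range(num_updates):
    for step in range(num_steps):
        total_env_steps += 1  # stand-in for stepping the envs into storage
    # ...the PPO update on the collected rollout happens here...
    print(f"update {update}: {total_env_steps} env steps so far")
```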

grahamannett commented 5 years ago

Ah okay, thanks @erikwijmans. Sorry for derailing the issue.

rpartsey commented 2 years ago

All the questions were answered in the Issue comments. Closing as resolved.