facebookresearch / habitat-challenge

Code for the habitat challenge
https://aihabitat.org
MIT License

distance_to_goal calculation results in infinity, pitch control is not working effectively in continuous action space #165

Open april-zmx opened 1 year ago

april-zmx commented 1 year ago

Hello, I have a few questions to consult:

  1. I am currently using Habitat-Sim Challenge 2023, Habitat-Lab 2023, and Python 3.8. When training on the hm3d_v0.2 dataset, I often find that distance_to_goal, SPL, and SoftSPL are NaN. After careful investigation, I found that many episodes have an infinite shortest-path distance right at the start. I also encountered this issue on the hm3d_v0.1 dataset, but not with the 2022 code based on Habitat-Sim 0.2.3 (Python 3.7). How can I solve this problem?

  2. https://github.com/facebookresearch/habitat-lab/blob/challenge-2023/habitat-lab/habitat/tasks/nav/nav.py In VelocityAction, the key defined in the action space does not match the key read inside step (camera_velocity_pitch vs. camera_angular_velocity_pitch), which makes the pitch angular velocity unusable in the continuous action space. Is this a bug? Will it affect the test results during actual submissions?
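To illustrate the kind of mismatch point 2 describes, here is a stripped-down sketch (this is not the actual Habitat-Lab code, and which key sits on which side of the mismatch is my assumption): when the action space advertises one key name but step looks up another, the commanded pitch velocity silently falls back to its default.

```python
# Simplified sketch of the suspected key mismatch (not actual Habitat-Lab code).
# The action space advertises one key name, but step() looks up a different
# one, so the commanded pitch velocity silently falls back to zero.
ACTION_SPACE_KEYS = ["linear_velocity", "angular_velocity", "camera_velocity_pitch"]

def step(**kwargs):
    lin = kwargs.get("linear_velocity", 0.0)
    ang = kwargs.get("angular_velocity", 0.0)
    # Reads a *differently named* key than the one the action space defines,
    # so for continuous actions this is always the 0.0 default:
    pitch = kwargs.get("camera_angular_velocity_pitch", 0.0)
    return lin, ang, pitch

# An agent fills in an action using the advertised key names:
action = {key: 1.0 for key in ACTION_SPACE_KEYS}
print(step(**action))  # the pitch component comes back as 0.0
```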

ykarmesh commented 1 year ago

Hey @april-zmx,

> I am currently using Habitat-Sim Challenge 2023, Habitat-Lab 2023, and Python version 3.8. When training with the hm3d_v0.2 dataset, I often encounter situations where distance_to_goal, spl, and softspl is nan. After careful investigation, I found that many episodes start with infinite shortest paths at the beginning. I also encountered this issue on the hm3d_v0.1 dataset, but not on the versions of code used in Habitat-Sim 0.2.3 (Python version 3.7) in 2022. How can I solve this problem?

Sorry you are facing this issue. It's great that you found a way to catch it quickly for fast iteration. To help you solve it, I need some more information:

a) Which config are you using for training?

b) The dataset you mention (hm3d_v0.2) is the scene dataset. Which episode dataset are you using with it?

april-zmx commented 1 year ago

a) I am currently using the settings for the 2023 challenge. I recently found that changing the agent's height from 1.41 m back to the 2022 value of 0.88 m makes the issue basically disappear. With the 1.41 m setting, even in the 2022 code, the distance_to_goal computed by the pathfinder is positive infinity.

b) I encountered this issue at the beginning of training and tried to skip the affected episodes directly, but I found that many episodes have this problem.

In addition, the official 2022 baseline model can achieve a success rate of over 90% on the hm3d_v0.1 training set. However, when I load that baseline model into the latest official 2023 code with the 2022 settings and evaluate on the training set, the success rate is 0%. I also downloaded the 2023 baseline model you provided in https://github.com/facebookresearch/habitat-challenge/issues/163 (which uses a discrete action space, not a continuous one), and its success rate on the hm3d_v0.1 and hm3d_v0.2 training sets is close to 0%. I have checked the code but cannot find the reason. I hope you can give some suggestions. Thank you.

ykarmesh commented 1 year ago

We changed the agent's configuration and camera parameters in the 2023 config to match the Stretch robot from Hello Robot. Every time the agent's configuration changes, we need to create new episodes for it, because the new configuration might make parts of a scene inaccessible to the agent, rendering some episodes infeasible. When an episode is infeasible, there is no path between its start and goal poses, and distance_to_goal produces NaNs, which is what you are seeing when you use the 2023 config with the 2022 episode dataset.
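One way to guard against such infeasible episodes during training is to check the geodesic distance before using an episode; the pathfinder reports infinity when no path exists. This is a sketch, where the distances are illustrative values standing in for the simulator's actual pathfinder output:

```python
import math

def is_feasible(geodesic_distance):
    """Return True if a valid shortest path exists for the episode.

    When the pathfinder cannot connect start and goal (e.g. because a
    taller agent makes part of the scene unreachable), it reports an
    infinite distance, which later propagates NaNs into distance_to_goal,
    SPL, and SoftSPL.
    """
    return math.isfinite(geodesic_distance)

# Hypothetical episodes paired with their pathfinder distances:
episodes = [("ep_0", 4.2), ("ep_1", float("inf")), ("ep_2", 7.9)]
valid = [name for name, d in episodes if is_feasible(d)]
print(valid)  # ['ep_0', 'ep_2']
```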

One reason you are still encountering NaNs with the 2023 config and the 2023 dataset could be a bug I found in our configs. I recently pushed a fix for it here. Let me know if this solves it for you?

ykarmesh commented 1 year ago

> I loaded the baseline model with the 2022 settings and verified the rate of success on the training set, and the success rate was indeed 0%

Just to confirm: are you saying that you loaded the 2022 baseline model with the 2022 agent config in the 2023 codebase and it's achieving 0% SR?

> hm3dv0.2 is close to 0%

I suggest that you try to reproduce the steps in the README. I am able to get 10% SR on the minival set with the provided baseline checkpoint.

april-zmx commented 1 year ago

> One of the reason you are still encountering NaNs when you try to use the 2023 config with the 2023 dataset could be because of a bug I found in our configs. I recently pushed a fix for this here. Let me know if this solves it for you?

Thank you! This has indeed solved the NaN problem I encountered.

> Just to confirm, are you saying that you loaded the 2022 baseline model with the 2022 agent config in the 2023 codebase and its achieving 0% SR?

Yes, I loaded the 2022 baseline model using the 2023 code and aligned the settings with 2022. However, on the hm3d_v0.1 training set there is a significant SR difference between the 2022 and 2023 codebases.

> I suggest that you try to reproduce the steps in the README. I am able to get 10% SR on the minival set with the provided baseline checkpoint.

Thank you! I will verify the results on the minival set. Can you provide more information about the 2023 baseline model's metrics on the training and validation sets, such as SPL, SoftSPL, and SR?