facebookresearch / habitat-challenge

Code for the habitat challenge
https://aihabitat.org
MIT License
308 stars, 56 forks

Can not evaluate checkpoint from the baseline: ddppo_objectnav_habitat2022_challenge_baseline_v1.pth #130

Closed anavuongdin closed 2 years ago

anavuongdin commented 2 years ago

Hi, I'm running the evaluation script for the baseline from section 6 of https://github.com/facebookresearch/habitat-challenge#objectnav-baselines-and-dd-ppo-training-starter-code. However, I encountered the following error:

    File "/.../policy.py", line 73, in act
      observations, rnn_hidden_states, prev_actions, masks
    File "/.../module.py", line 1051, in _call_impl
      return forward_call(*input, **kwargs)
    File "/.../rl/ddppo/policy/resnet_policy.py", line 469, in forward
      out = torch.cat(x, dim=1)
    RuntimeError: Sizes of tensors must match except in dimension 0. Got 4 and 8 (The offending index is 0)

Could you please help me? Thanks in advance!
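The traceback above shows `torch.cat(x, dim=1)` failing because two of its inputs disagree in a dimension other than the one being concatenated (4 versus 8). One way to narrow this down is to compare the input shapes just before the call. The following is a minimal sketch of such a check in plain Python; `find_cat_mismatch` is a hypothetical helper, not part of habitat-lab or PyTorch, and the example shapes are made up for illustration.

```python
from typing import Optional, Sequence, Tuple


def find_cat_mismatch(
    shapes: Sequence[Tuple[int, ...]], dim: int
) -> Optional[Tuple[int, Optional[int], int, int]]:
    """Return (tensor_index, axis, expected, got) for the first shape that
    cannot be concatenated with shapes[0] along `dim`, or None if all agree.

    If the number of dimensions differs, axis is None and expected/got
    hold the two dimension counts instead.
    """
    reference = shapes[0]
    for index, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(reference):
            return (index, None, len(reference), len(shape))
        for axis, (expected, got) in enumerate(zip(reference, shape)):
            # Only the concatenation axis is allowed to differ.
            if axis != dim and expected != got:
                return (index, axis, expected, got)
    return None


# Hypothetical shapes: two feature tensors whose batch sizes differ.
shapes = [(4, 512), (8, 32)]
print(find_cat_mismatch(shapes, dim=1))
```

When debugging, the list of shapes could be built as `[tuple(t.shape) for t in x]` right before the failing `torch.cat` call. A mismatch on axis 0 would suggest that the inputs were built for different batch sizes, though the traceback alone does not say which input is wrong.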

anavuongdin commented 2 years ago

Hi authors, I'm currently trying to implement agents other than DD-PPO. I tried reading and modifying the code in ppo_agents.py; however, adapting it to what I have in mind would take more effort than implementing it from scratch. It would be really nice if you could provide some starter code for training a very simple agent, for example DD-PPO with only one hidden layer or something similar. Many thanks in advance!

dhruvbatra commented 2 years ago

Take a look at the blind policy here; it's just an LSTM: https://github.com/facebookresearch/habitat-lab/tree/main/habitat_baselines/rl/ddppo

Implementing a memory-less policy is doable but unlikely to be a high-priority item for us.

anavuongdin commented 2 years ago

Hi authors, the Object-Navigation challenge may be over, but I'm still interested in it. I ran the code with Slurm as described in the README:

    srun python -u -m habitat_baselines.run \
        --exp-config ../habitat-challenge/configs/ddppo_objectnav.yaml \
        --run-type train \
        BASE_TASK_CONFIG_PATH ../habitat-challenge/configs/challenge_objectnav2022.local.rgbd.yaml \
        TASK_CONFIG.DATASET.DATA_PATH ../habitat-challenge/habitat-challenge-data/objectgoal_hm3d/{split}/{split}.json.gz \
        TASK_CONFIG.DATASET.SCENES_DIR ../habitat-challenge/habitat-challenge-data/data/scene_datasets/ \
        TASK_CONFIG.DATASET.SPLIT 'train' \
        TENSORBOARD_DIR ./tb \
        CHECKPOINT_FOLDER ./checkpoints \
        LOG_FILE ./train.log

However, the run terminated with the following errors:

[14:27:03:896882]:[Metadata] AttributesManagerBase.h(357)::buildAttrSrcPathsFromJSONAndLoad : No Glob path result for ./data/scene_datasets/hm3d/val/00853-5cdEh9F2hJL/*.basis.scene_instance.json

[14:27:03:897210]:[Metadata] AttributesManagerBase.h(357)::buildAttrSrcPathsFromJSONAndLoad : No Glob path result for ./data/scene_datasets/hm3d/val/00873-bxsVRursffK/*.basis.scene_instance.json

[14:27:03:897538]:[Metadata] AttributesManagerBase.h(357)::buildAttrSrcPathsFromJSONAndLoad : No Glob path result for ./data/scene_datasets/hm3d/val/00876-mv2HUxq3B53/*.basis.scene_instance.json

[14:27:03:897870]:[Metadata] AttributesManagerBase.h(357)::buildAttrSrcPathsFromJSONAndLoad : No Glob path result for ./data/scene_datasets/hm3d/val/00877-4ok3usBNeis/*.basis.scene_instance.json

[14:27:03:898197]:[Metadata] AttributesManagerBase.h(357)::buildAttrSrcPathsFromJSONAndLoad : No Glob path result for ./data/scene_datasets/hm3d/val/00878-XB4GS9ShBRE/*.basis.scene_instance.json

[14:27:03:898526]:[Metadata] AttributesManagerBase.h(357)::buildAttrSrcPathsFromJSONAndLoad : No Glob path result for ./data/scene_datasets/hm3d/val/00880-Nfvxx8J5NCo/*.basis.scene_instance.json

[14:27:03:898858]:[Metadata] AttributesManagerBase.h(357)::buildAttrSrcPathsFromJSONAndLoad : No Glob path result for ./data/scene_datasets/hm3d/val/00890-6s7QHgap2fW/*.basis.scene_instance.json

[14:27:03:899186]:[Metadata] AttributesManagerBase.h(357)::buildAttrSrcPathsFromJSONAndLoad : No Glob path result for ./data/scene_datasets/hm3d/val/00891-cvZr5TUy5C5/*.basis.scene_instance.json

[14:27:03:899216]:[Metadata] AttributesManagerBase.h(360)::buildAttrSrcPathsFromJSONAndLoad : <Scene Instance>:123paths specified in JSON doc forScene Instancetemplates.

[14:27:03:899250]:[Metadata] SceneDatasetAttributesManager.cpp(35)::createObject : JSON Configuration File (./data/scene_datasets/hm3d/hm3d_annotated_basis.scene_dataset_config.json) based dataset attributes created and registered.

[14:27:03:899268]:[Metadata] MetadataMediator.cpp(120)::createSceneDataset : Dataset ./data/scene_datasets/hm3d/hm3d_annotated_basis.scene_dataset_config.json successfully created.

[14:27:03:899288]:[Metadata] MetadataMediator.cpp(217)::setActiveSceneDatasetName : Attempt to create new dataset ./data/scene_datasets/hm3d/hm3d_annotated_basis.scene_dataset_config.json succeeded. Currently active dataset : ./data/scene_datasets/hm3d/hm3d_annotated_basis.scene_dataset_config.json

[14:27:03:899307]:[Metadata] MetadataMediator.cpp(175)::setCurrPhysicsAttributesHandle : Old physics manager attributes changed to ./data/default.physics_config.json successfully.

[14:27:03:899333]:[Metadata] MetadataMediator.cpp(66)::setSimulatorConfiguration : Set new simulator config for scene/stage : ../../.../data/scene_datasets/hm3d/train/00538-3CBBjsNkhqW/3CBBjsNkhqW.basis.glb and dataset : ./data/scene_datasets/hm3d/hm3d_annotated_basis.scene_dataset_config.json which is currently active dataset.

    Exception ignored in: <function VectorEnv.__del__ at 0x7fff4496b5f0>
    Traceback (most recent call last):
      File "/.../.../habitat-lab/habitat/core/vector_env.py", line 592, in __del__
        self.close()
      File "/.../.../habitat-lab/habitat/core/vector_env.py", line 460, in close
        read_fn()
      File "/.../...t/habitat-lab/habitat/core/vector_env.py", line 97, in __call__
        res = self.read_fn()
      File "/.../.../habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 67, in recv
        buf = self.recv_bytes()
      File "/.../habitat/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
        buf = self._recv_bytes(maxlength)
      File "/.../habitat/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
        buf = self._recv(4)
      File "/.../lib/python3.7/multiprocessing/connection.py", line 383, in _recv
        raise EOFError
    EOFError:
    srun: error: gpu013: task 0: Exited with exit code 1

Can you please tell me what the underlying error might be? The exception was ignored and no helpful message was provided. P.S. I have quite a bit of experience running this code and have run it successfully on many single-node machines; however, this was my first attempt at a multi-node setup.
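On the `EOFError` itself: Python's `multiprocessing` `Connection.recv` raises `EOFError` when the other end of the pipe has been closed and nothing is left to read. The exception in `VectorEnv.__del__` is therefore plausibly a symptom of an environment worker process having already exited, rather than the root cause. The sketch below reproduces only that standard-library behaviour; closing a pipe end by hand stands in for a worker that died, and `recv_or_report` is a hypothetical helper, not part of habitat-lab.

```python
from multiprocessing import Pipe


def recv_or_report(connection):
    """Try to read one message; report a closed peer instead of raising."""
    try:
        return ("message", connection.recv())
    except EOFError:
        # recv() raises EOFError when the other end is closed
        # and there is nothing left to receive.
        return ("peer_closed", None)


# Case 1: the peer sends a result before closing its end.
parent_end, worker_end = Pipe()
worker_end.send({"status": "ok"})
worker_end.close()
healthy = recv_or_report(parent_end)

# Case 2: the peer closes without sending anything
# (a stand-in for a worker process that crashed).
parent_end2, worker_end2 = Pipe()
worker_end2.close()
crashed = recv_or_report(parent_end2)

print(healthy)
print(crashed)
```

If this reading applies here, the original failure would have been reported by the worker process itself, so the lines printed earlier in the job's output on that node are the place to look.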