facebookresearch / habitat-sim

A flexible, high-performance 3D simulator for Embodied AI research.
https://aihabitat.org/
MIT License

datasets_download script not downloading semantic data for hm3d #2328

Closed: janblumenkamp closed this issue 4 months ago

janblumenkamp commented 4 months ago

Habitat-Sim version

v0.3.0

🐛 Bug

The download script does not seem to download the semantic annotations for the train split of the hm3d dataset.

Steps to Reproduce

Steps to reproduce the behavior: run the download script as follows:

python -m habitat_sim.utils.datasets_download --username xxx --password xxx --uids hm3d_full

The semantic data exists for minival:

> ls data/scene_datasets/hm3d/minival/00800-TEEsavR23oF/TEEsavR23oF.*
data/scene_datasets/hm3d/minival/00800-TEEsavR23oF/TEEsavR23oF.basis.glb      data/scene_datasets/hm3d/minival/00800-TEEsavR23oF/TEEsavR23oF.semantic.glb
data/scene_datasets/hm3d/minival/00800-TEEsavR23oF/TEEsavR23oF.basis.navmesh  data/scene_datasets/hm3d/minival/00800-TEEsavR23oF/TEEsavR23oF.semantic.txt
data/scene_datasets/hm3d/minival/00800-TEEsavR23oF/TEEsavR23oF.glb

And for val:

> ls data/scene_datasets/hm3d/val/00800-TEEsavR23oF/TEEsavR23oF.*
data/scene_datasets/hm3d/val/00800-TEEsavR23oF/TEEsavR23oF.basis.glb      data/scene_datasets/hm3d/val/00800-TEEsavR23oF/TEEsavR23oF.semantic.glb
data/scene_datasets/hm3d/val/00800-TEEsavR23oF/TEEsavR23oF.basis.navmesh  data/scene_datasets/hm3d/val/00800-TEEsavR23oF/TEEsavR23oF.semantic.txt
data/scene_datasets/hm3d/val/00800-TEEsavR23oF/TEEsavR23oF.glb

But apparently not for train:

> ls data/scene_datasets/hm3d/train/00000-kfPV7w3FaU5/kfPV7w3FaU5.*
data/scene_datasets/hm3d/train/00000-kfPV7w3FaU5/kfPV7w3FaU5.basis.glb      data/scene_datasets/hm3d/train/00000-kfPV7w3FaU5/kfPV7w3FaU5.glb
data/scene_datasets/hm3d/train/00000-kfPV7w3FaU5/kfPV7w3FaU5.basis.navmesh
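To see how far the gap extends beyond the single train scene listed above, one option is to count, per split, which scene folders contain both semantic files. The helper below is a minimal sketch written for this thread, not part of habitat-sim. It assumes the `<split>/<NNNNN-sceneId>/<sceneId>.semantic.{glb,txt}` layout visible in the listings above, and the name `semantic_coverage` is hypothetical.

```python
from pathlib import Path


def semantic_coverage(split_dir):
    """Split the scene folders of one HM3D split into two sorted lists.

    Assumed layout (taken from the ls output above):
        <split_dir>/<NNNNN-sceneId>/<sceneId>.semantic.glb
        <split_dir>/<NNNNN-sceneId>/<sceneId>.semantic.txt
    A folder counts as annotated only if BOTH files are present.
    """
    annotated, missing = [], []
    for scene_dir in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        has_glb = any(scene_dir.glob("*.semantic.glb"))
        has_txt = any(scene_dir.glob("*.semantic.txt"))
        # Route the folder name into the matching list.
        (annotated if has_glb and has_txt else missing).append(scene_dir.name)
    return annotated, missing
```

For example, `semantic_coverage("data/scene_datasets/hm3d/train")` returns the annotated and non-annotated folder names separately. A non-empty second list for train is not necessarily a download error, since HM3D semantic annotations do not cover every scene (see the first point of the reply below).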

Expected behavior

I expect the semantic data to exist for train.

Additional context

I also noticed that the file data/scene_datasets/hm3d_semantic_v0.2/hm3d_basis.scene_dataset_config.json references scene instance files matching *.basis.scene_instance.json, but no such files exist either. This results in many warnings like:

[10:19:42:933663]:[Warning]:[Metadata] AttributesManagerBase.h(398)::buildAttrSrcPathsFromJSONAndLoad : <Scene Instance> : No Glob path result found for `./data/scene_datasets/hm3d_semantic_v0.2/example/00861-GLAQ4DNUx5U/*.basis.scene_instance.json` so unable to load templates from that path.
[10:19:42:933852]:[Warning]:[Metadata] AttributesManagerBase.h(398)::buildAttrSrcPathsFromJSONAndLoad : <Scene Instance> : No Glob path result found for `./data/scene_datasets/hm3d_semantic_v0.2/minival/00800-TEEsavR23oF/*.basis.scene_instance.json` so unable to load templates from that path.
[...]

Furthermore, it seems like the test set is not downloaded at all:

[10:19:42:917474]:[Warning]:[Metadata] AttributesManagerBase.h(398)::buildAttrSrcPathsFromJSONAndLoad : <Stage Template> : No Glob path result found for `./data/scene_datasets/hm3d_semantic_v0.2/test/00900-XRJ2muKkwAV/*.basis.glb` so unable to load templates from that path.
[10:19:42:917636]:[Warning]:[Metadata] AttributesManagerBase.h(398)::buildAttrSrcPathsFromJSONAndLoad : <Stage Template> : No Glob path result found for `./data/scene_datasets/hm3d_semantic_v0.2/test/00901-aJoC8Qw6xQ5/*.basis.glb` so unable to load templates from that path.
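The warnings above are emitted when a glob pattern in the scene dataset config matches no files on disk. To list those patterns up front instead of sifting through log output, an offline check can be sketched as follows. It assumes the config stores its globs as `"paths": {".ext": [pattern, ...]}` entries and that patterns resolve relative to the config file's directory (consistent with the `./data/scene_datasets/hm3d_semantic_v0.2/...` prefix in the warnings). Both are assumptions about the format, and `find_unmatched_globs` is a hypothetical helper, not a habitat-sim API.

```python
import json
from pathlib import Path


def collect_path_patterns(node):
    """Recursively yield every string listed under a "paths" key.

    Assumed shape: {"paths": {".glb": ["pattern", ...], ...}}.
    Adjust this walker if your config is structured differently.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "paths" and isinstance(value, dict):
                for patterns in value.values():
                    if isinstance(patterns, list):
                        yield from (p for p in patterns if isinstance(p, str))
            else:
                yield from collect_path_patterns(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_path_patterns(item)


def find_unmatched_globs(config_path):
    """Return the patterns from a scene dataset config that match no files.

    Each pattern is resolved relative to the config file's own directory
    (an assumption); patterns are returned in the order they appear.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    base = config_path.parent
    return [p for p in collect_path_patterns(config) if not any(base.glob(p))]
```

If the assumptions hold, running it on data/scene_datasets/hm3d_semantic_v0.2/hm3d_basis.scene_dataset_config.json would be expected to report the `*.basis.scene_instance.json` and `test/...` patterns behind the warnings above, which makes it easier to tell which missing files matter for your use case.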

System Info

ENVIRONMENT INFO:
Platform: Linux-5.15.0-60-generic-x86_64-with-glibc2.35
Machine: x86_64
Processor: x86_64
Libc version: glibc 2.35
Mac version: 
Python version: 3.9.18
Architecture: 64bit ELF
Win version:    
System OS: Linux
Release: 5.15.0-60-generic
Version: #66-Ubuntu SMP Fri Jan 20 14:29:49 UTC 2023
Operational System: linux
GCC version: b'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0\n'
CMAKE version: b'cmake version 3.22.1\n'
NVIDIA-SMI:
Thu Feb 29 10:53:46 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:21:00.0 Off |                    0 |
| N/A   58C    P0    74W / 250W |  25641MiB / 40960MiB |     55%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:81:00.0 Off |                    0 |
| N/A   55C    P0    98W / 250W |  24479MiB / 40960MiB |     75%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
Pip packages versions:
b''
Conda packages versions:
b''
aclegg3 commented 4 months ago

Hey @janblumenkamp,

Thanks for reaching out. A couple of things here:

  1. HM3D semantic annotations do not cover the full set of 1000 scenes. If you downloaded the semantic dataset, it should include config files that restrict the dataset to the scenes that have annotations, e.g. hm3d-train-semantic-configs-v0.2/hm3d_annotated_train_basis.scene_dataset_config.json. The scene you referenced, kfPV7w3FaU5, is not in that list. Check one of the listed scenes to verify that the annotations were downloaded correctly.
  2. The test split assets are not available to the public. We keep them in reserve so that future challenges can be run without results being biased by purposely or accidentally training or evaluating on them. Instead, use the val set for your own evaluation.
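The check suggested in point 1 (whether a given scene is covered by the annotated config) can be scripted by searching the config for the scene ID. This is a heuristic sketch, assuming the config refers to each annotated scene through a path containing its ID, as the per-scene paths in the warnings earlier in this thread suggest. `scene_is_annotated` is a hypothetical helper, not a habitat-sim API.

```python
import json
from pathlib import Path


def all_strings(node):
    """Recursively yield every string value in a parsed JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from all_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from all_strings(item)


def scene_is_annotated(config_path, scene_id):
    """Return True if any string in the config mentions scene_id.

    Heuristic: assumes annotated scenes appear in the config as paths that
    contain their ID (e.g. ".../00800-TEEsavR23oF/..."). Only string values
    are searched, not dictionary keys.
    """
    config = json.loads(Path(config_path).read_text())
    return any(scene_id in s for s in all_strings(config))
```

Pass it the path to hm3d_annotated_train_basis.scene_dataset_config.json, wherever it ended up in your download, together with a scene ID such as kfPV7w3FaU5. Per the reply above, that scene is not listed, so the call should return False.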
janblumenkamp commented 4 months ago

Hey Alexander, thanks a lot for the quick response and clarifications!

Indeed, the semantic data is available for the scenes listed in that JSON file. I assume it's fine to just ignore these warnings, then.