Ram81 / habitat-imitation-baselines

Code for training embodied agents using imitation learning at scale in Habitat-Lab
MIT License

how long does the eval script of objectnav need to run #10

Closed dwei-k closed 1 year ago

dwei-k commented 1 year ago

Hi @Ram81, thanks for your great work. After following the steps in this repo to start the objectnav eval on mp3d, it is taking an extremely long time to finish.

I changed the eval script to:

#!/bin/bash
#SBATCH --job-name=onav_eval
#SBATCH --gres gpu:1
#SBATCH --nodes 1
#SBATCH --cpus-per-task 6
#SBATCH --ntasks-per-node 1
#SBATCH --partition=long
#SBATCH --constraint=rtx_6000
#SBATCH --output=slurm_logs/eval/eval-%j.out
#SBATCH --error=slurm_logs/eval/eval-%j.err

source /srv/share3/rramrakhya6/miniconda3/etc/profile.d/conda.sh
conda deactivate
conda activate habitat-3

export GLOG_minloglevel=2
export MAGNUM_LOG=quiet

MASTER_ADDR=$(srun --ntasks=1 hostname 2>&1 | tail -n1)
export MASTER_ADDR

path=$1
val_dataset_path=$2
checkpoint=$3

set -x

echo "Evaluating..."
echo "Hab-Sim: ${PYTHONPATH}"

python -u -m habitat_baselines.run \
    --exp-config $path \
    --run-type eval \
    TASK_CONFIG.DATASET.DATA_PATH "$val_dataset_path/{split}/{split}.json.gz" \
    TASK_CONFIG.TASK.SENSORS "['OBJECTGOAL_SENSOR', 'COMPASS_SENSOR', 'GPS_SENSOR']" \
    EVAL_CKPT_PATH_DIR $checkpoint

and started it with this command: job_scripts/run_objectnav_eval.sh habitat_baselines/config/objectnav/il_objectnav.yaml data/datasets/objectnav checkpoint/objectnav_semseg.ckpt

I wonder how long it takes you to run the eval script, and whether I did something wrong. Looking forward to your reply!

me-no-money commented 1 year ago

Hi @dwei-k! I'm running the objectnav eval on mp3d now, but it appears to be stuck after the following output. I can't tell whether it is still running or has entered an endless loop.

I0326 19:33:12.711207 2695857 ManagedContainerBase.cpp:19] ManagedContainerBase::convertFilenameToJSON : Filename : data/scene_datasets/mp3d/8194nk5LbLH/8194nk5LbLH.glb changed to proposed JSON configuration filename : data/scene_datasets/mp3d/8194nk5LbLH/8194nk5LbLH.stage_config.json
I0326 19:33:12.711282 2695857 AbstractObjectAttributesManagerBase.h:180] File (data/scene_datasets/mp3d/8194nk5LbLH/8194nk5LbLH.glb) exists but is not a recognized config filename extension, so new default Stage attributes created and registered.
I0326 19:33:12.711297 2695857 Simulator.cpp:156] Loading navmesh from data/scene_datasets/mp3d/8194nk5LbLH/8194nk5LbLH.navmesh
I0326 19:33:12.711426 2695857 Simulator.cpp:158] Loaded.
I0326 19:33:12.711445 2695857 SceneGraph.h:93] Created DrawableGroup: 
Renderer: NVIDIA GeForce RTX 3090/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 510.47.03
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-forward-compatible-core-context
    nv-egl-incorrect-gl11-function-pointers
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
I0326 19:33:12.787256 2695857 ResourceManager.cpp:205] ResourceManager::loadStage : Loading Semantic Stage mesh : data/scene_datasets/mp3d/8194nk5LbLH/8194nk5LbLH_semantic.ply
I0326 19:33:12.787293 2695857 SceneGraph.h:93] Created DrawableGroup: 
I0326 19:33:14.959846 2695857 ResourceManager.cpp:237] ResourceManager::loadStage : Semantic Stage mesh : data/scene_datasets/mp3d/8194nk5LbLH/8194nk5LbLH_semantic.ply loaded.
I0326 19:33:14.959901 2695857 ResourceManager.cpp:1146] Importing Basis files as BC3
I0326 19:33:16.179716 2695857 simulator.py:220] Loaded navmesh data/scene_datasets/mp3d/8194nk5LbLH/8194nk5LbLH.navmesh
I0326 19:33:16.180754 2695857 simulator.py:232] Recomputing navmesh for agent's height 0.88 and radius 0.18.
W0326 19:33:16.188374 2695857 PathFinder.cpp:716] Building naavmesh before << -1 -- [9.26109,5.3914,9.69957]
W0326 19:33:16.188406 2695857 PathFinder.cpp:721] Building naavmesh  after << -1 -- [9.26109,5.3914,9.69957]
I0326 19:33:16.188418 2695857 PathFinder.cpp:407] Building navmesh with 403x274 cells
I0326 19:33:16.316104 2695857 PathFinder.cpp:675] Created navmesh with 365 vertices 183 polygons
I0326 19:33:16.316138 2695857 Simulator.cpp:767] reconstruct navmesh successful
2023-03-26 19:33:16,319 Initializing task ObjectNav-v1
2023-03-26 19:33:16,320 max object cat: 20
2023-03-26 19:33:16,321 cats: dict_values([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

Have you ever been in a similar situation? And how long did the eval script take you to run?

Ram81 commented 1 year ago

Hi @dwei-k @me-no-money,

The evaluation script runs for ~8 hours to complete evaluation on all 2k episodes. Are you still facing this issue?
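For a rough sense of scale, the figures above (~8 hours, 2k episodes) can be turned into an average per-episode time, and the same arithmetic can estimate how much of a run is left. This is only a back-of-the-envelope sketch: the function names are hypothetical and not part of this repo, and it assumes episodes take roughly equal time, which may not hold in practice.

```python
def seconds_per_episode(total_hours: float, num_episodes: int) -> float:
    """Average wall-clock seconds spent per episode over a full run."""
    if num_episodes <= 0:
        raise ValueError("num_episodes must be positive")
    return total_hours * 3600 / num_episodes


def remaining_hours(episodes_done: int, elapsed_hours: float, num_episodes: int) -> float:
    """Estimate the hours left, assuming the observed pace stays constant."""
    if episodes_done <= 0:
        raise ValueError("need at least one finished episode to estimate")
    hours_per_episode = elapsed_hours / episodes_done
    return hours_per_episode * (num_episodes - episodes_done)


# Figures quoted in this thread: ~8 hours for 2000 episodes.
avg = seconds_per_episode(8, 2000)
print(f"{avg:.1f} s per episode")  # prints "14.4 s per episode"
```

If a run falls far behind that pace after the first few dozen episodes, it is probably worth checking the configuration before waiting for it to finish.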

dwei-k commented 1 year ago

@Ram81 thank you very much for replying. I recently found that it took an extremely long time because I had set num_environments to 1.

Ram81 commented 1 year ago

Got it. Increasing num_environments to a value greater than 1 should resolve this issue.

Closing the issue as it is resolved.