facebookresearch / habitat-lab

A modular high-level library to train embodied AI agents across a variety of tasks and environments.
https://aihabitat.org/
MIT License

Unexpected training issues: AttributeError: 'NoneType' object has no attribute 'BytesIO' #778

Open umerhasan17 opened 2 years ago

umerhasan17 commented 2 years ago

Habitat-Lab and Habitat-Sim versions

Habitat-Lab: v0.1.5

Habitat-Sim: v0.1.6

❓ Questions and Help

When I train experiments with Habitat-Lab, training often breaks unexpectedly at seemingly random points. The error is pasted below. I believe the cause might be insufficient resources.

However, I expect there could be other reasons. I would be grateful for any ideas for potential solutions.

I really should not be constrained by resources, since I am using a Google Cloud VM instance with 4 CPUs and a V100 GPU.

Project code is here: https://github.com/umerhasan17/deep-robust-robotnav

2021-12-17 20:16:11,003 Average window size: 50  reward: -1.351  distance_to_goal: 4.599  success: 0.200  spl: 0.093
2021-12-17 20:16:13,483 update: 1631    fps: 45.656
2021-12-17 20:16:13,484 update: 1631    env-time: 4004.469s     pth-time: 522.877s      frames: 208896
2021-12-17 20:16:13,484 Average window size: 50  reward: -1.114  distance_to_goal: 4.439  success: 0.214  spl: 0.099
I1217 20:16:23.147665  4220 PhysicsManager.cpp:33] Deconstructing PhysicsManager
I1217 20:16:23.147722  4220 SemanticScene.h:41] Deconstructing SemanticScene
I1217 20:16:23.147727  4220 SceneManager.h:25] Deconstructing SceneManager
I1217 20:16:23.147730  4220 SceneGraph.h:26] Deconstructing SceneGraph
I1217 20:16:23.148375  4220 Sensor.h:81] Deconstructing Sensor
I1217 20:16:23.148473  4220 Sensor.h:81] Deconstructing Sensor
I1217 20:16:23.148748  4220 Renderer.cpp:34] Deconstructing Renderer
I1217 20:16:23.148772  4220 WindowlessContext.h:17] Deconstructing WindowlessContext
I1217 20:16:23.164949  4220 AssetAttributesManager.cpp:122] Asset attributes (capsule3DSolid : capsule3DSolid_hemiRings_4_cylRings_1_segments_12_halfLen_0.75_useTexCoords_false_useTangents_false) created and registered.
I1217 20:16:23.165030  4220 AssetAttributesManager.cpp:122] Asset attributes (capsule3DWireframe : capsule3DWireframe_hemiRings_8_cylRings_1_segments_16_halfLen_1) created and registered.
I1217 20:16:23.165112  4220 AssetAttributesManager.cpp:122] Asset attributes (coneSolid : coneSolid_segments_12_halfLen_1.25_rings_1_useTexCoords_false_useTangents_false_capEnd_true) created and registered.
I1217 20:16:23.165154  4220 AssetAttributesManager.cpp:122] Asset attributes (coneWireframe : coneWireframe_segments_32_halfLen_1.25) created and registered.
I1217 20:16:23.165179  4220 AssetAttributesManager.cpp:122] Asset attributes (cubeSolid : cubeSolid) created and registered.
I1217 20:16:23.165199  4220 AssetAttributesManager.cpp:122] Asset attributes (cubeWireframe : cubeWireframe) created and registered.
I1217 20:16:23.165256  4220 AssetAttributesManager.cpp:122] Asset attributes (cylinderSolid : cylinderSolid_rings_1_segments_12_halfLen_1_useTexCoords_false_useTangents_false_capEnds_true) created and registered.
I1217 20:16:23.165328  4220 AssetAttributesManager.cpp:122] Asset attributes (cylinderWireframe : cylinderWireframe_rings_1_segments_32_halfLen_1) created and registered.
I1217 20:16:23.165369  4220 AssetAttributesManager.cpp:122] Asset attributes (icosphereSolid : icosphereSolid_subdivs_1) created and registered.
I1217 20:16:23.165395  4220 AssetAttributesManager.cpp:122] Asset attributes (icosphereWireframe : icosphereWireframe_subdivs_1) created and registered.
I1217 20:16:23.165444  4220 AssetAttributesManager.cpp:122] Asset attributes (uvSphereSolid : uvSphereSolid_rings_8_segments_16_useTexCoords_false_useTangents_false) created and registered.
I1217 20:16:23.165469  4220 AssetAttributesManager.cpp:122] Asset attributes (uvSphereWireframe : uvSphereWireframe_rings_16_segments_32) created and registered.
I1217 20:16:23.165486  4220 AssetAttributesManager.cpp:108] AssetAttributesManager::buildCtorFuncPtrMaps : Built default primitive asset templates : 12
I1217 20:16:23.166215  4220 PhysicsAttributesManager.cpp:39] File (./data/default.phys_scene_config.json) not found so new, default physics manager attributes created and registered.
I1217 20:16:23.166337  4220 StageAttributesManager.cpp:79] File (data/scene_datasets/gibson/Micanopy.glb) Based stage attributes created and registered.
I1217 20:16:23.166355  4220 Simulator.cpp:145] Loading navmesh from data/scene_datasets/gibson/Micanopy.navmesh
I1217 20:16:23.167831  4220 Simulator.cpp:147] Loaded.
I1217 20:16:23.167856  4220 SceneGraph.h:93] Created DrawableGroup:
Renderer: Tesla V100-SXM2-16GB/PCIe/SSE2 by NVIDIA Corporation
OpenGL version: 4.6.0 NVIDIA 460.73.01
Using optional features:
    GL_ARB_ES2_compatibility
    GL_ARB_direct_state_access
    GL_ARB_get_texture_sub_image
    GL_ARB_invalidate_subdata
    GL_ARB_multi_bind
    GL_ARB_robustness
    GL_ARB_separate_shader_objects
    GL_ARB_texture_filter_anisotropic
    GL_ARB_texture_storage
    GL_ARB_texture_storage_multisample
    GL_ARB_vertex_array_object
    GL_KHR_debug
Using driver workarounds:
    no-forward-compatible-core-context
    nv-egl-incorrect-gl11-function-pointers
    no-layout-qualifiers-on-old-glsl
    nv-zero-context-profile-mask
    nv-implementation-color-read-format-dsa-broken
    nv-cubemap-inconsistent-compressed-image-size
    nv-cubemap-broken-full-compressed-image-query
    nv-compressed-block-size-in-bits
I1217 20:16:23.252456  4220 ResourceManager.cpp:934] Importing Basis files as BC7
Traceback (most recent call last):
  File "run.py", line 78, in <module>
    main()
  File "run.py", line 42, in main
    run_exp(**vars(args))
  File "run.py", line 68, in run_exp
    trainer.train()
  File "/home/umerhasan17/deep-robust-robotnav/habitat-api/habitat_baselines/rl/ppo/ppo_trainer.py", line 365, in train
    rollouts, current_episode_reward, running_episode_stats
  File "/home/umerhasan17/deep-robust-robotnav/habitat-api/habitat_baselines/rl/ppo/ppo_trainer.py", line 195, in _collect_rollout_step
    outputs = self.envs.step([a[0].item() for a in actions])
  File "/home/umerhasan17/deep-robust-robotnav/habitat-api/habitat/core/vector_env.py", line 414, in step
    return self.wait_step()
  File "/home/umerhasan17/deep-robust-robotnav/habitat-api/habitat/core/vector_env.py", line 400, in wait_step
    observations.append(read_fn())
  File "/opt/conda/envs/vis5/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/conda/envs/vis5/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/opt/conda/envs/vis5/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7fd8b6ddaf60>>
Traceback (most recent call last):
  File "/home/umerhasan17/deep-robust-robotnav/habitat-api/habitat/core/vector_env.py", line 543, in __del__
  File "/home/umerhasan17/deep-robust-robotnav/habitat-api/habitat/core/vector_env.py", line 422, in close
  File "/opt/conda/envs/vis5/lib/python3.6/multiprocessing/connection.py", line 250, in recv
  File "/opt/conda/envs/vis5/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
  File "/opt/conda/envs/vis5/lib/python3.6/multiprocessing/connection.py", line 375, in _recv
AttributeError: 'NoneType' object has no attribute 'BytesIO'
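
A note on reading the first traceback: `Connection.recv()` from Python's `multiprocessing` raises `EOFError` when there is nothing left to receive and the other end of the pipe has been closed. In a setup like this one, that usually suggests the worker process on the other end has exited, though the traceback alone does not say why. The sketch below only illustrates that behaviour in a single process. `recv_or_none` and `demo` are hypothetical names for illustration and are not part of Habitat-Lab.

```python
import multiprocessing as mp


def recv_or_none(conn):
    """Return the next message, or None if the other end has been closed."""
    try:
        return conn.recv()
    except EOFError:
        # Nothing left to read and the peer closed its end of the pipe.
        return None


def demo():
    parent_conn, worker_conn = mp.Pipe()
    worker_conn.send("observation")
    first = recv_or_none(parent_conn)   # a normal reply is received
    worker_conn.close()                 # stands in for the worker going away
    second = recv_or_none(parent_conn)  # recv() now hits EOF
    return first, second


if __name__ == "__main__":
    print(demo())
```

Catching the `EOFError` at the point of `recv()` like this gives a place to log which worker went away, instead of only seeing the failure surface deep inside the training loop.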

Reschivon commented 2 years ago

Hello, did you find a solution to this issue?

Reschivon commented 2 years ago

For reference, I fixed this by upgrading Python from 3.6 to 3.7.
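
Separately from the Python version, the second traceback is reported as "Exception ignored in" `VectorEnv.__del__`, and its `'NoneType' object has no attribute 'BytesIO'` message is consistent with cleanup running during interpreter shutdown, when module globals may already have been cleared. That is my reading of the log, not something confirmed in this thread. One general way to avoid relying on `__del__` is to close resources explicitly in a `finally` block. Below is a minimal sketch of that pattern, where `FakeVectorEnv` and `train` are hypothetical stand-ins and not Habitat-Lab code.

```python
class FakeVectorEnv:
    """Hypothetical stand-in for an object that owns worker pipes."""

    def __init__(self):
        self.is_closed = False

    def close(self):
        # Idempotent, so a later __del__ or a second call is harmless.
        self.is_closed = True

    def __del__(self):
        self.close()


def train(envs, fail=False):
    """Run a fake training loop and always close envs, even on error."""
    try:
        if fail:
            raise EOFError("worker pipe closed")
    finally:
        envs.close()  # runs while the interpreter is still fully alive


def demo():
    envs = FakeVectorEnv()
    try:
        train(envs, fail=True)
    except EOFError:
        pass  # the original error still propagates to the caller
    return envs.is_closed


if __name__ == "__main__":
    print(demo())
```

This does not address whatever made the worker exit. It only keeps the cleanup from producing a second, more confusing error on top of the first one.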