Closed noooob-coder closed 3 months ago
This error is normal and expected and can safely be ignored. I'm guessing the run with the debug flag ran normally, and the run without the debug flag didn't finish enough rollouts during evaluation so the video was malformed.
I ignored the EGL_NOT_INITIALIZED error and resolved the OpenGL error. However, when I run the command python robomimic/scripts/train.py --config robomimic/exps/templates/bc.json --dataset datasets/lift/ph/low_dim_v141.hdf5 --debug, training stops after two epochs, and the results are written to the /tmp/tmp_trained_models path. Why does the training process stop after two epochs? Thank you!
============= Training Dataset =============
SequenceDataset (
path=datasets/lift/ph/low_dim_v141.hdf5
obs_keys=('object', 'robot0_eef_pos', 'robot0_eef_quat', 'robot0_gripper_qpos')
seq_length=1
filter_key=none
frame_stack=1
pad_seq_length=True
pad_frame_stack=True
goal_mode=none
cache_mode=all
num_demos=200
num_sequences=9666
)
**************************************************
Warnings generated by robomimic have been duplicated here (from above) for convenience. Please check them carefully.
ROBOMIMIC WARNING(
No private macro file found!
It is recommended to use a private macro file
To setup, run: python /home/user/robomimic/robomimic/scripts/setup_macros.py
)
**************************************************
100%|##########| 3/3 [00:00<00:00, 23.01it/s]
Train Epoch 1
{
"Cosine_Loss": 0.5587675174077352,
"L1_Loss": 0.0957380086183548,
"L2_Loss": 0.19192640483379364,
"Loss": 0.19192640483379364,
"Optimizer/policy0_lr": 0.0001,
"Policy_Grad_Norms": 0.22354567569952147,
"Time_Data_Loading": 4.790623982747396e-05,
"Time_Epoch": 0.0021804253260294597,
"Time_Log_Info": 2.5431315104166665e-06,
"Time_Process_Batch": 0.00014754931131998698,
"Time_Train_Batch": 0.001972810427347819
}
video writes to /tmp/tmp_trained_models/test/20240724152323/videos/Lift_epoch_1.mp4
rollout: env=Lift, horizon=10, use_goals=False, num_episodes=2
100%|##########| 2/2 [00:01<00:00, 1.32it/s]
Epoch 1 Rollouts took 0.7548685073852539s (avg) with results:
Env: Lift
{
"Horizon": 10.0,
"Return": 0.0,
"Success_Rate": 0.0,
"Time_Episode": 0.025162283579508463,
"time": 0.7548685073852539
}
save checkpoint to /tmp/tmp_trained_models/test/20240724152323/models/model_epoch_1_Lift_success_0.0.pth
Epoch 1 Memory Usage: 1730 MB
100%|##########| 3/3 [00:00<00:00, 357.70it/s]
Train Epoch 2
{
"Cosine_Loss": 0.5759696165720621,
"L1_Loss": 0.0873916173974673,
"L2_Loss": 0.1788101146618525,
"Loss": 0.1788101146618525,
"Optimizer/policy0_lr": 0.0001,
"Policy_Grad_Norms": 0.02232090537078572,
"Time_Data_Loading": 4.622141520182292e-05,
"Time_Epoch": 0.00014543930689493815,
"Time_Log_Info": 3.5842259724934896e-06,
"Time_Process_Batch": 7.510185241699219e-06,
"Time_Train_Batch": 8.204380671183268e-05
}
video writes to /tmp/tmp_trained_models/test/20240724152323/videos/Lift_epoch_2.mp4
rollout: env=Lift, horizon=10, use_goals=False, num_episodes=2
100%|##########| 2/2 [00:01<00:00, 1.42it/s]
Epoch 2 Rollouts took 0.7024716138839722s (avg) with results:
Env: Lift
{
"Horizon": 10.0,
"Return": 0.0,
"Success_Rate": 0.0,
"Time_Episode": 0.02341572046279907,
"time": 0.7024716138839722
}
Epoch 2 Memory Usage: 1753 MB
finished run successfully!
This is precisely what the `--debug` flag is supposed to do - test a training run quickly with 2 epochs of training. To train for longer, simply omit the `--debug` flag.
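For anyone who lands here with the same question, the effect of the flag can be sketched roughly as follows. This is a minimal illustration, not robomimic's actual code: the helper `apply_debug_overrides`, the plain nested dict, and the key names are assumptions, and the values in `full_run` are placeholders rather than values read from bc.json. The debug values themselves are taken from the log above (3 training steps per epoch, 2 epochs, 2 rollout episodes per epoch with horizon 10, output under /tmp/tmp_trained_models).

```python
import copy


def apply_debug_overrides(config):
    """Return a copy of a nested config dict shrunk to a quick smoke test.

    Hypothetical helper for illustration only; the key names are assumed
    and may not match robomimic's real config object.
    """
    cfg = copy.deepcopy(config)  # leave the caller's config untouched

    # 3 gradient steps per epoch for 2 epochs
    # (the log shows "3/3" progress bars and stops after "Train Epoch 2")
    cfg["experiment"]["epoch_every_n_steps"] = 3
    cfg["train"]["num_epochs"] = 2

    # rollouts after every epoch: 2 episodes of 10 environment steps
    # (the log shows "horizon=10, num_episodes=2" for both epochs)
    cfg["experiment"]["rollout"]["rate"] = 1
    cfg["experiment"]["rollout"]["n"] = 2
    cfg["experiment"]["rollout"]["horizon"] = 10

    # results go to a temporary directory instead of the configured one
    cfg["train"]["output_dir"] = "/tmp/tmp_trained_models"
    return cfg


# Placeholder "full run" settings, not read from bc.json.
full_run = {
    "experiment": {
        "epoch_every_n_steps": 100,
        "rollout": {"rate": 50, "n": 50, "horizon": 400},
    },
    "train": {"num_epochs": 2000, "output_dir": "../bc_trained_models"},
}

debug_run = apply_debug_overrides(full_run)
print(debug_run["train"]["num_epochs"], debug_run["train"]["output_dir"])
```

Because a debug run is this short, its checkpoints and videos are only useful for checking that the pipeline does not crash, not for judging policy performance.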
When I run `python robomimic/scripts/train.py --config robomimic/exps/templates/bc.json --dataset datasets/lift/ph/low_dim_v141.hdf5 --debug`, the following error occurs and the program cannot run properly. When I remove `--debug`, the run completes normally, but the video in `robomimic/bc_trained_models/test/20240723101813/videos` cannot be opened.