Closed yipclam closed 2 months ago
Details are as follows.
(dg) yzl@autodl-container-06ff47b7fc-1ea0e1e3:~/dg/digirl/scripts$ python run.py --config-path config/main --config-name eval_only
task_set: webshop
task_split: test
eval_sample_mode: sequential
max_steps: 20
huggingface_token: xx
wandb_key: ''
gemini_key: xx
policy_lm: /root/autodl-tmp/yzl/dg/Auto-UI-Base
critic_lm: roberta-base
capacity: 2000
epochs: 5
batch_size: 8
bsize: 4
rollout_size: 16
grad_accum_steps: 32
warmup_iter: 0
actor_epochs: 20
trajectory_critic_epochs: 5
lm_lr: 0.0001
critic_lr: 0.0001
max_grad_norm: 0.01
gamma: 0.5
use_lora: false
agent_name: autoui
do_sample: true
temperature: 1.0
tau: 0.01
max_new_tokens: 128
record: false
use_wandb: false
entity_name: ''
project_name: ''
android_avd_home: /root/autodl-tmp/yzl/.android/avd
emulator_path: /root/autodl-tmp/yzl/.android/emulator/emulator
adb_path: /root/autodl-tmp/yzl/.android/platform-tools/adb
cache_dir: /root/autodl-tmp/yzl/.cache
assets_path: /root/autodl-tmp/yzl/dg/digirl/digirl/environment/android/assets/task_set
save_path: /root/autodl-tmp/yzl/logs/ckpts/webshop-off2on-digirl/
run_name: autoui-general-eval-only
train_algorithm: digirl
task_mode: evaluate
parallel: single
eval_iterations: 6
save_freq: 3
The token has not been saved to the git credentials helper. Pass add_to_git_credential=True in this function directly or --add-to-git-credential if using via huggingface-cli if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /root/autodl-tmp/yzl/.cache/huggingface/token
Login successful
Agent: autoui Evauation mode
/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/root/autodl-tmp/yzl/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:08<00:00, 4.04s/it]
starting appium server at port 6652
starting appium server at port 6653
starting appium server at port 6654
starting appium server at port 6655
Using DigiRL trainer
Loading from previous checkpoint
[2024-09-09 19:15:18,719][accelerate.accelerator][INFO] - Loading states from /root/autodl-tmp/yzl/logs/ckpts/webshop-off2on-digirl/trainer.pt
[2024-09-09 19:15:20,319][accelerate.checkpointing][INFO] - All model weights loaded successfully
[2024-09-09 19:15:20,319][accelerate.checkpointing][INFO] - All optimizer states loaded successfully
[2024-09-09 19:15:20,319][accelerate.checkpointing][INFO] - All scheduler states loaded successfully
[2024-09-09 19:15:20,320][accelerate.checkpointing][INFO] - All dataloader sampler states loaded successfully
[2024-09-09 19:15:20,321][accelerate.checkpointing][INFO] - Could not load random states
Error executing job with overrides: []
Traceback (most recent call last):
  File "/root/autodl-tmp/yzl/dg/digirl/scripts/run.py", line 120, in main
    eval_loop(env = env,
  File "/root/autodl-tmp/yzl/dg/digirl/digirl/algorithms/eval_loop.py", line 61, in eval_loop
    trainer.load(os.path.join(save_path, 'trainer.pt'))
  File "/root/autodl-tmp/yzl/dg/digirl/digirl/algorithms/digirl/trainer.py", line 305, in load
    self.accelerator.load_state(path)
  File "/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/accelerate/accelerator.py", line 3156, in load_state
    self.step = override_attributes["step"]
KeyError: 'step'
It turned out to be the accelerate version: I had tried v0.31 and v0.33 before, and the latest version, v0.34, released last week, works. Thanks for the help!
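Since the fix was upgrading accelerate to v0.34 or newer, a quick runtime check can surface the mismatch before the checkpoint load fails. A minimal sketch; the helper names are my own, not part of digirl:

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple:
    """Parse the numeric dotted prefix of a version string,
    e.g. "0.34.2" -> (0, 34, 2); trailing non-numeric parts are dropped."""
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def accelerate_at_least(minimum: str = "0.34.0") -> bool:
    """True if the installed accelerate is at least `minimum`;
    False if accelerate is not installed at all."""
    try:
        return version_tuple(version("accelerate")) >= version_tuple(minimum)
    except PackageNotFoundError:
        return False
```

Calling `accelerate_at_least()` at the top of `run.py` and failing fast with a clear message would have turned the opaque `KeyError: 'step'` into an actionable "please upgrade accelerate" hint.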
Hi, I'm trying to reproduce the scores in your paper with your final checkpoint, but I get the following error.
[2024-09-09 19:41:34,814][accelerate.checkpointing][INFO] - All model weights loaded successfully
[2024-09-09 19:41:34,814][accelerate.checkpointing][INFO] - All optimizer states loaded successfully
[2024-09-09 19:41:34,814][accelerate.checkpointing][INFO] - All scheduler states loaded successfully
[2024-09-09 19:41:34,814][accelerate.checkpointing][INFO] - All dataloader sampler states loaded successfully
[2024-09-09 19:41:34,816][accelerate.checkpointing][INFO] - Could not load random states
Error executing job with overrides: []
Traceback (most recent call last):
  File "/root/autodl-tmp/yzl/dg/digirl/scripts/run.py", line 120, in main
    eval_loop(env = env,
  File "/root/autodl-tmp/yzl/dg/digirl/digirl/algorithms/eval_loop.py", line 61, in eval_loop
    trainer.load(os.path.join(save_path, 'trainer.pt'))
  File "/root/autodl-tmp/yzl/dg/digirl/digirl/algorithms/filteredbc/trainer.py", line 78, in load
    self.accelerator.load_state(path)
  File "/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/accelerate/accelerator.py", line 3156, in load_state
    self.step = override_attributes["step"]
KeyError: 'step'
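The real fix is upgrading accelerate, but the logs above show that the model, optimizer, and scheduler states all load successfully before the `KeyError: 'step'` fires. A hypothetical stopgap (my own wrapper, not part of digirl) could tolerate exactly that one missing key instead of crashing the whole eval run:

```python
import logging


def safe_load_state(accelerator, path):
    """Call accelerator.load_state(path), tolerating the KeyError: 'step'
    that some accelerate versions raise on checkpoints saved without a
    `step` attribute. Returns True on a clean load, False if only the
    partial load before the KeyError went through; any other KeyError
    is re-raised unchanged."""
    try:
        accelerator.load_state(path)
        return True
    except KeyError as err:
        if err.args and err.args[0] == "step":
            logging.warning(
                "Checkpoint %s has no 'step' entry; weights/optimizer may "
                "still have loaded before the failure (%r).", path, err)
            return False
        raise
```

This only papers over the symptom; pinning a compatible accelerate version is still the proper solution.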
I guess it's an accelerate version problem, because a similar issue is reported in https://github.com/huggingface/accelerate/issues/3067. But downgrading to v0.31 didn't make it work, and I got another error.
Traceback (most recent call last):
  File "/root/autodl-tmp/yzl/dg/digirl/scripts/run.py", line 3, in <module>
    from digirl.environment import BatchedAndroidEnv
  File "/root/autodl-tmp/yzl/dg/digirl/digirl/environment/__init__.py", line 1, in <module>
    from .env_utils import batch_interact_environment
  File "/root/autodl-tmp/yzl/dg/digirl/digirl/environment/env_utils.py", line 4, in <module>
    import accelerate
  File "/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/accelerate/__init__.py", line 16, in <module>
    from .accelerator import Accelerator
  File "/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/accelerate/accelerator.py", line 35, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/accelerate/checkpointing.py", line 24, in <module>
    from .utils import (
  File "/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/accelerate/utils/__init__.py", line 183, in <module>
    from .fsdp_utils import load_fsdp_model, load_fsdp_optimizer, merge_fsdp_weights, save_fsdp_model, save_fsdp_optimizer
  File "/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/accelerate/utils/fsdp_utils.py", line 36, in <module>
    import torch.distributed.checkpoint.format_utils as dist_cp_format_utils
  File "/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/torch/distributed/checkpoint/format_utils.py", line 12, in <module>
    from torch.distributed.checkpoint.default_planner import (
ImportError: cannot import name '_EmptyStateDictLoadPlanner' from 'torch.distributed.checkpoint.default_planner' (/root/autodl-tmp/yzl/.conda/envs/dg/lib/python3.10/site-packages/torch/distributed/checkpoint/default_planner.py)
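This second error means accelerate v0.31's `fsdp_utils` expects a private torch symbol that this torch build does not ship, so the torch and accelerate versions are mutually incompatible. A small probe (a helper of my own, not from either library) can detect that at startup rather than deep inside an import chain:

```python
def torch_has_empty_statedict_planner() -> bool:
    """Probe for the private torch symbol that accelerate v0.31's
    fsdp_utils imports at module load time. Returns False when torch
    is not installed or is too old to provide the symbol."""
    try:
        from torch.distributed.checkpoint.default_planner import (  # noqa: F401
            _EmptyStateDictLoadPlanner,
        )
        return True
    except ImportError:  # also covers ModuleNotFoundError
        return False
```

Because the symbol is private, this check is inherently fragile across torch releases; the robust route is matching the torch/accelerate version pair the repo was tested with.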