Issue with Evaluation

GoooKuuu commented 1 month ago

Hi, thanks for your fantastic work. I'm trying to replicate it by running python run.py --config-path config/main --config-name eval_only. The model loading part seems to go smoothly, showing messages like:

>>> Using DigiRL trainer
>>> Loading from previous checkpoint
[2024-09-25 06:30:19,513][accelerate.accelerator][INFO] - Loading states from /home/clouduser/DigiRL/digirl/data/ckpts/general-off2on-digirl/trainer.pt
[2024-09-25 06:30:21,112][accelerate.checkpointing][INFO] - All model weights loaded successfully
[2024-09-25 06:30:21,113][accelerate.checkpointing][INFO] - All optimizer states loaded successfully
[2024-09-25 06:30:21,113][accelerate.checkpointing][INFO] - All scheduler states loaded successfully
[2024-09-25 06:30:21,113][accelerate.checkpointing][INFO] - All dataloader sampler states loaded successfully
[2024-09-25 06:30:21,117][accelerate.checkpointing][INFO] - Could not load random states
[2024-09-25 06:30:21,133][accelerate.accelerator][INFO] - Loading in 0 custom states
>>> Evaluating Agent

However, I hit a couple of strange errors during evaluation. The first one is:

Failed to reset the emulator: [('/home/clouduser/.android/avd/test_Android.avd/userdata-qemu.img', '/home/clouduser/.android/avd/test6.avd/userdata-qemu.img', "[Errno None] None: '/home/clouduser/.android/avd/test_Android.avd/userdata-qemu.img' -> '/home/clouduser/.android/avd/test6.avd/userdata-qemu.img'")]
Traceback (most recent call last):
  File "/home/clouduser/DigiRL/digirl/digirl/environment/android/env.py", line 453, in reset
    clone_avd(self.avd_name, cache_avd_name, self.android_avd_home)
  File "/home/clouduser/DigiRL/digirl/digirl/environment/android/env.py", line 72, in clone_avd
    shutil.copytree(src_avd_dir, tar_avd_dir)
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/shutil.py", line 556, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/shutil.py", line 512, in _copytree
    raise Error(errors)
shutil.Error: [('/home/clouduser/.android/avd/test_Android.avd/userdata-qemu.img', '/home/clouduser/.android/avd/test6.avd/userdata-qemu.img', "[Errno None] None: '/home/clouduser/.android/avd/test_Android.avd/userdata-qemu.img' -> '/home/clouduser/.android/avd/test6.avd/userdata-qemu.img'")]

I have verified that the specified file exists and there are no permission issues. Additionally, I tried deleting the AVD and reconfiguring it, but the problem persists.

The second one is:

Traceback (most recent call last):
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/tenacity/__init__.py", line 478, in __call__
    result = fn(*args, **kwargs)
  File "/home/clouduser/DigiRL/digirl/digirl/environment/android/evaluate.py", line 176, in call_gemini
    response = client.generate_content(
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/google/generativeai/generative_models.py", line 331, in generate_content
    response = self._client.generate_content(
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/google/ai/generativelanguage_v1beta/services/generative_service/client.py", line 827, in generate_content
    response = rpc(
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 293, in retry_wrapped_func
    return retry_target(
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 153, in retry_target
    _retry_error_helper(
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/google/api_core/retry/retry_base.py", line 212, in _retry_error_helper
    raise final_exc from source_exc
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/google/api_core/retry/retry_unary.py", line 144, in retry_target
    result = target()
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/google/api_core/timeout.py", line 120, in func_with_timeout
    return func(*args, **kwargs)
  File "/home/clouduser/miniconda3/envs/digirl/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ResourceExhausted: 429 Resource has been exhausted (e.g. check quota).

I'd really appreciate any guidance on how to tackle these issues. Thanks a ton for your help!

GoooKuuu commented 1 month ago

To make it easier to diagnose the problem, I attached the log. output.log

BiEchi commented 1 month ago

Thanks for your interest in our work.

The first error (failing to reset the emulator) does not cause any trouble, so you can safely ignore it.
For the second error (429 Resource has been exhausted), try adding a sleep within the call_gemini() function.

Both were mentioned in #18. I'll push a patch soon.
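
Until the patch lands, the sleep suggestion could look roughly like the sketch below. It is not the code from evaluate.py: QuotaExceeded is a stand-in for google.api_core.exceptions.ResourceExhausted, and call_with_backoff, max_attempts and base_delay are hypothetical names chosen for illustration.

```python
import time


class QuotaExceeded(Exception):
    """Stand-in (assumption) for google.api_core.exceptions.ResourceExhausted."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call request_fn, sleeping with exponential backoff after each quota error."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except QuotaExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait 2s, 4s, 8s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

Inside call_gemini(), the request itself would be passed in as request_fn, with the stand-in exception swapped for the real one. Note that backoff only helps with per-minute limits; it cannot recover from an exhausted daily quota.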

GoooKuuu commented 1 month ago

Thanks for the timely reply, I'll try again as you suggested.

GoooKuuu commented 1 month ago

Following your suggestion, I added a sleep within the call_gemini() function. I tried sleep(2), sleep(4), and sleep(8), but I'm still getting the 429 Resource has been exhausted error. output.log

Do you have any other recommendations? Perhaps I could try switching the Gemini model or upgrading my free account?

BiEchi commented 1 month ago

Thanks for getting back on this!

I believe Gemini blocks even a single request once you reach the limit, so it is best to wait ~12h and try again.
Would you please check your RPM (requests per minute) limit for Gemini-1.5-pro? According to discussions in #18, if you've got RPM=2, it's likely that you can run the program with sleep(2).
If this is too slow, you can consider moving to another service such as Claude; we've made this very easy to change (you only need to change the API-level code in evaluate.py).
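
As a rough illustration of pacing requests against an RPM limit, here is a minimal sketch. It assumes the quota is enforced as evenly spaced requests per minute, so the spacing works out to 60 / RPM seconds. RequestPacer and its parameters are hypothetical names, not part of the DigiRL code.

```python
import time


class RequestPacer:
    """Space calls so that at most `rpm` requests start per minute."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self._clock = clock
        self._sleep = sleep
        self._last = None  # start time of the previous request, if any

    def wait(self):
        """Block until enough time has passed since the previous request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling pacer.wait() just before each Gemini request would keep the call rate under the configured limit without hard-coding a sleep duration.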

BiEchi commented 1 month ago

Closed due to inactivity.

BiEchi commented 1 month ago

Follow-up question by @GoooKuuu:

Thank you for your help, and apologies for the delayed response. I've successfully managed to run the evaluation script with rollout_size (16) * eval_iterations (6). However, when the process finished, I noticed that the log didn't include any statistics regarding the success rate. Did I overlook something? 1.png

Answer: the rollout.mean item is the success rate.
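
To see why a mean can be read as a success rate, here is a small sketch. It assumes each evaluated trajectory ends with a reward of 1 (task succeeded) or 0 (task failed); how rollout.mean is actually computed in the codebase is not shown in this thread, and success_rate is a hypothetical helper.

```python
def success_rate(rewards):
    """Return the fraction of trajectories that succeeded.

    With 0/1 rewards, the arithmetic mean equals the success rate.
    """
    if not rewards:
        raise ValueError("no trajectories to average")
    return sum(rewards) / len(rewards)


# Four hypothetical trajectories, three of which succeeded.
print(success_rate([1, 0, 1, 1]))
```

Under the same assumption, a run with rollout_size 16 and eval_iterations 6 would average over 16 * 6 = 96 trajectories.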

surgan12 commented 1 month ago

I wanted to check why "autoui_prepare_prompt" only uses a history of size 1.

BiEchi commented 1 month ago

@surgan12 It's generally not advised to post unrelated follow-up questions in an existing thread, as they are very likely to be overlooked. Please open a new issue for your problem instead.

Would you please open a new issue and elaborate on which line of code you're referring to?

DigiRL-agent / digirl

Issue with Evaluation #19