mrFranklin opened this issue 4 days ago.
Thanks for your interest in our work! The steps you described should be correct. Even after DigiRL training, we also found that the agent could not complete these tasks, so you did nothing wrong. This is probably because the pretrained AutoUI agent did not explore this task correctly. You're very welcome to think of approaches to improve the agent.
Thank you for replying. In your report, the "AitW General Subset Success Rate" is 61.5% for "SFT+DigiRL + AutoUI + Offline" and 71.9% for "SFT+DigiRL + AutoUI + Offline/Online". Could I reproduce these success rates using the steps and model described in my original post? I have run several tasks and none of them succeeded, so I strongly suspect something is wrong with my steps or with the model I'm using.
Hi. I have been testing one simple task for some time in order to evaluate the AutoUI-DigiRL model, but the result is always wrong. Is this normal, or is one of my steps incorrect?
The main steps (some less important steps have been omitted):
1. Modify task_set/general_test.txt so that it contains only one simple task: "Open the files app".
2. Modify default.yaml, setting bsize=1 and rollout_size=1, so that only the one task above is run.
3. Download the AutoUI base model and place it at the path specified by policy_lm in default.yaml.
4. Download general-off2on-digirl.zip and unzip it to the path specified by save_path in eval_only.yaml. The logs below show that it has been loaded.
5. Modify the call_gemini function to always return 0, because I want to check the results manually and no score is needed during this process.
6. Run the eval script.
7. Check the screenshots.
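For reference, the local edits described above (narrowing the task set, the bsize/rollout_size settings, stubbing out call_gemini, and reviewing screenshots) can be sketched in Python. This is a minimal illustration under assumptions, not code from the DigiRL repository: write_single_task_set, call_gemini_stub, list_screenshots and EVAL_OVERRIDES are hypothetical names, the one-task-per-line file format and the single directory of .png screenshots are assumed, and the real call_gemini signature is not shown in this thread.

```python
import tempfile
from pathlib import Path


# Sketch: overwrite the test task set so it holds only the chosen task(s).
# Assumption: the file stores one task description per line.
def write_single_task_set(path, tasks):
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(tasks) + "\n", encoding="utf-8")
    return path


# Illustrative only: the values set in default.yaml so that one task is
# rolled out once. The real change is an edit to the YAML file itself.
EVAL_OVERRIDES = {"bsize": 1, "rollout_size": 1}


# Hypothetical stand-in for call_gemini. The real function's signature is
# not known here, so this accepts any arguments and always returns a score
# of 0, leaving success to be judged manually.
def call_gemini_stub(*args, **kwargs):
    return 0


# Sketch: list saved screenshots in file-name (lexicographic) order for
# manual review. Assumption: screenshots are .png files in one directory.
def list_screenshots(directory):
    return sorted(Path(directory).glob("*.png"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        task_file = write_single_task_set(
            Path(tmp) / "task_set" / "general_test.txt", ["Open the files app"]
        )
        print(task_file.read_text(encoding="utf-8"), end="")
        print(call_gemini_stub("screenshot", task="Open the files app"))
```

Note that the stub only removes automatic scoring; whether a rollout succeeded still has to be decided by looking at the screenshots.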
The model does not perform the correct actions: the Files app is never opened. I also tested the task "Set an alarm for 6pm", and it did not produce the correct result either.