Several problems noticed

I followed the WebArena instructions. The first thing I noticed is that the GitLab Docker image on the AMI had run into some problem; I reloaded it from the .tar file and it works fine now. (There is also a space issue in the SQL-related code in this part, but it is easy to fix.)

It seems the WebArena files on the AMI are not the same version as the ones in this repo, so I used the version in this repo.

The "context_length" parameter is declared at line 140 of openai_utils.py, but it is not used in lines 149-156. Likewise, lines 41-48 of utils.py need an extra argument: "context_length=lm_config.gen_config["context_length"],".
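To show why that missing line matters, here is a minimal sketch. Everything in it is hypothetical: LMConfig and generate_completion are stand-ins I made up to mirror the shape described above, not the repo's actual classes or signatures. The stub returns its arguments instead of calling the API, so the wiring can be checked offline.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LMConfig:
    """Hypothetical stand-in for the repo's lm_config object."""
    model: str
    mode: str
    gen_config: dict[str, Any] = field(default_factory=dict)


def generate_completion(
    prompt: str,
    engine: str,
    temperature: float,
    max_tokens: int,
    top_p: float,
    context_length: int,  # required: no default value
    stop_token: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical generator; returns the request arguments instead of calling an API."""
    return {
        "prompt": prompt,
        "engine": engine,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "context_length": context_length,
        "stop_token": stop_token,
    }


def call_llm(lm_config: LMConfig, prompt: str) -> dict[str, Any]:
    """Forward every generation setting, including context_length."""
    return generate_completion(
        prompt=prompt,
        engine=lm_config.model,
        temperature=lm_config.gen_config["temperature"],
        max_tokens=lm_config.gen_config["max_tokens"],
        top_p=lm_config.gen_config["top_p"],
        # Leaving this line out makes the call raise TypeError,
        # because context_length has no default.
        context_length=lm_config.gen_config["context_length"],
    )
```

In this sketch, calling generate_completion without context_length fails immediately with a TypeError, which is one way a missing argument like the one in utils.py would surface.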

Next, I ran into an issue with the stop_tokens list. The list is always empty, so I simply commented out line 43 in lm_config.py and the corresponding places.
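An alternative to commenting the line out might be to send the stop argument only when the list actually contains something. This is a minimal sketch under that assumption: build_request_kwargs is a hypothetical helper, not something in the repo; only the "stop" key name comes from OpenAI's API.

```python
from typing import Any, Optional, Sequence


def build_request_kwargs(
    model: str,
    temperature: float,
    max_tokens: int,
    stop_tokens: Optional[Sequence[str]] = None,
) -> dict[str, Any]:
    """Assemble keyword arguments for a (hypothetical) API call.

    The "stop" key is added only when there is at least one stop token,
    so an always-empty list never reaches the API.
    """
    kwargs: dict[str, Any] = {
        "model": model,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    if stop_tokens:  # skips both None and an empty list
        kwargs["stop"] = list(stop_tokens)
    return kwargs
```

This keeps stop-token support available for configs that do set it, instead of removing it everywhere.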

For the suggested command:

python run.py \
  --instruction_path agent/prompts/jsons/new_action_prompt.json \
  --model gpt-3.5-turbo \
  --mode completion \
  --observation_type html \
  --action_set_tag id_html_nasc_tree \
  --result_dir \
  --test_start_idx 0 \
  --test_end_idx 1

(the instruction path is the reasoning agent prompt used in the paper), note that anyone who wants to use gpt-3.5-turbo should change the mode to chat instead of completion. If you want to use completion mode, you need to change the model to gpt-3.5-turbo-instruct. (Details:)
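A small check like the following could catch the model/mode mismatch before any request is sent. It is a sketch only: check_model_mode and the two model sets are hypothetical and cover just the models mentioned here plus gpt-4; models outside both sets are let through unchecked.

```python
# Hypothetical table of which OpenAI models use which endpoint family.
CHAT_MODELS = {"gpt-3.5-turbo", "gpt-4"}
COMPLETION_MODELS = {"gpt-3.5-turbo-instruct"}


def check_model_mode(model: str, mode: str) -> None:
    """Fail fast when --model and --mode do not match.

    Models not listed in either set pass without a check.
    """
    if mode not in ("chat", "completion"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "completion" and model in CHAT_MODELS:
        raise ValueError(
            f"{model} is a chat model: use --mode chat, "
            "or pick a completion model such as gpt-3.5-turbo-instruct"
        )
    if mode == "chat" and model in COMPLETION_MODELS:
        raise ValueError(f"{model} is a completion model: use --mode completion")
```

Such a check could be called right after argument parsing (for example with the parsed --model and --mode values), so the error message points at the flag to change rather than at an API failure.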

I used gpt-3.5-turbo-instruct, but the output shows that the task failed. When I tried the chat option, the program crashed with an unhandled error. I don't know exactly what happened; I hope someone can explain it, and apologies if I have made any mistakes in these statements.

THUDM / AutoWebGLM, issue #12