PanasonicConnect / rap

RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents
MIT License

Clarification on the Retrieval Process in WebShop Experiment #6

Open wenyaxie023 opened 1 month ago

wenyaxie023 commented 1 month ago

Dear authors,

I have a question regarding the WebShop experiment described in your paper. The paper states, "The Retriever extracts three experiences and five actions from before and after the most similar action." However, I'm having difficulty understanding how these "3 experiences" and "5 actions" are reflected in the code.

From what I see in the open-source code, the `init_prompt` seems to be composed only of the relevant actions, without any connection to experiences. Also, given the `analogy_len=10` setting in `basic_config.yaml`, the number of actions included in the `init_prompt` often doesn't match the five mentioned in the paper.

Could you please clarify this? Any insights would be greatly appreciated.

Thank you for your assistance.

PrincetonThong commented 4 weeks ago

Hi @wenyaxie023, thanks for your interest in our work!

  1. Experiences: The top-k experiences are first retrieved (see L387 and L407) before being added to `init_prompt`. Here, the config sets k=3, so the top 3 experiences are retrieved from memory.
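To illustrate the retrieval step, here is a minimal sketch of top-k retrieval by cosine similarity. All names (`retrieve_top_k`, `memory_embs`) are hypothetical and for illustration only, not the actual RAP implementation:

```python
# Hypothetical sketch of top-k experience retrieval, not the repo's code.
import numpy as np

def retrieve_top_k(query_emb, memory_embs, k=3):
    """Return indices of the k memory entries most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    m = memory_embs / np.linalg.norm(memory_embs, axis=1, keepdims=True)
    sims = m @ q                       # cosine similarity to each memory entry
    return np.argsort(sims)[::-1][:k]  # top-k indices, most similar first

# Usage: a query close to memory entry 5 should retrieve it first.
rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 4))
query = memory[5] + 0.01 * rng.normal(size=4)
top = retrieve_top_k(query, memory, k=3)
```

With k=3 as in the config, the three retrieved experiences would then be prepended to the prompt.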

  2. Actions: For each memory instance, the saved JSON "Actions" field (the trajectory) includes both the actions performed by the agent and the observations returned by the webpage. To retrieve 5 actions, we need `analogy_len = 5*2 = 10` to account for the interleaving of agent actions and webpage observations.
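The interleaving can be sketched as follows. This is an illustrative example, not the repo's code; `action_window` and the entry names are invented for clarity:

```python
# Illustrative sketch: a saved trajectory interleaves agent actions with
# webpage observations, so a window of 5 actions needs analogy_len = 5*2 = 10.

def action_window(trajectory, center_idx, analogy_len=10):
    """Slice analogy_len interleaved entries around the most similar action."""
    half = analogy_len // 2
    start = max(0, center_idx - half)
    return trajectory[start:start + analogy_len]

# Build a toy interleaved trajectory: action_0, obs_0, action_1, obs_1, ...
trajectory = []
for i in range(12):
    trajectory.append(f"action_{i}")
    trajectory.append(f"obs_{i}")

# Window around the most similar action: 10 entries = 5 actions + 5 observations.
window = action_window(trajectory, trajectory.index("action_6"))
```

So the "5 actions" in the paper and `analogy_len=10` in `basic_config.yaml` are consistent once observations are counted.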