Closed: OrilinZ closed this issue 5 months ago
I'm curious about the RLEF process; this is pioneering work in embodied agents. Is the LLM run in the simulator and tuned by RL simultaneously?

Thank you for expressing interest in our research. Our method follows a two-stage training scheme. We primarily use GPT-4 to collect training data, and the simulator simultaneously marks each subtask as a success or failure, which serves as environmental feedback. The collected training data is used for SFT in the first stage, and the feedback is used for RLEF in the second stage. So data collection (simulation) and RLEF do not happen simultaneously.
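For other readers, the two-stage scheme described in the reply can be sketched roughly as below. This is a minimal toy illustration, not the authors' actual code; all class and function names are made up, and the planner/simulator/model are stand-ins for GPT-4, the embodied simulator, and the LLM:

```python
from dataclasses import dataclass


@dataclass
class ToySimulator:
    """Stand-in for the embodied simulator that labels subtask outcomes."""
    tasks: list

    def execute(self, task, plan):
        # Toy success rule: the plan mentions the task (1 = success, 0 = failure).
        return 1 if task in plan else 0


@dataclass
class ToyModel:
    """Stand-in for the LLM; just counts the updates it receives."""
    sft_steps: int = 0
    rl_steps: int = 0

    def update_supervised(self, task, plan):
        self.sft_steps += 1

    def update_rl(self, task, plan, reward):
        self.rl_steps += 1


def collect_trajectories(simulator, planner):
    """Data collection: query the planner (GPT-4 in the paper) for plans;
    the simulator labels each one with success/failure feedback."""
    data = []
    for task in simulator.tasks:
        plan = planner(task)
        feedback = simulator.execute(task, plan)
        data.append({"task": task, "plan": plan, "feedback": feedback})
    return data


def sft_stage(model, data):
    """Stage 1: supervised fine-tuning on the collected (task, plan) pairs."""
    for ex in data:
        model.update_supervised(ex["task"], ex["plan"])
    return model


def rlef_stage(model, data):
    """Stage 2: RL with environmental feedback, reusing the stored
    success/failure labels as rewards -- no fresh simulation needed."""
    for ex in data:
        model.update_rl(ex["task"], ex["plan"], ex["feedback"])
    return model


sim = ToySimulator(tasks=["pick_cup", "open_door"])
planner = lambda task: f"plan for {task}"

data = collect_trajectories(sim, planner)   # simulation + feedback labeling
model = sft_stage(ToyModel(), data)         # stage 1: SFT only
model = rlef_stage(model, data)             # stage 2: RLEF on stored feedback
```

The point of the sketch is that `collect_trajectories` runs once up front; both training stages then consume its output, which is why simulation and RLEF never run at the same time.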