MineDojo / Voyager

An Open-Ended Embodied Agent with Large Language Models
https://voyager.minedojo.org/
MIT License

Will a learned checkpoint be added to the repo? #42

Closed liusida closed 1 year ago

liusida commented 1 year ago

This is a very interesting experiment! I really like the idea of accumulating skills as executable code.

I can't find any checkpoint in this repo and would like to take a quick look at the learned skill code. Will a checkpoint be added to this repo in the near future?

Thanks.

wangguanzhi commented 1 year ago

Thanks for your interest! Yeah, we are working on it!

daswer123 commented 1 year ago

Before the official one comes out, you can use my checkpoint: #27

go-maple commented 1 year ago

@daswer123 Bro, how do I load the checkpoint? Can I use gpt-3.5 to load the checkpoint?

" voyager = Voyager( azure_login=azure_login, skill_manager_retrieval_top_k=0, # because I use gpt 3.5 max_iterations=200, resume=True, openai_api_key=openai_api_key, )

"

daswer123 commented 1 year ago

Just put it in the root folder and use voyager.inference("YOUR TASK") instead of voyager.learn(). For example: voyager.inference("Chop wood by hand, craft a workbench and create a wooden axe"). gpt-3.5 is much worse, but I think the checkpoint will help you solve tasks better than if you started from scratch.
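
Roughly, the whole flow looks like this (a minimal sketch, assuming the ckpt/ folders sit in the repo root as described above; swap in your own azure_login / openai_api_key setup):

```python
from voyager import Voyager

# azure_login and openai_api_key are placeholders for your own credentials.
voyager = Voyager(
    azure_login=azure_login,
    openai_api_key=openai_api_key,
    resume=True,  # pick up the existing ckpt/ directory instead of starting fresh
)

# Run a single task against the loaded skill library instead of open-ended learning.
voyager.inference("Chop wood by hand, craft a workbench and create a wooden axe")
```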

go-maple commented 1 year ago

@daswer123 Thanks, bro. I will try it. I found the task names in the checkpoint at ./ckpt/curriculum/completed_tasks.json.


For example: voyager.inference("Mine 1 wood log")
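
In case it helps anyone else poking around a checkpoint, here is a small sketch for listing those task names, assuming completed_tasks.json is a plain JSON list as the path above suggests (the layout may differ between checkpoint versions):

```python
import json

# Path taken from the checkpoint mentioned above; adjust if yours lives elsewhere.
with open("./ckpt/curriculum/completed_tasks.json") as f:
    completed_tasks = json.load(f)

# Each entry can be passed straight to voyager.inference(...).
for task in completed_tasks:
    print(task)
```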

daswer123 commented 1 year ago

These tasks are embedded in the training; they serve as examples for others to perform. Any task can be passed to voyager.inference().

TimeLordRaps commented 1 year ago

I am currently working on a more reusable, refactored, and composable skill library through a custom curriculum. I will share it once I feel it's ready.

go-maple commented 1 year ago

@TimeLordRaps Bro, can you share your custom curriculum?

TimeLordRaps commented 1 year ago

I'm still refining it until I have an automatic curriculum that just replaces certain parts of it with items relevant to the task. This could be expanded to make use of the curriculum call that would be happening anyway for generating items relevant to the task.

Generally, for the task you want something direct such as "Acquire item". Though I found the most success using "1", as in "Acquire 1 item" for stackables and non-placeables, and "Acquire a item" for stackables and non-placeables. Much more experimentation is needed.

The context is where the magic happens and leaves the most room for prompt engineering. I start my context with "In your code..." to be direct. I follow it with priming for the code agent to make the code reusable and general; I haven't run tests on this one yet, but I assume "composable" would add to performance. I also tell it to use a crafting table, if necessary. I explicitly specify to check for items relevant to the task, and I finish with "if any of those items are not present, acquire them".

My context is actually fairly short, and on top of the composability suggestion I can imagine other additions that would be possible through adapting the curriculum call for structured curricula. A rough example of the kind of task/context pair I mean is sketched below.
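
This is an illustrative paraphrase of the style described above, not my exact prompt text:

```python
# Illustrative only: a hypothetical task/context pair in the style described above.
task = "Acquire 1 iron pickaxe"
context = (
    "In your code, make the code reusable, general, and composable. "
    "Use a crafting table, if necessary. "
    "Check for items relevant to the task; if any of those items are not present, acquire them."
)
```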

Another interesting direction I have thought of is for tasks to be like "Refactor your X function". I haven't tested this yet, but I think it would be useful; however, this may be inconvenient to implement automatically unless you are aware of tasks it codes poorly but still succeeds at. In writing this I thought of "Acquire item, then refactor code for generality, reusability, composability, and edge cases". This may be pushing the limits, at least for me, of the naming constraints of the events checklist.

I also raised the max tries per task from 4 to 8: if it finishes early it moves on, and if it needs extra time for refactoring or anything else, this allows for it.
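
If the retry limit is exposed as a constructor argument, bumping it is a one-line change; the keyword below is my guess at where that default of 4 lives, so check the Voyager constructor in your copy of the repo before relying on it:

```python
# Hypothetical: raise the per-task retry budget from the default 4 to 8.
# `action_agent_task_max_retries` is an assumed parameter name; verify it
# against the Voyager constructor in voyager/voyager.py.
voyager = Voyager(
    azure_login=azure_login,
    openai_api_key=openai_api_key,
    action_agent_task_max_retries=8,
    resume=True,
)
```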

I have found that repeating a failed task, for example "Acquire a bucket", removes it from the failed tasks if it succeeds on a subsequent try (I have only seen this on the immediately following try).

I am still working out the pacing of the curriculum. Generally, I go for more composability and thus move slower than necessary through the basic steps such as log, plank, stick, ... While this does increase iterations, this structured curriculum approach seems to work quite well when no skill library is present, and probably produces a higher-quality skill library and greater hierarchical compositionality of the skills.

With all of this being said, take everything with a grain of salt. I want to clarify that my prompts/curriculum.txt has the ultimate goal of beating the Ender Dragon, and the necessary steps to do so. Other changes to prompting include:

- Criterion 7 of prompts/curriculum.txt is changed to: "Do not ask me to build or dig shelter even if it's at night. Guide me to explore areas that are directly relevant to the goal of defeating the Ender Dragon. The sequence of relevant locations should include areas where I can gather necessary resources, places where I can find the most Endermen, the Nether to find a Nether Fortress, and a Blaze Spawner inside a Nether Fortress for acquiring blaze powder, the Overworld to find the Stronghold, the Stronghold to find the End Portal, and the End Island to fight the Ender Dragon. I should stay in these areas only as long as it is necessary to gather resources or learn skills for this goal."
- A "useless" addition of a 9th criterion for dumping useless items instead of hoarding them in chests. I say useless because I only ran into a full inventory on my first experiment (over 100 iterations), and I have not gone deep enough into any other experiments since to see the need for disposing of useless items.
- Criterion 7 in prompts/action_template.txt is modified to: "Use exploreUntil(bot, direction, maxDistance, callback) when you cannot find something. You should frequently call this before mining blocks or killing mobs. You should select a direction at random every time instead of constantly using (1, 0, 1), and you should explore between 300 and 420 seconds depending on the task." I found that for certain tasks, no matter what, it would always just search for a minute. So I limit it to the 600 s timeout; I have set the explore time lower than that in case of sequential exploreUntil calls.
- Oh yeah, curriculum_qa_step1_ask_question also has the ultimate goal changed as well.

We should all be on an even playing field now.

TimeLordRaps commented 1 year ago

One thing I forgot to mention was a success condition in the context.

When using refactoring in the task, your success condition should require both the item and the refactored code. This seems to improve code composability greatly.

I have not tested this yet, but I suspect giving more specific refactoring requirements in the success condition would yield fewer iterations; as I have seen above, the critic sometimes says it is not sure of the requirements, which leads to an extra refactoring step. However, in the few examples I have tried with the above, the refactoring step produces much more composable code, so maybe the extra refactoring step is required.

Edit: Upon more testing, it seems the critic gets confused more often when the in-game tasks are not separated from the code refactoring tasks than when refactoring is only present in the context. E.g. "Acquire 1 log, then refactor your code for reusability, generality, and composability." vs "Acquire 1 log" followed by "Refactor your code for acquiring 1 log".

Edit 2: Upon testing separated acquire and refactor tasks, the critic model fails more often than not; the context for the refactoring task needs to specify a Minecraft success condition, e.g. "Success is at least 1 more log in your inventory" as the context for the task "Refactor your code for acquiring 1 log..." The critic fails because it is unaware of the nature of the bot, so the context does not make sense to it. Thus some changes to prompts/critic.txt could be made: either present the code, which would need a restructuring of the langchain files, or just introduce some awareness by mentioning mineflayer in prompts/critic.txt. Going to keep trying with the default critic.txt for a little bit.

go-maple commented 1 year ago

@daswer123 @TimeLordRaps Thanks for sharing your experience. It's very important for me. I have a ChatGPT Plus account, but I don't have access to the GPT-4 API, so I have to find another way to test this code, with only gpt-3.5.

xieleo5 commented 1 year ago

Hi, all! @liusida @daswer123 @go-maple @TimeLordRaps We just released our skill libraries; you can check the README for how to run tasks with the skill libraries.
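
In outline, running a task with one of the released skill libraries looks something like the sketch below; this is a sketch only, so defer to the README for the exact directory layout and argument names:

```python
from voyager import Voyager

# Sketch only; verify argument names and paths against the README.
voyager = Voyager(
    azure_login=azure_login,
    openai_api_key=openai_api_key,
    skill_library_dir="./skill_library/trial1",  # one of the released skill libraries
    ckpt_dir="./ckpt_inference",  # keep new events out of the skill library directory
    resume=False,
)

# Let the curriculum agent split the task into sub-goals, then execute them
# with the learned skills.
task = "Craft a diamond pickaxe"
sub_goals = voyager.decompose_task(task=task)
voyager.inference(sub_goals=sub_goals)
```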

go-maple commented 1 year ago

@xieleo5 Thanks, bro. It's amazing.

TimeLordRaps commented 1 year ago

@xieleo5 Would it be possible to add a step-by-step guide to the README for the skill libraries describing how to modify an existing skill library manually, for example creating new skills with human-defined code, removing specific skills, and updating specific skills? Thank you for providing the skills from your experiments.

go-maple commented 1 year ago

@TimeLordRaps Yes, I totally agree with you. I don't have GPT-4 access, so I want to create new skills with human-defined code. It can also help with code refactoring.

liusida commented 1 year ago

> Hi, all! @liusida @daswer123 @go-maple @TimeLordRaps We just released our skill libraries; you can check the README for how to run tasks with the skill libraries.

Thanks! This is amazing! It's quite interesting to read the skills accumulated by the AI agent. Keep being awesome!