HumanCompatibleAI / overcooked_ai

A benchmark environment for fully cooperative human-AI performance.
https://arxiv.org/abs/1910.05789
MIT License

How to train my own BC agent? #129

Closed Lee-daeho closed 3 months ago

Lee-daeho commented 10 months ago

First of all, thank you for your commitment to the repository.

I've been trying to create a new BC agent based on my own dataset.

I collected some data from your demo (https://humancompatibleai.github.io/overcooked-demo/), but the structure of the data is very different from the one you used to train the BC agent in behavior_cloning_tf2.py.

How can I collect my data and train my own BC agents?

Please help me..

micahcarroll commented 10 months ago

I'm sorry for the confusion, I believe this is the repo you should use for collecting human data in the correct format.

Alternatively, it might be in the correct format (I forget), but you should use the pre-processing script described here.

Let me know if that helps!

Lee-daeho commented 10 months ago

Thanks in advance.

Can I ask you some more questions after I try, by any chance?

Also, I have one more question about BC agent training.

I used your behavior_cloning_tf2.py to create a BC agent based on your dataset.

However, I have no idea how to turn it into a version that is usable in the demo (using up.sh).

Could you let me know how I can handle it?

Thank you so much for your kindness.

jyan1999 commented 10 months ago

Hi Lee,

I worked on this part of the project a while ago so let me try to chime in here. It has been a while so I am a little hazy on the details, but I'll try to give you the big picture.

1): To collect trajectories to train your own BC agent, you can follow the setup in this readme. Once the server is up and running, go to your localhost and check the "collect data" box to enable data collection. The collected trajectories will be saved at this location, as outlined in the docker-compose file.

2): If I remember correctly, the collected data is a bit different from what we used in training. The collected data will be a hashmap with a key-value pair trajectory: [data_for_state_1, data_for_state_2 ...], whereas the data actually used in training is a dataframe. Each data_for_state_# is another hashmap of key-value pairs following the raw schema. You should be able to convert this into a dataframe and follow the directions in this readme to do additional processing to get it into the same form as the existing pickled dataframes. The existing ones can be found here.

3): Once you have trained your BC agent with your own trajectories, I don't think you can directly load it into the demo by default. I think the assumption was that we would only load PPO agents after training via self-play or with another BC agent. I need to look through the codebase again to see if there is a workaround, but off the top of my head, something that might work is: set up a training session with a PPO agent + your BC agent, do 1 iteration of training, save the trainer, follow the directions in this readme to move everything into the overcooked_demo directory, ignore the PPO agent, and load the BC agent by changing this line to load_agent(fpath, policy_id='bc', agent_index=idx). Again, no guarantee this would work, but you can give it a shot if you are stuck.
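A rough sketch of the conversion in step 2, under stated assumptions: the collected object is already loaded in memory as a dict with a "trajectory" key, and the keys inside each state ("state", "joint_action", "score") are placeholders rather than the real raw schema. The helper names and the "timestep" column are hypothetical and do not come from the repository. The result would still need the pre-processing described in the readme before it matches the existing pickled dataframes.

```python
from typing import Any, Dict, List


def trajectory_to_records(collected: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten {"trajectory": [state_1, state_2, ...]} into one row per state."""
    records = []
    for timestep, state_data in enumerate(collected["trajectory"]):
        row = dict(state_data)      # copy so the input is left untouched
        row["timestep"] = timestep  # hypothetical helper column, not in the raw schema
        records.append(row)
    return records


def records_to_dataframe(records: List[Dict[str, Any]]):
    """Build a dataframe with one row per state (requires pandas)."""
    import pandas as pd  # imported lazily so the flattening step has no dependencies

    return pd.DataFrame(records)


# Toy input; the keys inside each state are placeholders, not the real raw schema.
collected = {
    "trajectory": [
        {"state": "s0", "joint_action": "a0", "score": 0},
        {"state": "s1", "joint_action": "a1", "score": 20},
    ]
}
records = trajectory_to_records(collected)
```

Calling records_to_dataframe(records) would then give a dataframe with one row per state, which is the shape the later processing steps presumably expect.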

Hopefully this is helpful, and let me know if anything is confusing.

(wow, I am actually a little impressed that I can recall all of this)

micahcarroll commented 10 months ago

Thanks so much @jyan1999 for providing all this additional context! This is great!

Lee-daeho commented 10 months ago

I appreciate your kindness and help @micahcarroll @jyan1999.

Also, it is amazing that you guys still follow up and remember these things. I'm so impressed.

I will try and let you know if it works!

Again, thank you so much!!