everlof / garden


Reinforcement Learning planning #1

Open · Lundez opened this issue 3 years ago

Lundez commented 3 years ago

We need the following:

  - We need to add some kind of decay to make the agent speed up.
  - We need to think a lot about how to set up the reward and feedback loop.
  - We need an algorithm that updates the weights based on this.
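Nothing in the thread picks an algorithm yet. As an assumption, one common combination covers both the decay and the weight update: an epsilon-greedy agent whose exploration rate decays every episode, trained with the tabular Q-learning update. The sketch keeps Q-values in a `Map` keyed by hypothetical string-encoded states and actions. The `alpha`, `gamma` and decay defaults are illustrative and not tuned for the garden. A neural network would replace the table with learned weights, but the update target `r + gamma * max Q(s', a')` stays the same.

```javascript
// Tabular Q-learning with a decaying epsilon-greedy policy (illustrative sketch).
// States and actions are assumed to be strings; Q-values live in a Map.
class QAgent {
  constructor({ alpha = 0.1, gamma = 0.99, epsilon = 1.0, epsilonMin = 0.05, epsilonDecay = 0.999 } = {}) {
    this.alpha = alpha;               // learning rate
    this.gamma = gamma;               // discount factor for future rewards
    this.epsilon = epsilon;           // current exploration rate
    this.epsilonMin = epsilonMin;     // exploration never drops below this
    this.epsilonDecay = epsilonDecay; // multiplicative decay per episode
    this.q = new Map();               // "state|action" -> estimated value
  }

  getQ(state, action) {
    return this.q.get(`${state}|${action}`) ?? 0;
  }

  // With probability epsilon pick a random action, otherwise the best-known one.
  chooseAction(state, actions, rng = Math.random) {
    if (rng() < this.epsilon) {
      return actions[Math.floor(rng() * actions.length)];
    }
    let best = actions[0];
    for (const a of actions) {
      if (this.getQ(state, a) > this.getQ(state, best)) best = a;
    }
    return best;
  }

  // Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
  update(state, action, reward, nextState, nextActions, done) {
    const maxNext = done || nextActions.length === 0
      ? 0
      : Math.max(...nextActions.map((a) => this.getQ(nextState, a)));
    const old = this.getQ(state, action);
    const target = reward + this.gamma * maxNext;
    this.q.set(`${state}|${action}`, old + this.alpha * (target - old));
  }

  // Call once per episode so the agent explores less as training goes on.
  decayEpsilon() {
    this.epsilon = Math.max(this.epsilonMin, this.epsilon * this.epsilonDecay);
  }
}
```

Calling `decayEpsilon()` once per episode shifts the agent from random exploration toward exploiting what it has learned. That is one way to read "make the agent speed up".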

everlof commented 3 years ago

Can we proceed with them one at a time, or are they very dependent on each other? Maybe it works to work backwards: from your action-space function, you'd be able to infer what should be contained in the state?

Reward function

I guess we want it to collect all of the seeds as fast as possible, i.e. minimise the number of "ticks"? I guess we're interested in something like:

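Here is a minimal sketch of such a reward. It assumes the environment reports how many seeds are unlocked before and after each step, and how many ticks the step consumed. These inputs are hypothetical, and both constants are guesses that would need tuning.

```javascript
// Hypothetical reward shaping: reward newly unlocked seeds and charge a
// small cost per tick, so finishing in fewer ticks scores higher.
const NEW_SEED_REWARD = 1.0; // assumption: value of unlocking one seed
const TICK_PENALTY = 0.01;   // assumption: cost of advancing one tick

function reward(prevUnlocked, nextUnlocked, ticksElapsed) {
  const newSeeds = nextUnlocked - prevUnlocked;
  return newSeeds * NEW_SEED_REWARD - ticksElapsed * TICK_PENALTY;
}
```

Summed over an episode, the return is (new seeds unlocked) - 0.01 × (ticks used). Of two runs that unlock all 34 seeds, the one that needs fewer ticks therefore scores higher.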

So, what info do we need from the game state? I guess:
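Here is one possible state representation, sketched under these assumptions:

- The plot is 6×6 (the size is an assumption).
- Each tile is either empty or holds a seed id and its age.
- The state also tracks the set of unlocked seed ids, starting with seed 0.

`encodeState` flattens this into a string so it can key a lookup table, such as the Q-table of a tabular agent. All names are hypothetical.

```javascript
// Hypothetical state snapshot: plot contents plus which seeds are unlocked.
// The 6x6 size and the field names are assumptions for illustration.
const PLOT_SIZE = 6;

function emptyState() {
  return {
    tick: 0,
    unlocked: new Set([0]), // seed 0 is the one you start with
    // plot[y][x] is null (empty) or { seed: number, age: number }
    plot: Array.from({ length: PLOT_SIZE }, () => Array(PLOT_SIZE).fill(null)),
  };
}

// Compact string key so the state can index a lookup table (e.g. Q-values).
function encodeState(state) {
  const tiles = state.plot
    .flat()
    .map((t) => (t === null ? "." : `${t.seed}:${t.age}`))
    .join(",");
  const seeds = [...state.unlocked].sort((a, b) => a - b).join("+");
  return `${seeds}|${tiles}`;
}
```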

Action Space Function

How smart does this need to be? Do we want it to figure out by itself which mutations can occur, or should we try to be smart about that and plant seeds next to each other that we know produce mutations?
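The "figure it out itself" end of that spectrum can be sketched as an action space that offers every legal move: plant any unlocked seed on any empty tile, harvest any occupied tile, or tick. This assumes a hypothetical state with `plot[y][x]` tiles (null when empty) and an `unlocked` set of seed ids.

```javascript
// Hypothetical action space: plant any unlocked seed on an empty tile,
// harvest (or clear) any occupied tile, or advance time with a "tick".
function listActions(state) {
  const actions = [{ type: "tick" }];
  state.plot.forEach((row, y) => {
    row.forEach((tile, x) => {
      if (tile === null) {
        for (const seed of state.unlocked) {
          actions.push({ type: "plant", seed, x, y });
        }
      } else {
        actions.push({ type: "harvest", x, y });
      }
    });
  });
  return actions;
}
```

The catch is size. On an empty 6×6 plot with all 34 seeds unlocked, that is 36 × 34 = 1,224 plant actions, before counting harvests and the tick. This is why the "be smart about it" option, restricting plants to known mutation pairs, is tempting.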

End condition

Is it this simple?

done if number of collected seeds == maximum number of seeds (34)
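As runnable code, that check could look like the sketch below. It assumes collected seed ids are tracked in a `Set`, so harvesting the same seed twice can't inflate the count.

```javascript
// End condition: the episode is done once every seed type has been collected.
const TOTAL_SEEDS = 34; // from the thread: 34 seeds in total

function isDone(collectedSeeds) {
  // collectedSeeds: Set of seed ids collected so far (assumed representation)
  return collectedSeeds.size >= TOTAL_SEEDS;
}
```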

Lundez commented 3 years ago

Starting as simple as possible is the best approach 😄

I think the state could be

I think the following order would be best:

  1. State
  2. Action-space (inferred by state)
  3. Reward function (until we have rewards, we can use random actions)
  4. End condition (on top of gathering all the seeds, it might make sense to add something like "if we don't mutate/move forward in X rounds, die", just to speed things up)
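Items 3 and 4 together suggest a first baseline: pick random actions, and stop either when all seeds are collected or when nothing new has been collected for a while. Here is a minimal sketch. It assumes a hypothetical environment object with `reset()`, `listActions()`, `step(action)` and `collectedCount()`. The stall limit counts agent steps rather than game ticks.

```javascript
// Random-policy baseline with two stopping rules: all seeds collected, or
// no new seed for `maxStall` steps ("die" if we stop making progress).
// The env interface (reset/listActions/step/collectedCount) is hypothetical.
function runRandomEpisode(env, { totalSeeds = 34, maxStall = 1000, rng = Math.random } = {}) {
  env.reset();
  let steps = 0;
  let stall = 0;
  let best = env.collectedCount();
  while (best < totalSeeds && stall < maxStall) {
    const actions = env.listActions();
    const action = actions[Math.floor(rng() * actions.length)];
    env.step(action);
    steps += 1;
    const collected = env.collectedCount();
    if (collected > best) {
      best = collected;
      stall = 0; // progress resets the stall counter
    } else {
      stall += 1;
    }
  }
  return { steps, collected: best, completed: best >= totalSeeds };
}
```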
everlof commented 3 years ago

I've started with something else: running it outside of a browser. It's way easier to run it in the terminal, for example, instead of having to do it in a browser + console. Once that's done I'll look into the state function!

Lundez commented 3 years ago

Sounds like an excellent start. 😀

everlof commented 3 years ago

Got it working pretty fast. I tried making the code easier to read, but it's hard to modularise since everything is set up to use global Game and M variables. I put that in a different branch for now. I don't know how we'll need to restructure it once we start training, but we'll deal with that then.

Now you can step through ticks by running npm run start and then repeatedly pressing Enter. Here's an example of how that might look:

Screenshot 2020-11-15 at 08 20 51

You can see that 'weed' has started to grow spontaneously all around the garden, and you can also see each plant's lifetime, which seeds are owned, etc.

Lundez commented 3 years ago

Very cool, impressive! 😀

everlof commented 3 years ago

Hehe, yeah, now you can actually control it as well. I added commands to interactively harvest, harvest all, and plant.

As you can see, here I've collected two more seeds in addition to the one you start with (0).

Screenshot 2020-11-15 at 13 22 27

Basically:

For now I'm just doing the easiest thing possible to get on with stuff; I don't check the validity of the entered commands, etc.
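If validation is added later, a small parser that turns each input line into a command object keeps it in one place. The command syntax below (`h`, `ha`, `p`, and an empty line for a tick) is hypothetical, since the actual commands aren't shown here. The 6×6 plot bound is also an assumption.

```javascript
// Hypothetical command syntax (the real one isn't shown in the thread):
//   ""            -> advance one tick (just press Enter)
//   "h X Y"       -> harvest tile (X, Y)
//   "ha"          -> harvest all
//   "p SEED X Y"  -> plant seed id SEED on tile (X, Y)
// Returns a command object, or { type: "error", message } for bad input.
function parseCommand(line, plotSize = 6) {
  const parts = line.trim().split(/\s+/).filter(Boolean);
  if (parts.length === 0) return { type: "tick" };
  const nums = parts.slice(1).map(Number);
  const inPlot = (v) => Number.isInteger(v) && v >= 0 && v < plotSize;

  switch (parts[0]) {
    case "ha":
      return parts.length === 1 ? { type: "harvestAll" } : { type: "error", message: "usage: ha" };
    case "h": {
      const [x, y] = nums;
      if (nums.length !== 2 || !inPlot(x) || !inPlot(y)) {
        return { type: "error", message: "usage: h X Y" };
      }
      return { type: "harvest", x, y };
    }
    case "p": {
      const [seed, x, y] = nums;
      if (nums.length !== 3 || !Number.isInteger(seed) || seed < 0 || !inPlot(x) || !inPlot(y)) {
        return { type: "error", message: "usage: p SEED X Y" };
      }
      return { type: "plant", seed, x, y };
    }
    default:
      return { type: "error", message: `unknown command: ${parts[0]}` };
  }
}
```

Each line could come from the `'line'` event of Node's built-in `readline` module. Pressing Enter on an empty line would then still advance one tick.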

everlof commented 3 years ago

I've created a version that can perform actions automatically. It seemed to get stuck at 17 seeds, so I stopped it after 100,000 runs.

Screenshot 2020-11-15 at 19 36 36
everlof commented 3 years ago

Cool, made some more progress: 24 of 34 before it got stuck.

Screenshot 2020-11-15 at 20 39 08
everlof commented 3 years ago

Resolved some bugs and made the actions a bit smarter; getting close:

Screenshot 2020-11-15 at 22 37 09
everlof commented 3 years ago

Alright, now I've got the first version that actually completed all seeds!

It took 320,057 ticks. The shortest time for a tick in the real game is 3 minutes, so if this were a real game, it would take about 667 days (320057*3/60/24).
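Here is the same arithmetic as a small helper, assuming every tick takes the 3-minute minimum:

```javascript
// Convert simulated ticks to real-game days, assuming every tick takes
// the shortest real duration (3 minutes, per the thread).
const MINUTES_PER_TICK = 3;

function ticksToDays(ticks, minutesPerTick = MINUTES_PER_TICK) {
  return (ticks * minutesPerTick) / 60 / 24;
}

console.log(ticksToDays(320057).toFixed(1)); // 666.8
```

For comparison, the 8,080-tick run mentioned later in the thread works out to `ticksToDays(8080)` ≈ 16.8 days.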

The simulation itself took 6 minutes on my macbook.

I have added special cases for TWO seeds which require very specific setups to be able to mutate. I don't think they'd ever complete otherwise, but I guess that's reasonable.

Screenshot 2020-11-15 at 22 47 37

The rules for this were:

If ANY of the conditions above yields an action, perform that action; otherwise, tick the game to the next step.
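The rule list itself isn't shown here, but its structure can be sketched generically: the first rule that yields an action wins, otherwise the game ticks. The two example rules and the state shape are hypothetical stand-ins, not the rules that were actually used. The state is assumed to hold `tiles` (each with `x`, `y`, `seed`, `mature` and `isWeed`) plus a `collected` set.

```javascript
// Rule-based policy: try each rule in priority order; the first rule that
// returns an action wins, otherwise advance the game by one tick.
function chooseAction(state, rules) {
  for (const rule of rules) {
    const action = rule(state);
    if (action) return action;
  }
  return { type: "tick" };
}

// Illustrative rules over a hypothetical state:
// { collected: Set<seedId>, tiles: [{ x, y, seed, mature, isWeed }] }
const exampleRules = [
  // Harvest mature plants whose seed we don't own yet.
  (s) => {
    const tile = s.tiles.find((t) => t.mature && !s.collected.has(t.seed));
    return tile ? { type: "harvest", x: tile.x, y: tile.y } : null;
  },
  // Clear weeds so they don't spread over other plants.
  (s) => {
    const tile = s.tiles.find((t) => t.isWeed);
    return tile ? { type: "harvest", x: tile.x, y: tile.y } : null;
  },
];
```

Because rules are tried in order, a rule's position in the array is its priority. Special cases like the two hard-to-mutate seeds could be added as extra rules near the front.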

Lundez commented 3 years ago

Yeah, that's a lot of time to spend yourself if it weren't sped up 😅😅

Really cool! This is a superb baseline: since it's possible to complete the garden in this timeframe, we can use that as an end condition (so the reinforcement learning doesn't go on forever). 😀

Congratulations on the awesome progress! Incredible to see it move forward step by step this far 👌

Milestone 1.0!

everlof commented 3 years ago

Thank you :) Hehe, yeah, it felt a bit spammy there for a second, but it was really fun to beat the previous one 👍

It was actually way harder than I imagined to get it to complete the set of seeds, mostly due to weeds/molds, which grow and mutate like crazy. But some mutations also come from these, so of course you want them around; you just don't want them to disturb or interrupt the progress of other seeds by spreading.

Yeah, agreed on the end condition. I think I'm gonna run it a few more times to see how that plays out.

everlof commented 3 years ago

Okay, now I will take it easy. I added some logic so that it only plants seeds that are somehow part of a mutation that we want. This brought the tick count down by more than an order of magnitude! Mostly around 15k, but 8,080 was the lowest I saw.
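Here is a sketch of that filter, under two assumptions:

- Mutation recipes are listed as `{ child, parents }`.
- A seed is worth planting only if it is a parent of a child we don't have yet, and all of that child's parents are already available.

The recipe entries are placeholders, not the game's real mutation table.

```javascript
// Only plant seeds that are a parent in some mutation we still want:
// the child isn't collected yet and every parent is already available.
function usefulSeeds(recipes, collected) {
  const useful = new Set();
  for (const { child, parents } of recipes) {
    if (collected.has(child)) continue;                    // already have it
    if (!parents.every((p) => collected.has(p))) continue; // can't plant all parents yet
    parents.forEach((p) => useful.add(p));
  }
  return useful;
}

// Placeholder recipes for illustration only (not the real mutation table).
const recipes = [
  { child: "B", parents: ["A", "A"] },
  { child: "C", parents: ["A", "B"] },
  { child: "D", parents: ["C", "E"] },
];
```

Anything outside the returned set is never planted, which cuts down the number of candidate plant actions.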

Lundez commented 3 years ago

Awesome!

Heuristics can make a strong baseline that might prove hard to beat, since this game has so many random parameters.

Will be very fun to take a deeper look once I get time! 😀😀

Lundez commented 3 years ago

Hi @everlof, took a quick look today. The file still feels huge, and I don't think I'm gonna find time to actually break it down in between competence nights, work, and other things. 😞

But I'd like to congratulate you on the awesome progress! 8080 ticks, the same as my local web server's port 😆 😄 🥳

I think perhaps the easiest way is not to use reinforcement learning but rather A* search, where you build a graph and can prune it so that some paths are never followed. This is easier to plug into a large code base than most reinforcement learning approaches, which require a certain input/output format to be easy to use (otherwise it's a lot of work hehe).
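Here is a generic A* sketch with the pruning hook described above. In the garden, nodes would be game states, edges would be actions, and edge costs would be ticks (e.g. 1 for a tick, 0 for planting or harvesting). Because mutations are random, mapping the game onto a fixed graph would need some simplification that isn't shown here. `neighbors`, `heuristic`, `key` and `prune` are caller-supplied and hypothetical.

```javascript
// Generic A* search with an optional pruning hook (sketch).
// neighbors(node) -> [{ node, cost }], heuristic(node) -> estimated cost to goal,
// key(node) -> string id, prune(node) -> true to never expand that node.
function aStar({ start, isGoal, neighbors, heuristic, key = String, prune = () => false }) {
  const open = [{ node: start, g: 0, f: heuristic(start) }];
  const bestG = new Map([[key(start), 0]]);
  const cameFrom = new Map();

  while (open.length > 0) {
    // Pop the entry with the lowest f = g + h (linear scan; use a heap for big graphs).
    let i = 0;
    for (let j = 1; j < open.length; j++) if (open[j].f < open[i].f) i = j;
    const { node, g } = open.splice(i, 1)[0];
    const k = key(node);
    if (g > bestG.get(k)) continue; // stale entry: a cheaper path was found later

    if (isGoal(node)) {
      // Walk back through cameFrom to rebuild the path from start to goal.
      const path = [node];
      let cur = k;
      while (cameFrom.has(cur)) {
        const prev = cameFrom.get(cur);
        path.unshift(prev.node);
        cur = prev.key;
      }
      return { cost: g, path };
    }

    for (const { node: next, cost } of neighbors(node)) {
      if (prune(next)) continue; // pruned branches are never followed
      const nk = key(next);
      const ng = g + cost;
      if (!bestG.has(nk) || ng < bestG.get(nk)) {
        bestG.set(nk, ng);
        cameFrom.set(nk, { key: k, node });
        open.push({ node: next, g: ng, f: ng + heuristic(next) });
      }
    }
  }
  return null; // goal unreachable
}
```

If `heuristic` never overestimates the remaining cost, the first goal popped is optimal. Passing `() => 0` turns this into Dijkstra's algorithm.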

Sorry I haven't been able to help more. But it was awesome to follow along with your progress! 😄