Reinforcement Learning planning

Lundez commented 3 years ago

We need the following:

[ ] Reward function
[ ] State function (simple way to get game state)
[ ] "Action Space Function" (simple way to get possible actions)
[ ] End Condition

We need to add some kind of decay to make the agent speed up.
We need to think a lot about how to set the reward and feedback loop.
We need an algorithm that updates the weights based on this.
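One common way to express "decay" in reinforcement learning is a discount factor: a reward received t steps from now is scaled by gamma**t, so an agent that collects seeds sooner scores higher. The sketch below is only an illustration of that idea; the function name and the gamma value are assumptions, not something decided in this thread.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward at step t is scaled by gamma**t."""
    total = 0.0
    # Walk backwards so each step only needs one multiply and one add.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# The same single reward is worth less the later it arrives.
early = discounted_return([1.0, 0.0, 0.0], gamma=0.5)
late = discounted_return([0.0, 0.0, 1.0], gamma=0.5)
print(early, late)  # 1.0 0.25
```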

everlof commented 3 years ago

Can we proceed with them one at a time? Or are they very dependent on each other? Maybe it works to work backwards, like, from your action-space function, you'd be able to infer what should be contained in the state?

Reward function

I guess we want it to collect all of the seeds as fast as possible, so we want to minimise the number of "ticks"? I guess we're interested in something like:

Positive reward: collecting a seed that is not yet unlocked/owned
Negative reward: the more time that has passed, the worse performance
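The two bullets above could be combined into a single per-step reward, roughly like this. It is a minimal sketch: the bonus and penalty sizes are made-up placeholders that would need tuning, and seeds are assumed to be identified by integer ids.

```python
def reward(seeds_before, seeds_after, seed_bonus=1.0, tick_penalty=0.01):
    """Reward for one step: bonus per newly owned seed, minus a small time cost.

    seed_bonus and tick_penalty are hypothetical values, not tuned numbers.
    """
    new_seeds = len(set(seeds_after) - set(seeds_before))
    return seed_bonus * new_seeds - tick_penalty

# A step that unlocks seed 3 is rewarded; an idle step costs a little.
print(reward({0}, {0, 3}))
print(reward({0}, {0}))
```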

State function

So, what info do we need from the game state, I guess:

status of the garden, i.e. what's planted on (x, y) and maturity (where in its lifespan each plant is)
status of all of the seeds, which are collected, which are not yet collected
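As a sketch of what such a state could look like in code: a grid of plots plus the set of owned seeds, flattened into a numeric vector for a learner. All class and field names here are hypothetical, and only the total of 34 seeds comes from this thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

NUM_SEEDS = 34  # total number of seeds mentioned in this thread


@dataclass(frozen=True)
class Plot:
    plant_id: int     # hypothetical integer id of the plant
    age: int          # ticks since planting
    mature_age: int   # ticks needed to reach maturity (assumed known)

    @property
    def maturity(self) -> float:
        """Position in the lifespan, clamped to the range 0..1."""
        return min(self.age / self.mature_age, 1.0)


@dataclass
class GardenState:
    grid: List[List[Optional[Plot]]]  # None marks an empty tile
    owned_seeds: Set[int]

    def to_vector(self) -> List[float]:
        """Flatten to [plant_id, maturity] per tile, then one flag per seed."""
        vec: List[float] = []
        for row in self.grid:
            for plot in row:
                if plot is None:
                    vec.extend([-1.0, 0.0])
                else:
                    vec.extend([float(plot.plant_id), plot.maturity])
        vec.extend(1.0 if s in self.owned_seeds else 0.0 for s in range(NUM_SEEDS))
        return vec


state = GardenState(grid=[[None, Plot(0, 5, 10)]], owned_seeds={0})
vec = state.to_vector()
```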

Action Space Function

If there are mature plants - we should collect the seeds
If there are empty spots, we should probably plant some seeds
But don't leave all spots filled, then there's no spot for mutations to grow
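The three points above could translate into an action-space function along these lines. This is an illustrative sketch: the cell format, the action tuples, and the min_empty threshold are all assumptions.

```python
def legal_actions(grid, owned_seeds, min_empty=1):
    """List the actions available in the current garden.

    grid: rows of cells; a cell is None (empty) or a dict with a "mature" flag.
    min_empty: hypothetical number of tiles to keep free for mutations.
    """
    actions = [("tick",)]  # waiting is always allowed
    empty = []
    for y, row in enumerate(grid):
        for x, cell in enumerate(row):
            if cell is None:
                empty.append((x, y))
            elif cell["mature"]:
                actions.append(("harvest", x, y))
    # Only offer planting while more than min_empty tiles are free.
    if len(empty) > min_empty:
        for x, y in empty:
            for seed in sorted(owned_seeds):
                actions.append(("plant", seed, x, y))
    return actions


grid = [[None, {"plant": 0, "mature": True}], [None, None]]
actions = legal_actions(grid, owned_seeds={0})
```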

How smart does this need to be? Do we want it to figure out by itself which mutations can occur, or should we try to be smart about that and plant plants close to each other that we know produce mutations?

End condition

Is it this simple?

done if number of collected seeds == maximum number of seeds (34)

Lundez commented 3 years ago

Starting as simple as possible is the best approach 😄

I think the state could be

Collected plants/seeds
The "gardening plots"
"Gardening plots left to build"
Maybe something else, not sure. You had a diagram on how to create plants, perhaps something to add later to help the agent optimize. My guess is you know it best 😀

I think the following order would be best:

State
Action-space (inferred by state)
Reward function (before rewards we can use random)
End condition (might make sense to add like a "if we don't mutate/move forward in X rounds die" on top of gathering all the seeds, just to speed up)
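The end condition with the extra "no progress in X rounds" cutoff could be sketched as follows. The value of stall_limit is a placeholder for X, which has not been chosen here.

```python
MAX_SEEDS = 34  # total number of seeds, from this thread


def is_done(num_owned, ticks_since_last_new_seed, stall_limit=5000):
    """Episode ends when every seed is owned or progress has stalled.

    stall_limit is a hypothetical stand-in for "X rounds".
    """
    if num_owned >= MAX_SEEDS:
        return True
    return ticks_since_last_new_seed >= stall_limit


print(is_done(34, 0))       # True: all seeds collected
print(is_done(20, 5000))    # True: stalled
print(is_done(20, 10))      # False: still progressing
```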

everlof commented 3 years ago

I've started with something else - running it outside of a browser. Way easier to run it in the terminal, for example, instead of being required to do it in a browser + console. But after that's done I'll look into the state-function!

Lundez commented 3 years ago

Sounds like an excellent start. 😀

everlof commented 3 years ago

Got it working pretty fast. I tried making the code easier to read, but it's hard to modularise since everything is set up to use global Game and M variables. I put that in a different branch for now. I don't know how we'll need to restructure once we're going to train it, but we'll deal with that then.

Now you can proceed through ticks just by running npm run start and then pressing enter repeatedly. Here's an example of how that might look:

You can see that 'weed' has started to spontaneously grow all around the garden, and you can also see lifetime of them, which seeds are owned, etc.

Lundez commented 3 years ago

Very cool, impressive! 😀

everlof commented 3 years ago

Hehe, yeah, now you can actually control it as well. I added commands to interactively harvest, harvest all, and plant.

As you can see, here I've collected two more seeds in addition to the one you start with (0).

Basically:

enter -> next tick
p,0,1,2 -> plant 0 at position (1,2)
h,1,2 -> harvest plant at position (1,2)
a -> harvest all plants

For now I just do the easiest thing possible to get on with stuff; I don't check the validity of the entered commands, etc.
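For reference, parsing and validating that command format could look roughly like the sketch below. It is not the repository's code (which runs under npm); the function name and the returned tuples are assumptions made for illustration.

```python
def parse_command(line):
    """Turn one line of input into an action tuple, or raise ValueError."""
    line = line.strip()
    if line == "":
        return ("tick",)  # plain enter advances one tick
    name, *args = line.split(",")
    try:
        nums = [int(a) for a in args]
    except ValueError:
        raise ValueError(f"non-numeric argument in {line!r}")
    if name == "p" and len(nums) == 3:
        seed, x, y = nums
        return ("plant", seed, x, y)
    if name == "h" and len(nums) == 2:
        x, y = nums
        return ("harvest", x, y)
    if name == "a" and not nums:
        return ("harvest_all",)
    raise ValueError(f"unknown command: {line!r}")


print(parse_command("p,0,1,2"))  # ('plant', 0, 1, 2)
print(parse_command("h,1,2"))    # ('harvest', 1, 2)
```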

everlof commented 3 years ago

I've created a version that can perform actions automatically. It seemed to get stuck at 17 seeds, so I stopped at 100,000 runs.

everlof commented 3 years ago

Cool, made some more progress, 24 of 34 until it got stuck.

everlof commented 3 years ago

Resolving some bugs and making the actions a bit smarter, getting close:

everlof commented 3 years ago

Alright, now I've got the first version that actually completed all seeds!

320057 ticks. The shortest time for a tick in the real game is 3 minutes, so if this were played in the real game, it would take about 667 days (320057*3/60/24).
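The conversion can be checked with a couple of lines; the helper name is just for illustration.

```python
MINUTES_PER_TICK = 3  # shortest real-game tick, as stated above


def ticks_to_days(ticks, minutes_per_tick=MINUTES_PER_TICK):
    """Convert a tick count to real-time days."""
    return ticks * minutes_per_tick / 60 / 24


days = ticks_to_days(320057)
print(round(days))  # 667
```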

The simulation itself took 6 minutes on my macbook.

I have added special cases for TWO seeds which require very specific setups to be able to mutate. I don't think they'd ever complete otherwise, but I guess that's reasonable.

The rules for this were:

If every plant in the garden is weed or mold, harvest everything.
If a plant is mature and we don't have its seed yet, harvest it so we collect the seed.
If a plant is mature and we don't have the plants it could potentially generate by harvesting it, harvest it (this is a special case for meddleweed, which can give mold upon harvest).
If a plant is mature and "immortal", harvest it (elderwort has a large chance of postponing the harvest, since if it is always harvested upon maturity, there's no room for its mutations to take form).
If the garden is occupied < 25%, plant something randomly from the seeds we've got so far (there are two special cases for everdaisy and juicy queenbeet, where there's a % chance to plant specifically for these if the preconditions are fulfilled, i.e. we have the seeds for it).

If ANY condition from the above yields an action - perform the action, otherwise tick the game to next step.
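A first-match version of these rules could be sketched like this. It is a simplified illustration, not the actual implementation: the data layout, the "mold" and "bakerWheat" ids, the postpone_chance value, and the omission of the everdaisy and juicy queenbeet special cases are all assumptions.

```python
import random

WEEDS = {"meddleweed", "mold"}            # hypothetical plant ids
HARVEST_DROPS = {"meddleweed": {"mold"}}  # seeds a harvest may yield
OCCUPANCY_TARGET = 0.25


def choose_action(plots, owned, free, rng, postpone_chance=0.9):
    """Return the first action produced by the ordered rules, else a tick.

    plots: {(x, y): {"plant": str, "mature": bool, "immortal": bool}}
    owned: set of seed names collected so far
    free:  list of empty (x, y) positions
    """
    # Rule 1: the garden holds nothing but weed or mold.
    if plots and all(p["plant"] in WEEDS for p in plots.values()):
        return ("harvest_all",)
    for pos, p in plots.items():
        if not p["mature"]:
            continue
        # Rule 2: mature plant whose seed is still missing.
        if p["plant"] not in owned:
            return ("harvest", pos)
        # Rule 3: harvesting could drop a seed we still lack.
        if HARVEST_DROPS.get(p["plant"], set()) - owned:
            return ("harvest", pos)
        # Rule 4: immortal plants are harvested only some of the time.
        if p["immortal"] and rng.random() >= postpone_chance:
            return ("harvest", pos)
    # Rule 5: plant a random owned seed while occupancy is under 25%.
    total = len(plots) + len(free)
    if free and owned and len(plots) / total < OCCUPANCY_TARGET:
        return ("plant", rng.choice(sorted(owned)), rng.choice(free))
    return ("tick",)


rng = random.Random(0)
empty_garden = choose_action({}, {"bakerWheat"}, [(0, 0)], rng)
print(empty_garden)  # ('plant', 'bakerWheat', (0, 0))
```

Passing the random generator in as a parameter keeps runs reproducible, which helps when comparing tick counts between rule changes.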

Lundez commented 3 years ago

Yeah that's a lot of time to spend yourself if it wasn't sped up 😅😅

Really cool! This is a superb baseline. Since it's possible to complete in this timeframe, we can take that as an end condition (so the reinforcement learning doesn't go on forever). 😀

Congratulations on the awesome progress! Incredible to see it moving forward step by step this far 👌

Milestone 1.0!

everlof commented 3 years ago

Thank you :) Hehe, yeah, felt a bit spammy there for a second, but it was really fun to beat the previous result 👍

It was actually way harder than I imagined to get it to complete the set of seeds, mostly due to weed/mold, which grows and mutates like crazy. There are also mutations that come from these, so you want them around of course, but you don't want them to disturb or interrupt the progress of other seeds by spreading.

Yeah, agreed with the end-condition. Think I'm gonna run it some more times to see how that plays out.

everlof commented 3 years ago

Okay, now I will take it easy. I added some logic so that it only plants seeds that are somehow a part of a mutation that we want. This brought the tick count down by more than an order of magnitude! Mostly around 15k, but 8,080 was the lowest I saw.
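One way to read "only plants seeds that are part of a mutation that we want" is sketched below: keep a table of mutation recipes and plant only parents of offspring that are still missing. The recipe table uses placeholder names A to D, not real game data, and the function is an assumption about the idea rather than the code that was run.

```python
def useful_seeds(owned, recipes):
    """Seeds worth planting: parents of a not-yet-owned mutation.

    recipes: {child: set_of_parents}; a recipe counts only if we already
    own every parent, otherwise it cannot be attempted yet.
    """
    useful = set()
    for child, parents in recipes.items():
        if child not in owned and parents <= owned:
            useful |= parents
    return useful


# Placeholder recipes, for illustration only.
recipes = {"B": {"A"}, "C": {"A", "B"}, "D": {"C"}}
print(sorted(useful_seeds({"A"}, recipes)))       # ['A']
print(sorted(useful_seeds({"A", "B"}, recipes)))  # ['A', 'B']
```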

Lundez commented 3 years ago

Awesome!

Heuristics can make a strong baseline that might prove hard to beat as this has so many random parameters included.

Will be very fun to take a deeper look once I get time!😀😀

Lundez commented 3 years ago

Hi @everlof, took a quick look today. The file still feels huge and I don't think I'm going to find time to actually parse it in between competence nights, work and everything else. 😞

But I'd like to congratulate you on the awesome progress! 8080 ticks, the same as my local webpage's port 😆 😄 🥳

I think perhaps the easiest way is to not use reinforcement learning but rather A*-search, where you build a graph and are able to prune and not follow along some paths. This is easier to plug into a large code-base than most reinforcement learning approaches, which require a certain input/output format to make it easy (otherwise a lot of work hehe).
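To make the A* idea concrete, here is a small sketch where a node is the set of owned seeds and an edge unlocks one mutation. Everything in it is hypothetical: the recipes, the fixed costs (standing in for expected ticks, which are random in the real game), and the heuristic of "missing goal seeds times the cheapest unlock".

```python
import heapq
from itertools import count


def a_star(start, goal, recipes):
    """Cheapest unlock order from start to a state containing goal.

    recipes: {child: (parents, cost)} with hypothetical fixed costs.
    Returns (total_cost, unlock_order), or None if goal is unreachable.
    """
    min_cost = min(cost for _, cost in recipes.values())

    def h(state):
        # Admissible: each missing goal seed needs at least one unlock.
        return len(goal - state) * min_cost

    tie = count()  # tie-breaker so the heap never compares states
    frontier = [(h(start), next(tie), 0, start, [])]
    best = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if goal <= state:
            return g, path
        if g > best.get(state, float("inf")):
            continue  # stale entry, a cheaper route was found already
        for child, (parents, cost) in recipes.items():
            if child in state or not parents <= state:
                continue
            nxt = state | {child}
            ng = g + cost
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                heapq.heappush(
                    frontier, (ng + h(nxt), next(tie), ng, nxt, path + [child])
                )
    return None


recipes = {
    "B": (frozenset({"A"}), 5),
    "C": (frozenset({"A"}), 2),
    "D": (frozenset({"B", "C"}), 4),
    "E": (frozenset({"A"}), 1),  # cheap, but not needed for D
}
result = a_star(frozenset({"A"}), frozenset({"D"}), recipes)
print(result[0])  # 11
```

In this toy graph the returned order never unlocks E, since doing so only adds cost on the way to D.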

Sorry I haven't been able to help more. But it was awesome to follow along with your progress! 😄

everlof / garden