Open Lundez opened 3 years ago
Can we proceed with them one at a time? Or are they very dependent? Maybe it works to work backwards, like, from your actions-space function, you'd be able to infer what should be contained in the state?
I guess we want it to collect all of the seeds as fast as possible, i.e. minimise the number of "ticks"? I guess we're interested in something like:
So, what info do we need from the game state, I guess:
How smart does this need to be? Do we want it to figure out by itself which mutations can occur, or should we try to be smart about that and plant plants close together that we know produce mutations?
Is it this simple?
done if number of collected seeds == maximum number of seeds (34)
Starting as simple as possible is the best approach 😄
I think the state could be
I think the following order would be best:
I've started with something else: running it outside of a browser. It's way easier to run it in the terminal, for example, than to be required to use a browser + console. But after that's done I'll look into the state-function!
Sounds like an excellent start. 😀
Got it working pretty fast. I tried making the code easier to read, but it's hard to modularise since everything is set up to use the global Game and M variables. I put that in a different branch for now. I don't know how we will need to restructure once we start training it, but we'll deal with that then.
Now you can proceed through ticks just by running npm run start and then continuously pressing enter. Here's an example of how that might look:
You can see that 'weed' has started to grow spontaneously all around the garden, and you can also see the plants' lifetimes, which seeds are owned, etc.
Very cool, impressive! 😀
Hehe, yeah, now you can actually control it as well. I added commands to interactively harvest, harvest all, and plant.
As you can see, here I've collected two more seeds in addition to the one you start with (0).
Basically:
0
at position (1,2)
(1,2)
For now I just do the easiest thing possible to get on with stuff; I don't check the validity of the entered commands, etc.
I've created a version that can perform actions automatically. It seemed to get stuck at 17 seeds, so I stopped it at 100,000 runs.
Cool, made some more progress: 24 of 34 until it got stuck.
Resolving some bugs and making the actions a bit smarter, getting close:
Alright, now I've got the first version that actually completed all seeds!
320057 ticks. The shortest time for a tick in the real game is 3 minutes, so if this same game was a real game, it would be 667 days long (320057*3/60/24).
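The conversion above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `ticks_to_days` is made up for illustration and is not part of the project.

```python
MINUTES_PER_TICK = 3  # shortest real-game tick length mentioned in the thread


def ticks_to_days(ticks: int, minutes_per_tick: float = MINUTES_PER_TICK) -> float:
    """Convert a number of garden ticks into real-time days."""
    return ticks * minutes_per_tick / 60 / 24


print(round(ticks_to_days(320057)))  # 667
```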
The simulation itself took 6 minutes on my macbook.
I have added special cases for TWO seeds which require very specific setups to be able to mutate. I don't think they'd ever complete otherwise, but I guess that's reasonable.
The rules for this were:
meddleweed
which can give mold upon harvest).
elderwort
has a large chance of postponing the harvest, since if it is always harvested upon maturity there's no room for its mutations to take form).
If ANY condition from the above yields an action, perform the action; otherwise tick the game to the next step.
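The "first rule that yields an action wins, otherwise tick" loop could be sketched roughly like this in Python. The state layout, the action names and the two example rules are all hypothetical placeholders, not the actual rules or code from the repository.

```python
from typing import Callable, Optional, Sequence

# Hypothetical types: a state is a plain dict, an action is a string.
State = dict
Action = str
Rule = Callable[[State], Optional[Action]]


def choose_action(state: State, rules: Sequence[Rule]) -> Action:
    """Return the first action any rule yields, otherwise 'tick'."""
    for rule in rules:
        action = rule(state)
        if action is not None:
            return action
    return "tick"


# Two made-up rules, for illustration only.
def harvest_mature(state: State) -> Optional[Action]:
    return "harvest_all" if state.get("mature_plots", 0) > 0 else None


def plant_if_empty(state: State) -> Optional[Action]:
    return "plant" if state.get("empty_plots", 0) > 0 else None


rules = [harvest_mature, plant_if_empty]
print(choose_action({"mature_plots": 2, "empty_plots": 1}, rules))  # harvest_all
print(choose_action({"mature_plots": 0, "empty_plots": 0}, rules))  # tick
```

The order of the rules acts as a priority list, so special cases would simply be placed earlier in `rules`.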
Yeah that's a lot of time to spend yourself if it wasn't sped up 😅😅
Really cool! This is a superb baseline. As it's possible to complete in this timeframe, we can take that as an end-condition (so the reinforcement learning doesn't go on forever). 😀
Congratulations on the awesome progress! Incredible to see it move forward this far, step by step 👌
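A minimal sketch of such an end-condition in Python, assuming an episode ends either when all 34 seeds are collected or when the heuristic baseline's tick count is used up. The function name `is_done` and its parameters are assumptions for illustration.

```python
MAX_SEEDS = 34           # total number of seeds, as stated in the thread
BASELINE_TICKS = 320057  # ticks needed by the first completing heuristic run


def is_done(collected_seeds: int, ticks: int, tick_budget: int = BASELINE_TICKS) -> bool:
    """Episode ends on full collection or when the tick budget is exhausted."""
    return collected_seeds >= MAX_SEEDS or ticks >= tick_budget


print(is_done(collected_seeds=34, ticks=1000))    # True
print(is_done(collected_seeds=17, ticks=100000))  # False
```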
Milestone 1.0!
Thank you :) Hehe, yeah, felt a bit spammy there for a second, but it was really fun to beat the previous result 👍
It was actually way harder than I imagined to get it to complete the set of seeds, mostly due to weeds/molds, which grow/mutate like crazy. There are also mutations that come from these, so you do want them of course, but you don't want them to disturb/interrupt the progress of other seeds by spreading.
Yeah, agreed with the end-condition. Think I'm gonna run it some more times to see how that plays out.
Okay, now I will take it easy. I added some logic so that it only plants seeds that are somehow a part of a mutation that we want. This brought the number of ticks down by more than an order of magnitude! Mostly around 15k, but 8,080 was the lowest I saw.
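One way to express "only plant seeds that are part of a mutation we still want" is sketched below in Python. The recipe table uses placeholder seed names (A to D) rather than real garden recipes, and `useful_seeds` is a hypothetical helper, not the logic actually used in the repository.

```python
# Hypothetical recipes: resulting seed -> parent seeds that must be planted.
RECIPES = {
    "B": ("A", "A"),
    "C": ("A", "B"),
    "D": ("B", "C"),
}


def useful_seeds(recipes: dict, owned: set) -> set:
    """Owned seeds that are parents in a recipe whose result is still missing."""
    useful = set()
    for child, parents in recipes.items():
        if child in owned:
            continue  # already collected, nothing to gain from this recipe
        if all(parent in owned for parent in parents):
            useful.update(parents)
    return useful


print(sorted(useful_seeds(RECIPES, {"A"})))            # ['A']
print(sorted(useful_seeds(RECIPES, {"A", "B", "C"})))  # ['B', 'C']
```

In the second call seed A is no longer planted at all, since every recipe it takes part in has already produced its result.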
Awesome!
Heuristics can make a strong baseline that might prove hard to beat as this has so many random parameters included.
Will be very fun to take a deeper look once I get time!😀😀
Hi @everlof, I took a quick look today. The file still feels huge and I don't think I'm going to find time to actually parse it in between competence nights, work and everything else. 😞
But I'd like to congratulate you on the awesome progress! 8080 ticks, the same as my local web page's port 😆 😄 🥳
I think perhaps the easiest way is to not use reinforcement learning but rather A*-search, where you build a graph and are able to prune and not follow along some paths. This is easier to plug into a large code-base than most reinforcement learning approaches, which require a certain input/output format to be easy to use (otherwise it's a lot of work, hehe).
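To make the A* idea a bit more concrete, here is a small Python sketch on a toy graph. Everything in it is an assumption for illustration: a node is the set of owned seeds, an edge unlocks one seed at a fixed tick cost, and the recipes and costs are invented. The real garden is random, so this deterministic model is a simplification.

```python
import heapq
from itertools import count

# Hypothetical recipes: (resulting seed, required parents, tick cost).
RECIPES = [
    ("B", ("A",), 5),
    ("C", ("A",), 10),
    ("C", ("B",), 2),
    ("D", ("C",), 4),
    ("D", ("B", "C"), 1),
]
ALL_SEEDS = frozenset({"A"} | {seed for seed, _, _ in RECIPES})
MIN_COST = min(cost for _, _, cost in RECIPES)


def heuristic(owned: frozenset) -> int:
    # Admissible: each missing seed needs at least one step of MIN_COST ticks.
    return len(ALL_SEEDS - owned) * MIN_COST


def neighbours(owned: frozenset):
    for seed, parents, cost in RECIPES:
        if seed not in owned and all(p in owned for p in parents):
            yield seed, owned | {seed}, cost


def a_star(start: frozenset):
    tie = count()  # tie-breaker so the heap never has to compare frozensets
    frontier = [(heuristic(start), next(tie), 0, start, [])]
    best = {start: 0}
    while frontier:
        _, _, cost, owned, path = heapq.heappop(frontier)
        if owned == ALL_SEEDS:
            return cost, path
        if cost > best.get(owned, float("inf")):
            continue  # stale entry, a cheaper route to this node was found
        for seed, nxt, step in neighbours(owned):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost  # prune more expensive routes to nxt
                entry = (new_cost + heuristic(nxt), next(tie), new_cost, nxt, path + [seed])
                heapq.heappush(frontier, entry)
    return None


print(a_star(frozenset({"A"})))  # (8, ['B', 'C', 'D'])
```

The expensive route to C (10 ticks directly from A) is never expanded further, which is the pruning mentioned above.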
Sorry I haven't been able to help more. But it was awesome to follow along with your progress! 😄
We need the following:
We need to add some kind of decay to make the agent speed up.
We need to think a lot about how to set the reward and feedback loop.
We need an algorithm that updates the weights based on this.
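As one possible reading of these three points, here is a tabular Q-learning sketch in Python: the discount factor plays the role of the decay, the reward combines a per-tick penalty with a bonus per new seed, and the update rule adjusts the stored values. The constants, state labels and action names are all assumptions, and a table of Q-values stands in for network weights to keep the example small.

```python
from collections import defaultdict

GAMMA = 0.99         # discount ("decay"): later rewards count less, so finishing sooner pays more
ALPHA = 0.1          # learning rate
TICK_PENALTY = -1.0  # assumed cost for every tick spent
SEED_REWARD = 100.0  # assumed reward for each newly collected seed


def reward(new_seeds: int, ticked: bool) -> float:
    """Reward signal: bonus per new seed, small penalty whenever time passes."""
    return SEED_REWARD * new_seeds + (TICK_PENALTY if ticked else 0.0)


def q_update(q, state, action, r, next_state, actions) -> None:
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += ALPHA * (r + GAMMA * best_next - q[(state, action)])


q = defaultdict(float)
actions = ["tick", "plant", "harvest"]
q_update(q, "s0", "harvest", reward(new_seeds=1, ticked=False), "s1", actions)
print(q[("s0", "harvest")])  # 10.0
```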