andreped / super-ml-pets

🐢 AI for Super Auto Pets
MIT License

Swap problem #41

Closed aaaaaaaaaaaaaaaaadd22 closed 2 years ago

aaaaaaaaaaaaaaaaadd22 commented 2 years ago

The bot mainly swaps the positions of pets and barely does anything else. You should add a limit on the number of times the bot is allowed to swap pet positions, so that it actually does something other than swapping.

aaaaaaaaaaaaaaaaadd22 commented 2 years ago

And it never buys any of the new pets, because it also never sells.

aaaaaaaaaaaaaaaaadd22 commented 2 years ago

By new I mean tier 2+ pets.

andreped commented 2 years ago

It definitely buys stuff. It also sells. The only thing it cannot do yet is freeze items.

Have you trained a bot from scratch yourself, or have you tried using the pretrained model from here: https://github.com/andreped/super-ml-pets/releases/tag/v0.0.6

It is far from perfect, but at least it seems to have learned some relevant stuff. However, we are currently trying to improve the reward system to make training more efficient.

aaaaaaaaaaaaaaaaadd22 commented 2 years ago

Hmm, do you have a video? I think my system broke the AI or something, and I want to see how v6 plays. If not, it's alright.

andreped commented 2 years ago

Don't have a video right now, but I can make a gif later.

Note that the current deployment solution is not that optimized, runtime-wise. The reason it might seem that shuffling is done very often is that a reorder action does not mean a single movement, but rather a full change of the current animal order. If the animal order is (0, 1, 2, 3, 4), a reorder action generates a completely new order, such as (2, 4, 0, 3, 1). To reach the new order, multiple animal swaps have to be made. This is probably what you are observing.
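To make this concrete, here is a minimal Python sketch of how one reorder action can expand into several pairwise swaps. It is not the code used in this repo; the function name `swaps_for_reorder` is hypothetical, and it assumes the new order lists, for each position, the index of the animal that should end up there.

```python
def swaps_for_reorder(new_order):
    """Return the pairwise position swaps that turn (0, 1, ..., n-1) into new_order.

    Hypothetical helper for illustration only, not part of super-ml-pets.
    new_order[i] is the original index of the animal that should end at position i.
    """
    current = list(range(len(new_order)))  # the order before the reorder action
    swaps = []
    for i, wanted in enumerate(new_order):
        j = current.index(wanted)  # where the wanted animal currently sits
        if j != i:
            # Bring the wanted animal to position i with a single swap.
            current[i], current[j] = current[j], current[i]
            swaps.append((i, j))
    return swaps


if __name__ == "__main__":
    print(swaps_for_reorder((2, 4, 0, 3, 1)))
```

For the order in the example above, this yields two swaps (positions 0 and 2, then positions 1 and 4). In the worst case, a single reorder of five animals needs four swaps, which is why one action can look like a lot of shuffling on screen.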

However, I agree with you that it is rather strange that it needs more than one reorder in a single game, as humans would find it more natural to reorder once at the end of the shop phase, right before a battle. Still, this is what the AI has found suitable. We could add a feature to the reward system to guide it better, but that is not a priority at the moment. Right now we are trying to get the full pipeline stable.
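One possible way to guide it through the reward system would be a small penalty for every reorder beyond the first in a turn. The sketch below is only an illustration of that idea, not something implemented in this repo; `shaped_reward`, `free_reorders`, and `penalty` are hypothetical names, and the values are arbitrary.

```python
def shaped_reward(base_reward, reorders_this_turn, free_reorders=1, penalty=0.1):
    """Subtract a small penalty for each reorder beyond the free allowance.

    Hypothetical reward-shaping sketch, not part of super-ml-pets.
    base_reward: the reward the environment would otherwise return.
    reorders_this_turn: number of reorder actions taken in the current shop turn.
    """
    extra = max(0, reorders_this_turn - free_reorders)  # reorders over the allowance
    return base_reward - penalty * extra


if __name__ == "__main__":
    print(shaped_reward(1.0, 1))  # within the allowance, reward unchanged
    print(shaped_reward(1.0, 3))  # two extra reorders are penalized
```

A soft penalty like this leaves the agent free to reorder when it actually helps, whereas a hard cap on swaps would forbid the action outright.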

andreped commented 2 years ago

One of the main reasons the AI is not performing better is that the simulated environment used for training does not mimic real games well enough. Hence, even if we see a high reward per game, the AI is just playing against another computer, and currently a very naive one. Initially we tested Deep Q-Learning, but while fixing bugs we switched to a simpler solution. Hopefully we will be able to add a better reward system.

Note that you are free to contribute to the framework. I have limited time for this, as I do it in my spare time, but when I get time I will try to be as productive as possible :) I will likely spend more time developing the framework than training models, but I hope to include an improved model with every release.

aaaaaaaaaaaaaaaaadd22 commented 2 years ago

If you make the gif, please send it to me. Thanks!