demilich1 / metastone

Hearthstone simulator written in Java with full GUI support
GNU General Public License v2.0

AI Brainstorm #102

Closed. mbadov closed this issue 7 years ago.

mbadov commented 8 years ago

Apologies if this isn't the correct place for this. I have some ideas regarding implementing an AI, and wanted to chat with you briefly. You may have already tried this, but the principal idea would be to train a model based on the board state, with the output being a win or a loss (instead of trying to use heuristics). Given that this data would have moderate dimensionality (~2-4x the number of minions), a lot of training data would be necessary. I think that there are probably enough YouTube videos of Hearthstone games out there - the challenge would be extracting structured data from them. I'd love to get in touch to bounce some ideas off of you.

Very cool project, by the way.

demilich1 commented 8 years ago

In general I think training an AI which learns to play the game is really the way to go. Regarding your specific approach I would have some questions/concerns:

When your output is just win or loss, how do you decide between different good moves? For example: you are a Rogue, your board consists of Ragnaros and an SI:7 Agent, while the opponent just has a Knife Juggler on board. It is your turn and you have a Backstab in your hand. You now have three options: a) ignore the Knife Juggler, b) kill it with your SI:7 Agent, or c) kill it with Backstab. In my opinion, the model you trained will output a win for every move you make, because having a Ragnaros on your board while the enemy basically has nothing is a comfortable position. However, our experience with Hearthstone tells us that killing the enemy Knife Juggler with the Backstab is probably still a good idea. In short: how do you plan to decide between different moves which are all classified as 'win' (or between different 'loss' moves if there are no winning moves)?

I don't think that getting game data from YouTube is feasible. While it is certainly not impossible, it sounds like a lot of work. You would have to develop a lot of non-trivial image recognition (you don't just have to detect that there is, for instance, a Wild Pyromancer on board, but also whether it is silenced or affected by Corruption, and you have to cope with all the streamer overlays, etc.). Apart from that, acquiring enough data would take a lot of time: even if the analysis works flawlessly, you would still have to compile a list of relevant YouTube videos by hand. And I don't think you would get any good results with 100 or 1,000 games; in my opinion you need at least an order of magnitude more than that.

I would actually take another approach and train the AI by playing against another AI. For example, you could start by letting your AI play against the PlayRandomBehaviour. This is of course very weak, but from experience I can tell you that it is not trivial to write an AI which has a >90% win rate even against that. You could then continue playing against GameStateValue, or against a static version of your own trained AI. This is the approach TD-Gammon, a world-class backgammon AI, used: it trained by just playing against itself using Temporal Difference learning (after it reached a certain level of play, it was extended with hand-crafted backgammon knowledge, which made it even better). In general this is the standard approach in Reinforcement Learning: just let the AI play games and evaluate, based on win or loss, whether the actions made during a match were right or wrong.
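
To make the idea concrete, here is a minimal TD(0) sketch in Python (hypothetical helpers, nothing metastone-specific): after each self-play game, every visited state's value is nudged toward the value of its successor, and the final state learns from the actual result.

ALPHA = 0.1  # learning rate

def td0_update(value, states, outcome):
    # value:   dict mapping a hashable board state to an estimate in [-1.0, 1.0]
    # states:  the board states visited during one self-play game, in order
    # outcome: +1.0 for a win, -1.0 for a loss
    for s, s_next in zip(states, states[1:]):
        v, v_next = value.get(s, 0.0), value.get(s_next, 0.0)
        value[s] = v + ALPHA * (v_next - v)  # move V(s) toward V(s')
    v_last = value.get(states[-1], 0.0)
    value[states[-1]] = v_last + ALPHA * (outcome - v_last)  # terminal state learns the result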

Apart from that, I would be interested in which model you plan to train. I once tried neural networks, but the results were quite bad. That does not mean that the neural network approach does not work; quite the contrary, especially after the success of the Google AI in Go, I once again think that deep neural networks are capable of amazing performance. However, there are many parameters to tune (which inputs to use, and in which representation, which is very important; which learning algorithm; how many neurons; how many hidden layers; etc.). And after changing any of those parameters, you have to train the AI from scratch with a huge number of simulations, which usually takes many hours.

So overall I would be quite interested in the details of your approach and if you decide to actually implement it in some way, I would be VERY interested in any results (even if the results are bad)!

mbadov commented 8 years ago

I will elaborate a bit more on my approach, and perhaps answer some of your questions in the process. The idea would be to first model the board state. Then, after a game is played, go back and label every board state with either a 1.0 if the board led to a win, or a -1.0 if the board led to a loss. Perhaps some weighting could be used as well. Once all the data is gathered, train a regressor to output a continuous value from -1.0 to 1.0 to rank any given board state.
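
As a rough sketch of that pipeline (hypothetical helper names, with scikit-learn as one possible library): every state from a game inherits that game's outcome as its label, and a regressor is fit on the result.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def build_dataset(games):
    # games: list of (feature_vectors_per_turn, outcome) pairs,
    # where outcome is 1.0 for a win and -1.0 for a loss
    X, y = [], []
    for states, outcome in games:
        for features in states:
            X.append(features)
            y.append(outcome)  # every state inherits the game's result
    return np.array(X), np.array(y)

# X, y = build_dataset(games)  # 'games' produced by the simulator
# model = RandomForestRegressor(n_estimators=100).fit(X, y)
# model.predict([state]) then scores a board state in roughly [-1, 1]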

Any model would have a few things in common, like a variable for hand size, life total, cards in deck, turn number, etc. Modeling the actual state of the board is where it gets more interesting. I can think of a few ways off the top of my head.

  1. Have a few variables per board slot and hand slot. For example, x1_a would be the id of the minion, x1_b would be the attack of the minion, x1_c would be the health, x1_d whether it is frozen, etc. This approach would keep the dimensionality fairly low, but it would mean that only regressors capable of handling discrete values could be used, since the id of a minion isn't really a continuous range. This model has a few other issues as well.
  2. Have multiple variables for every minion. For example, x1_a would be the health of Bloodfen Raptor, x1_b its attack, etc. I think this would lead to a better model, but it would be a bit tricky to do correctly, because it may lead to very high dimensionality. For a perfect model, you would need 14x the minion variable tuples, because there could be up to 14 copies of the same minion on the board (7 board slots per player); see the sketch just below.
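
To make encoding 2 concrete, a minimal sketch (all names hypothetical): a fixed-width vector with one block of columns per (card id, copy index).

FEATURES_PER_MINION = 3  # e.g. attack, health, frozen flag
MAX_COPIES = 14          # 7 board slots per player, 2 players

def encode_board(minions, card_ids):
    # minions:  list of (card_id, attack, health, frozen) tuples in play
    # card_ids: every card id in the pool, in a fixed order
    vec = [0.0] * (len(card_ids) * MAX_COPIES * FEATURES_PER_MINION)
    copies_seen = {}
    for card_id, attack, health, frozen in minions:
        copy = copies_seen.get(card_id, 0)
        copies_seen[card_id] = copy + 1
        base = (card_ids.index(card_id) * MAX_COPIES + copy) * FEATURES_PER_MINION
        vec[base:base + FEATURES_PER_MINION] = [attack, health, 1.0 if frozen else 0.0]
    return vec

The vector length is len(card_ids) * 14 * 3, which is exactly the dimensionality concern described above.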

We don't have to start with the perfect model from the get-go, and it would probably be OK if the model missed a few things in the beginning. Now, to answer your original question: given enough games, the regressor should theoretically assign a greater value to the board state in which your opponent does not have a Knife Juggler and you do not have a Backstab in your hand than to a state where both exist. This would be true even if all reachable board states are overwhelmingly positive.
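
That also answers how moves would be chosen in practice (again just a sketch with hypothetical helpers): score every reachable state and greedily take the best one. Even if every option is rated a likely win, the line that removes the Knife Juggler should score highest.

def choose_action(state, legal_actions, simulate, encode, model):
    # simulate(state, action) -> resulting board state (hypothetical helper)
    # encode(state) -> feature vector; model -> the trained regressor
    best_action, best_score = None, float('-inf')
    for action in legal_actions:
        score = model.predict([encode(simulate(state, action))])[0]
        if score > best_score:
            best_action, best_score = action, score
    return best_action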

I think using training data from the AI playing itself would be a good start.

webadict commented 8 years ago

This might work for a board-state AI, but it would require a large database of boards. Whether a battlecry creates a good board state would be tough to evaluate as well, since random outcomes would be re-rolled on each check. In total, this would likely fall short of GameStateValue, but it might beat Random.

demilich1 commented 8 years ago

Okay, I see, so classifying against win/loss is just the first step then. Sounds good; as I said, I would be really interested in the results.

mbadov commented 8 years ago

Just an update. I rigged the code to save a feature vector of continuous values representing the board state at the end of every turn. At the end of the game, each vector is updated with the target value (win = 1.0, loss = -1.0). For now, I'm using a very simple deck that has no RNG (other than the card draw at the beginning), and training only on the output of simulations using this deck. Again, I'm trying to keep things simple to assess the viability of this approach.

I used scikit-learn (the ML library I am most familiar with) to experiment with various regressors. The sample size is around 1,000 games (with 10-20 turns per player per game). It seems that random forest regression can fairly accurately predict the outcome of a game, even early on. In this case, an average error of 1.0 would be the baseline (random guessing).

>>> avg_error_at_turn(0)
0.39684197802197768
>>> avg_error_at_turn(5)
0.27945557267591115
>>> avg_error_at_turn(10)
0.19632523759239731
>>> avg_error_at_turn(20)
0.11644444444444449

SVR, without any kind of hyperparameter optimization, did all right, but not nearly as well.

>>> avg_error_at_turn(0)
0.86378780690485846
>>> avg_error_at_turn(5)
0.50920812964669881
>>> avg_error_at_turn(10)
0.38173392812749213
>>> avg_error_at_turn(20)
0.27836305183542004

At the end of the first turn, SVR isn't too far from random guessing, while RF is actually a decent predictor. I suspect that tuning the hyperparameters and training on more data can yield even better results. The next step is to use a Java ML library so that I can integrate the model into the code, and write a Behaviour that uses it to decide which moves to make. I suspect this will actually do decently against the existing AIs.
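
For reference, avg_error_at_turn is essentially the mean absolute error between the prediction and the true outcome, restricted to states recorded on a given turn; a simplified sketch (the exact signature here is illustrative, assuming turn numbers were stored alongside each feature vector):

import numpy as np

def avg_error_at_turn(turn, model, X, y, turns):
    # X, y: held-out feature vectors and outcomes; turns: turn index per row
    mask = turns == turn
    return np.mean(np.abs(model.predict(X[mask]) - y[mask]))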

The biggest question is whether this approach will scale to the entire card base, and how much data will be necessary given the large number of dimensions. There are also some interesting sub-challenges, like how to model RNG.

The cool thing about this approach is that, given a good model, it theoretically takes absolutely everything into account. For example, because the graveyard state of spells and minions is represented in the model, the AI would realize what type of deck it is playing against as the game progresses, and adjust accordingly.