Akababa opened this issue 6 years ago
Hi @Akababa, this looks great, I'm going over it now. In the meantime, I noticed your `flip_policy` step. Could you say more about this?
@Zeta36 I'm wondering if this is the crucial bug that appeared since the DeepMind-style board representation. When you flip the board to orient the features to the perspective of the current player, the final NN output must be flipped back as well before it is mapped onto the policy vector, right? Furthermore, it would be necessary to "preemptively" flip visit-count information before feeding it into the neural network, e.g. during the `convert_to_training_data` step.
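A minimal sketch of what I mean (not the repo's actual code): assuming a flat policy vector indexed by a global list `labels` of UCI move strings, with `label_index` mapping each string back to its position, the flip permutes the vector so that every entry refers to the vertically mirrored move.

```python
def mirror_uci(move: str) -> str:
    """Mirror a UCI move vertically (rank r -> 9 - r), keeping files and promotion pieces."""
    def flip_square(sq):
        return sq[0] + str(9 - int(sq[1]))
    return flip_square(move[:2]) + flip_square(move[2:4]) + move[4:]

def flip_policy(policy, labels, label_index):
    """Re-index the flat policy vector so entry i refers to the mirrored move of labels[i]."""
    flipped = [0.0] * len(policy)
    for i, move in enumerate(labels):
        flipped[label_index[mirror_uci(move)]] = policy[i]
    return flipped
```

For example, `mirror_uci("e2e4")` gives `"e7e5"`, so Black's training targets line up with the mirrored board planes.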
Also, why did you remove the manual annealing of the learning rate?
Hey @benediamond , thanks for commenting! Btw I'm pushing more optimizations to https://github.com/Akababa/chess-alpha-zero/blob/opts/src/chess_zero/agent/player_chess.py now. It looks like it's working well.
I didn't train for 100,000 steps anyway (the first point at which the lr changes), so it doesn't really matter; I was just experimenting with different optimizers. At the scale of our testing, Google's annealing schedule isn't really applicable.
Yes, I think I did the flipping correctly here, but I'd really appreciate it if you could take a quick look to see whether it checks out from your point of view.
If you have code you'd like a sanity check on too I'd be happy to help out :)
@benediamond:
When you flip the board to orient the features to the perspective of the current player... Then the final NN map onto the policy vector must be flipped back as well!?
Well, this is really a problem we didn't take into account. Certainly it may be the cause of the convergence failure.
Yes. I can't believe we didn't think of this. @Akababa, kudos! I'll be looking through your code making sense of everything. I'll let you know how things work.
Yeah, that's always a worry in the back of my mind (hence the paranoid asserts). I'm a little confused by the conversation, though: has a bug already been found in my code, or is it a previous one from before my implementation?
@benediamond Thank you! Please feel free to write some test cases and sanity checks.
@Akababa The point is that I had developed a "DeepMind-style" feature plane input on my own, but I hadn't realized (as you did) that the policy vector needed to be flipped for black. @Zeta36 and I were wondering why it didn't converge. I'll be updating it accordingly as soon as possible.
Doesn't the DeepMind input actually use an extra plane to encode the side to move?
The main reason I did this was for fun, and also it might make the network train faster, as I believe it's strictly better than having the color plane and using this transformation to augment the training data.
Yes, they did. You can see my current approach here.
Here is another quick question. It appears that you clear the move tables at the beginning of each call to `action`. Isn't this contrary to the DeepMind approach, where, as they say, after each move the non-chosen portion of the tree is discarded but the chosen subtree is kept? Here, we will have to build visit counts from scratch each time a new move is chosen. Previously, memory was released only at the end of the game (when `self.white` and `self.black` are reassigned in `self_play.py`).
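To illustrate the "keep the chosen subtree" idea, here is a hypothetical sketch using a simple node-object tree (the repo uses flat visit-count tables, so the structure below is mine, not the project's):

```python
class Node:
    def __init__(self):
        self.children = {}     # move -> Node
        self.visit_count = 0
        self.value_sum = 0.0

class TreeReuseMCTS:
    def __init__(self):
        self.root = Node()

    def advance(self, played_move):
        """After a move is played, keep its subtree as the new root and
        discard the rest of the tree, as the DeepMind paper describes."""
        self.root = self.root.children.get(played_move, Node())
```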
Also, what do you mean by "losing MCTS nodes...!"?
But why is there a need to flip the policy if you're feeding in the side to move? Yes, that was before I read that part of the paper, but even then I'm not sure how move counts from previous transpositions would affect the table. In any case, I'm mostly doing this as a "functional" approach, to make results and bugs reproducible and to make things easier to reason about for now.
Hmm, I see what you're saying. But wouldn't that be much harder for the network? The entire mapping from the convolved input stack to the policy vector would have to be re-learned from scratch for Black, in a new way that is an essentially arbitrary scrambling of the first. At that point, there is no reason to place the side-to-move plane on top of the stack, orient the board from the player's perspective, etc. Right?
That might be true especially at the beginning, before the model has the chance to learn the rules of chess.
However, I think we are doing something similar with the flattened policy output layer anyway. Google's paper does mention that the final result was the same between the flattened policy and the huge stack of one-hot layers, but that training was slightly slower with the "compressed" format, which for us with our 0 TPUs probably means we won't see significant results from scratch for a while.
One thing I considered is having two 64-unit FC outputs for the from- and to-squares (and maybe ignoring underpromotions for now); it might be a little easier for the network to use. BUT I don't know whether this would output a sensible probability distribution with respect to softmax and ranking chess moves. A rough sketch of the idea is below.
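Roughly, I'm imagining something like this (Keras-style, layer sizes and the input plane count are made up for illustration). The catch is that the joint probability of a move would be p(from) * p(to), with the two heads forced to be independent, which is exactly why the ranking of legal moves might not come out sensible.

```python
from tensorflow.keras import layers, Model

def build_two_head_policy(input_shape=(8, 8, 18)):
    """Two independent 64-way softmax heads for from-square and to-square."""
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = layers.Flatten()(x)
    from_head = layers.Dense(64, activation="softmax", name="from_square")(x)
    to_head = layers.Dense(64, activation="softmax", name="to_square")(x)
    return Model(inp, [from_head, to_head])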
By the way, do you know what the "alternative" to the flat approach is? I can't figure out what the "non-flat" approach they're referring to is.
Yeah I agree that's unclear. I don't even know how they came up with 4629 possible moves.
4672 comes from their 73x8x8 move representation, as described in the arxiv paper. They also mention that they tried various approaches, which all worked well.
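For reference, my reading of the arXiv paper is that the 73 move-type planes break down into queen-style moves, knight moves, and underpromotions; a quick sanity check of the arithmetic (the breakdown is my interpretation, not quoted from the paper):

```python
queen_moves = 7 * 8        # up to 7 squares in each of 8 directions = 56
knight_moves = 8           # the 8 knight jumps
underpromotions = 3 * 3    # promote to knight/bishop/rook x {push, capture-left, capture-right}
move_planes = queen_moves + knight_moves + underpromotions   # 73
policy_size = 8 * 8 * move_planes                            # 4672
```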
Yeah my impression is anything we understand won't matter anyway :) All we can do is ensure the inputs and outputs are correct and pray for the best.
BTW are you able to access their nature paper? If not, I got it from my university and can send it to you if you want.
On line 21 of your `player_chess.py`, you reference `asyncio` despite having deleted the import. Is this intentional?
Try checking the new branch, I removed that part and optimized a lot of other stuff.
Didn't see that, thanks.
@benediamond Sorry, I didn't see your other comment. I think Python passes the reference to the `[]` to the `self.prediction` queue, so it's all good. You can uncomment the `#logger.debug(f"predicting {len(item_list)} items")` line to verify for yourself.
Yes, indeed, I deleted it because I figured that out myself just after posting! Thanks.
By the way, I'm brainstorming a list of ways to fix the draws by repetition thing, hopefully we can figure this one out.
Hi @Akababa, the one thing that seemed to affect this most strongly for me was `change_tau_turn`. I would first try setting this value to a very large number (1000, etc.), so that `tau` never drops. I've also experimented with a slowly (exponentially) decaying `tau`. Using either of these two, I could essentially eliminate draws by repetition.
Thanks, if that works it's a much nicer solution than the stuff I came up with. Did you let `tau = e^{-0.01 * turn}`?
Yes, essentially. I replaced the parameter `change_tau_turn` with `tau_decay_rate`; 0.99 was a good value (very close to e^{-0.01}, lol). Then set `tau = np.power(tau_decay_rate, turn)`.
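In other words, something along these lines (a sketch, assuming you have the root's visit counts per move; the function name and signature are mine):

```python
import numpy as np

def select_move(moves, visit_counts, turn, tau_decay_rate=0.99):
    """Sample a move from root visit counts sharpened by a decaying temperature.
    As tau -> 0 this approaches always picking the most-visited move."""
    tau = np.power(tau_decay_rate, turn)
    counts = np.asarray(visit_counts, dtype=np.float64)   # assumes at least one visit
    scaled = (counts / counts.max()) ** (1.0 / tau)       # normalize first to avoid overflow
    probs = scaled / scaled.sum()
    return np.random.choice(moves, p=probs)
```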
My tensorflow was broken by the latest CUDA update, so it'll be a bit before I can get working again.
What version? I'm on CUDA 8 and cuDNN 6.
I've got CUDA 9.1 and cuDNN 7.0.5. Still no luck.
As my machine runs CUDA 9(.1), TF with GPU support won't work out of the box. Rather than attempt to downgrade, I just built from source. That proved to be a good idea, until recently.
As for speed, I'm not sure, but using the new versions couldn't hurt, I think.
Multithreading isn't truly parallel in Python because of the GIL, is it?
No, unfortunately not :( But locks are still faster than asyncio with an event loop.
@benediamond @Zeta36 After many hours of hopeless debugging I discovered locks, which are amazing. The overall speedup on my machine is quite large, I would say at least 2x. That said, I haven't tested it fully and the code is almost completely rewritten/refactored by now, so please feel free to use it and tell me if I missed anything :) https://github.com/Akababa/chess-alpha-zero/blob/opts/src/chess_zero/agent/player_chess.py
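The gist of the lock-plus-batching idea, very roughly (names and structure below are illustrative, not the repo's actual classes): search threads enqueue positions and block, while a single predictor thread drains the queue and runs one batched forward pass, which is where the speedup over asyncio comes from.

```python
import threading
import time
import numpy as np

class BatchedPredictor:
    def __init__(self, model, batch_size=16):
        self.model = model
        self.batch_size = batch_size
        self.lock = threading.Lock()
        self.pending = []          # list of (features, Event, result-holder dict)

    def predict(self, features):
        """Called by search threads; blocks until the predictor thread fills the result."""
        event, holder = threading.Event(), {}
        with self.lock:
            self.pending.append((features, event, holder))
        event.wait()
        return holder["policy"], holder["value"]

    def run_forever(self):
        """Single predictor thread: batch pending requests through one NN call."""
        while True:
            with self.lock:
                batch, self.pending = self.pending[:self.batch_size], self.pending[self.batch_size:]
            if not batch:
                time.sleep(0.001)
                continue
            feats = np.asarray([b[0] for b in batch])
            policies, values = self.model.predict(feats)   # one batched forward pass
            for (_, event, holder), p, v in zip(batch, policies, values):
                holder["policy"], holder["value"] = p, v
                event.set()
```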
TODO: