Akababa / Chess-Zero

Chess reinforcement learning by AlphaZero methods.
MIT License

Finally multithreading! #3

Open Akababa opened 6 years ago

Akababa commented 6 years ago

@benediamond @Zeta36 After many hours of hopeless debugging I discovered locks, which are amazing. The overall speedup on my machine is quite large, at least 2x I would say. That being said, I haven't tested it fully and the code is almost completely rewritten/refactored by now, so please feel free to use it and tell me if I missed anything :) https://github.com/Akababa/chess-alpha-zero/blob/opts/src/chess_zero/agent/player_chess.py

TODO:

benediamond commented 6 years ago

Hi @Akababa, this looks great, I'm going over it now.

In the meantime, I noticed your flip_policy step. Could you say more about this?

@Zeta36 I'm wondering if this is the crucial bug that has been there since the DeepMind-style board representation. When you flip the board to orient the features to the perspective of the current player, then the final NN mapping onto the policy vector must be flipped back as well!? Furthermore, it would be necessary to "preemptively" flip the visit count information before feeding it into the neural network, e.g. during the convert_to_training_data step.
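
In pseudocode, what I have in mind is roughly this (a toy sketch with made-up helper names, not your actual flip_policy):

```python
# Toy sketch (made-up names, not the repo's flip_policy): mirror a policy
# vector indexed by UCI move strings by swapping ranks 1<->8, 2<->7, etc.,
# while leaving files and promotion pieces untouched.
def mirror_uci(move):
    """e.g. 'e2e4' -> 'e7e5', 'a7a8q' -> 'a2a1q'."""
    flip = {'1': '8', '2': '7', '3': '6', '4': '5',
            '5': '4', '6': '3', '7': '2', '8': '1'}
    return ''.join(flip.get(c, c) for c in move)

def flip_policy(policy, move_labels, move_index):
    """policy[i] is the probability of move_labels[i] (a UCI string);
    move_index maps a UCI string back to its index."""
    flipped = [0.0] * len(policy)
    for i, p in enumerate(policy):
        flipped[move_index[mirror_uci(move_labels[i])]] = p
    return flipped
```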

benediamond commented 6 years ago

Also, why did you remove the manual annealing of the learning rate?

Akababa commented 6 years ago

Hey @benediamond , thanks for commenting! Btw I'm pushing more optimizations to https://github.com/Akababa/chess-alpha-zero/blob/opts/src/chess_zero/agent/player_chess.py now. It looks like it's working well.

I didn't train for 100,000 steps anyway (the first time lr changes) so it doesn't really matter, I was just experimenting with different optimizers. On the scale of our testing Google's annealing isn't really applicable.

Akababa commented 6 years ago

Yes, I think I did the flipping correctly here, but I would really appreciate it if you could take a quick look to see whether it checks out from your point of view.

If you have code you'd like a sanity check on too I'd be happy to help out :)

Zeta36 commented 6 years ago

@benediamond:

When you flip the board to orient the features to the perspective of the current player, then the final NN mapping onto the policy vector must be flipped back as well!?

Well, this is really a problem we didn't take into account. Certainly it may be the cause of the convergence failure.

benediamond commented 6 years ago

Yes. I can't believe we didn't think of this. @Akababa, kudos! I'll be looking through your code and making sense of everything. I'll let you know how things work.

Akababa commented 6 years ago

Yeah, that's always a worry in the back of my mind (hence the paranoid asserts). I'm a little confused by the conversation though: has a bug already been found in my code, or is it a previous one from before my implementation?

@benediamond Thank you! Please feel free to write some test cases and sanity checks.

benediamond commented 6 years ago

@Akababa The point is that I had developed a "DeepMind-style" feature plane input on my own, but I hadn't realized (as you did) that the policy vector needed to be flipped for black. @Zeta36 and I were wondering why it didn't converge. I'll be updating it accordingly as soon as possible.

Akababa commented 6 years ago

Doesn't the DeepMind input actually use an extra plane to encode the side to move?

The main reason I did this was for fun, and also it might make the network train faster, as I believe it's strictly better than having the color plane and using this transformation to augment the training data.
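
Roughly, the two options look like this (a quick sketch using python-chess, not the repo's actual feature code):

```python
import numpy as np
import chess

# Rough sketch of the two input options being discussed (not the repo's code):
# (a) keep a fixed board orientation and add a constant side-to-move plane, or
# (b) mirror the board for black so the net always plays "up" the board, which
#     is what makes the policy flip necessary.
def side_to_move_plane(board: chess.Board) -> np.ndarray:
    return np.full((8, 8), 1.0 if board.turn == chess.WHITE else 0.0, dtype=np.float32)

def oriented_board(board: chess.Board) -> chess.Board:
    # python-chess's mirror() flips the ranks and swaps colors in one call.
    return board if board.turn == chess.WHITE else board.mirror()
```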

benediamond commented 6 years ago

Yes, they did. You can see my current approach here.

Here is another quick question. It appears that you clear the move tables at the beginning of each call to action. Yet isn't this contrary to the DeepMind approach, where, as they say, after each move the non-chosen portion of the tree is discarded but the chosen one is kept? Here, we will have to build visit counts from scratch each time a new move is chosen. Previously, memory was released only at the end of the game (when self.white and self.black are reassigned in self_play.py).
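
What I mean by keeping the chosen subtree is roughly this (a toy sketch, not your node/table classes):

```python
# Toy sketch of subtree reuse (made-up classes, not the repo's move tables):
# after a move is played, keep the statistics under the chosen child and let
# the rest of the tree be garbage-collected, instead of clearing everything.
class Node:
    def __init__(self):
        self.children = {}   # move -> Node
        self.n = 0           # visit count
        self.w = 0.0         # total value from this node's perspective

def advance_root(root, move):
    # The chosen child becomes the new root; its siblings become unreachable.
    return root.children.get(move, Node())
```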

Also, what do you mean by "losing MCTS nodes...!"?

Akababa commented 6 years ago

But why would there be a need to flip the policy if you're feeding in the side to move? As for the tables: yes, that was before I read that part of the paper, but even then I'm not sure how move counts from previous transpositions would affect the table. In any case, I'm mostly doing this as a "functional" approach to make results and bugs reproducible and easier to reason about for now.

benediamond commented 6 years ago

Hmm, I see what you're saying. But that'd be much harder for the network, no? The entire mapping from the convolved input stack to the policy vector would have to be re-learned from scratch for black, in a new way that is a totally arbitrary scrambling of the first. At that point, there is no reason to place the side-to-move plane on top of the stack, orient the board from the player's perspective, etc... Right?

Akababa commented 6 years ago

That might be true especially at the beginning, before the model has the chance to learn the rules of chess.

However, I think we are doing something similar with the flattened policy output layer anyway. Google's paper does mention that the final result (between the flattened policy and the huge stack of one-hot planes) was the same, but training was slightly slower with the "compressed" format; for us, with our 0 TPUs, that probably means we won't see significant results from scratch for a while.

One thing I considered is having two 64-unit FC outputs for the from and to squares (and maybe ignoring underpromotions for now); it might be a little easier for the network to use. But I don't know if this would yield a sensible probability distribution with respect to the softmax and ranking chess moves.
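
Something like this is what I had in mind (a Keras sketch with made-up layer sizes, not the repo's model):

```python
from tensorflow.keras import layers, Model

# Sketch of the two-head idea (made-up shapes, not the repo's model):
# one 64-way softmax for the from-square and another for the to-square.
inp = layers.Input(shape=(8, 8, 18))        # whatever the input planes end up being
x = layers.Conv2D(64, 3, padding='same', activation='relu')(inp)
x = layers.Flatten()(x)
p_from = layers.Dense(64, activation='softmax', name='from_square')(x)
p_to = layers.Dense(64, activation='softmax', name='to_square')(x)
model = Model(inp, [p_from, p_to])

# The worry: the induced move probability p_from[f] * p_to[t] treats the two
# squares as independent, so after masking out illegal moves it would have to
# be renormalized before it ranks moves the way a single flat softmax does.
```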

benediamond commented 6 years ago

By the way, do you know what the alternative to the flat approach is? I can't figure out what "non-flat" approach they're referring to.

Akababa commented 6 years ago

Yeah I agree that's unclear. I don't even know how they came up with 4629 possible moves.

benediamond commented 6 years ago

4672 comes from their 73x8x8 move representation, as described in the arXiv paper. They also mention that they tried various approaches, which all worked well.
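
For reference, the count decomposes like this:

```python
# How 8x8x73 = 4672 breaks down in the arXiv paper's move representation:
queen_moves = 8 * 7        # 8 directions, up to 7 squares each
knight_moves = 8           # 8 knight jumps
underpromotions = 3 * 3    # {knight, bishop, rook} x {left capture, push, right capture}
planes = queen_moves + knight_moves + underpromotions   # 73 planes per from-square
print(8 * 8 * planes)      # 4672
```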

Akababa commented 6 years ago

Yeah my impression is anything we understand won't matter anyway :) All we can do is ensure the inputs and outputs are correct and pray for the best.

BTW are you able to access their nature paper? If not, I got it from my university and can send it to you if you want.

benediamond commented 6 years ago

On line 21 of your player_chess.py, you reference asyncio despite having deleted the import. Is this intentional?

Akababa commented 6 years ago

Try checking the new branch, I removed that part and optimized a lot of other stuff.

benediamond commented 6 years ago

Didn't see that, thanks.

Akababa commented 6 years ago

@benediamond sorry, I didn't see your other comment. I think Python passes the reference to the [] into the prediction queue, so it's all good. You can uncomment the #logger.debug(f"predicting {len(item_list)} items") line to verify for yourself.
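
i.e. something like this is all that's going on:

```python
# Tiny illustration of the reference semantics in question (not the pipe code):
results = []                   # the [] that gets handed off
queue = [results]              # the queue stores a reference, not a copy
queue[0].append("prediction")  # the consumer appends through that reference...
print(results)                 # ['prediction'] -- ...and the caller sees it
```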

benediamond commented 6 years ago

Yes, indeed. I deleted it because I figured that out myself just after posting! Thanks.

Akababa commented 6 years ago

By the way, I'm brainstorming a list of ways to fix the draws-by-repetition problem; hopefully we can figure this one out.

benediamond commented 6 years ago

Hi @Akababa, the one thing that seemed to affect this most strongly for me was change_tau_turn. I would first try setting this value to a very large number (1000, etc.), so that tau never drops.

I've also experimented with a slowly (exponentially) decaying tau.

Using either of these two, I could essentially eliminate draws by repetition.

Akababa commented 6 years ago

Thanks, if that works it's a much nicer solution than the stuff I came up with. Did you let tau=e^{-0.01*turn} ?

benediamond commented 6 years ago

Yes, essentially. I replaced the parameter change_tau_turn with tau_decay_rate. 0.99 was a good value (very close to e^{-0.01} lol). Then set

tau = np.power(tau_decay_rate, turn)
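
and that tau then enters move selection in the usual way, roughly (a sketch, not the repo's exact function):

```python
import numpy as np

# Sketch of how the decayed tau shapes move selection (not the exact repo code):
# visit counts N are sharpened as N^(1/tau); as tau shrinks the choice becomes
# nearly greedy, so decaying it slowly keeps some randomness in long games,
# which is what helped with the repetition draws.
def select_move(moves, visit_counts, turn, tau_decay_rate=0.99):
    tau = np.power(tau_decay_rate, turn)
    counts = np.asarray(visit_counts, dtype=np.float64)
    counts /= counts.max()              # guard against overflow when tau is tiny
    pi = counts ** (1.0 / tau)
    pi /= pi.sum()
    return np.random.choice(moves, p=pi)
```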
benediamond commented 6 years ago

My TensorFlow was broken by the latest CUDA update, so it'll be a bit before I can get it working again.

Akababa commented 6 years ago

What versions? I'm on CUDA 8 and cuDNN 6.

benediamond commented 6 years ago

I've got CUDA 9.1 and cuDNN 7.0.5. Still no luck.

Akababa commented 6 years ago

I just used the old versions on the TF site. Are the new ones faster?

benediamond commented 6 years ago

Since my machine runs CUDA 9(.1), TF with GPU won't work out of the box. Rather than attempt a downgrade, I just built from source. That proved to be a good idea, until recently.

benediamond commented 6 years ago

As for speed, I'm not sure, but using the new versions couldn't hurt, I think.

apollo-time commented 6 years ago

Multithreading isn't truly parallel in Python because of the GIL, is it?

Akababa commented 6 years ago

No, unfortunately not :( But locks are still faster than asyncio with an event loop.
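
The pattern I'm using is roughly this (a toy sketch, not player_chess.py itself; predict_batch and stop_event are stand-ins):

```python
import threading

# Toy sketch of the lock-protected batching pattern (not player_chess.py):
# search threads append positions to a shared list under a lock, and one
# predictor thread periodically swaps the list out and runs a single batched
# NN call. The GIL rules out true CPU parallelism, but the other threads can
# keep working while one is blocked inside the (GIL-releasing) TensorFlow call.
pending = []
lock = threading.Lock()

def enqueue(position, result_holder):
    # Called by a search thread; result_holder is the [] it will read later.
    with lock:
        pending.append((position, result_holder))

def predictor_loop(predict_batch, stop_event):
    while not stop_event.is_set():
        with lock:
            batch, pending[:] = list(pending), []
        if batch:
            outputs = predict_batch([pos for pos, _ in batch])
            for (_, holder), out in zip(batch, outputs):
                holder.append(out)   # hand the result back to the waiting thread
```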