About the training and test games

google-deepmind / searchless_chess

Grandmaster-Level Chess Without Search

https://arxiv.org/abs/2402.04494

Apache License 2.0


About the training and test games #4

Closed: johncs999 closed this issue 3 months ago

johncs999 commented 3 months ago

Hi there, thanks for the interesting work.

I'm curious about the training and test games, as I noticed there are 98M games in the 2023-February split on lichess. Did you use the first 10M as training games and the following 1k games for testing?

anianruoss commented 3 months ago

We randomly sample 10M games from the February 2023 split of https://database.lichess.org. For testing, we use 1k games randomly sampled from the March 2023 split (see Section 2.1 of our paper).

johncs999 commented 3 months ago

Thanks. BTW, how can I obtain a subset of 10^4 games from the current data? For behavioral cloning, can I simply use the first 589,130 records?

anianruoss commented 3 months ago

We use Apache Beam to process the file in parallel. To select a fixed number of games, we use beam.combiners.Sample.FixedSizeGlobally(num_games).
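
For readers without a Beam pipeline at hand, here is a minimal single-process sketch of the same idea: drawing a fixed-size uniform sample from a stream of games in one pass (reservoir sampling). It is not the repository's code and does not use Apache Beam; the function name `sample_fixed_size`, its `seed` parameter, and the toy game identifiers are assumptions made for illustration. In a Beam pipeline, the reply above suggests applying beam.combiners.Sample.FixedSizeGlobally(num_games) to the collection of games instead.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_fixed_size(
    items: Iterable[T], num_samples: int, seed: Optional[int] = None
) -> List[T]:
    """Returns a uniform random sample of up to `num_samples` items.

    Hypothetical helper (not part of searchless_chess or Apache Beam).
    Uses reservoir sampling, so the input is read once and never held
    in memory as a whole.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for index, item in enumerate(items):
        if index < num_samples:
            # Fill the reservoir with the first `num_samples` items.
            reservoir.append(item)
        else:
            # Keep the new item with probability num_samples / (index + 1),
            # overwriting a uniformly chosen slot.
            slot = rng.randint(0, index)
            if slot < num_samples:
                reservoir[slot] = item
    return reservoir


# Toy stand-in for a stream of games; real input would be parsed records.
games = (f"game_{i}" for i in range(100_000))
subset = sample_fixed_size(games, num_samples=10, seed=0)
print(len(subset))
```

Unlike taking the first N records of a file, this gives every item the same chance of being selected, regardless of how the file is ordered.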