Thytu / StockLLM

Elevating Chess Strategy with Fine-Tuned Large Language Model
MIT License

Suggestion: augment pre-training with lichess open database #5

Open linux-leo opened 8 months ago

linux-leo commented 8 months ago

See: https://database.lichess.org/#standard_games

Maybe use every nth game from the year 2013 before lichess grew in size, so the dataset covers a more or less equal number of games per month while still covering a large time span, and to reduce the number of games that need to be processed.
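A minimal sketch of the every-nth-game idea, assuming the input is an already decompressed PGN text stream in which each game begins with an `[Event ...]` tag line (as in the lichess exports). The names `iter_games` and `every_nth_game` are hypothetical helpers for illustration, not part of StockLLM:

```python
from typing import Iterable, Iterator, List


def iter_games(lines: Iterable[str]) -> Iterator[str]:
    """Yield one PGN game at a time.

    Assumption: every game starts with an '[Event ' tag line and the
    lines still carry their trailing newlines (as when iterating a file).
    """
    current: List[str] = []
    for line in lines:
        # A new '[Event ' tag closes the game collected so far.
        if line.startswith("[Event ") and current:
            game = "".join(current).strip()
            if game:  # skip leading blank lines before the first game
                yield game
            current = []
        current.append(line)
    # Flush the last game in the stream.
    game = "".join(current).strip()
    if game:
        yield game


def every_nth_game(lines: Iterable[str], n: int) -> Iterator[str]:
    """Keep games 0, n, 2n, ... and drop the rest, without loading the file."""
    if n < 1:
        raise ValueError("n must be >= 1")
    for index, game in enumerate(iter_games(lines)):
        if index % n == 0:
            yield game
```

Because both functions are generators, a monthly dump can be streamed line by line, for example `every_nth_game(open("games.pgn", encoding="utf-8"), 100)` with a hypothetical local file name. The published dumps are compressed, so they would need to be decompressed to text first. Choosing a larger `n` for the bigger, more recent months is one way to keep the per-month counts roughly equal.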

PS: I'm happy to provide some compute for this project with my Google Colab Pro+ subscription :)

Thytu commented 8 months ago

Hey @linux-leo,

Awesome to see your interest in the project! Just got back from a trip, so I'm catching up. I appreciate your suggestions on improving the training dataset; you've got some great points there.

To add a few possible improvements:

And your offer of compute power? Legendary! I managed to get access to an H100, so we should be golden for now. Still, thanks a bunch for having my back :)

Feel free to drop more thoughts whenever they pop into your head. 🚀