SebLague / Chess-Challenge

Create your own tiny chess bot!
https://www.youtube.com/watch?v=Ne40a5LkK6A
MIT License

Machine Learning Not Possible (Token Capacity) #111

Open MohammadHomsee opened 1 year ago

MohammadHomsee commented 1 year ago

I would like to try a machine learning approach. How can I store the learning data?

plastic-bottle commented 1 year ago

you aren't supposed to do full-scale machine learning - the token limit is specifically there to prevent this

ejmejm commented 1 year ago

It is possible to do some machine learning, as a little hint, you'll probably want to store your network in a non-numeric format. You could think about how you could represent a small NN as a string, for example. At least that is if you do it the way I am going with, but I'd imagine my way is not the only way.
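
A minimal sketch of that string idea, assuming 8-bit quantized weights packed two per 16-bit char (the literal and the scaling below are invented for illustration, and note that, as pointed out further down, every character in the string still counts as a token here):

```csharp
class StringWeightsSketch
{
    // Placeholder literal; a real bot would paste machine-generated characters here.
    const string Packed = "\u1234\u5678";

    // Two 8-bit weights per char, mapped back to small floats.
    static float[] Decode(string packed)
    {
        var weights = new float[packed.Length * 2];
        for (int i = 0; i < packed.Length; i++)
        {
            int c = packed[i];
            weights[2 * i]     = ((c >> 8) & 0xFF) / 64f - 2f; // high byte
            weights[2 * i + 1] = (c & 0xFF) / 64f - 2f;        // low byte
        }
        return weights;
    }
}
```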

Odin7094 commented 1 year ago

There are 1000 games to be played. I could imagine someone letting it learn on the go with some smart approach.

Infiniti20 commented 1 year ago

> It is possible to do some machine learning, as a little hint, you'll probably want to store your network in a non-numeric format. You could think about how you could represent a small NN as a string, for example. At least that is if you do it the way I am going with, but I'd imagine my way is not the only way.

Every character in a string is counted as a token.

ejmejm commented 1 year ago

> It is possible to do some machine learning, as a little hint, you'll probably want to store your network in a non-numeric format. You could think about how you could represent a small NN as a string, for example. At least that is if you do it the way I am going with, but I'd imagine my way is not the only way.
>
> Every character in a string is counted as a token.

It is still possible with small models.

ch-iv commented 1 year ago

I do not think that the token rule was made specifically to prevent the ML approach. ML is an allowed option (you can even see ML mentioned as one of the possible approaches on the submission form). That being said, the token constraints make it difficult to use ML.

Firestorm-253 commented 1 year ago

> It is possible to do some machine learning, as a little hint, you'll probably want to store your network in a non-numeric format. You could think about how you could represent a small NN as a string, for example. At least that is if you do it the way I am going with, but I'd imagine my way is not the only way.

Sorry, but don't even try it haha. If you have any experience in RL you'll know that it's hard enough to make a decent chess AI even without a token limit.

zzzzz151 commented 1 year ago

You can train your NN outside, then copy paste the weights in MyBot.cs
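
As a rough sketch of what that could look like (layer sizes, names, and values here are placeholders, not a recommendation):

```csharp
using System;

class PastedWeightsSketch
{
    // Weights trained offline, then pasted in as literals.
    static readonly double[] HiddenWeights = { 0.12, -0.31, 0.07 /* ... */ };
    static readonly double[] OutputWeights = { 0.44, -0.09, 0.25 /* ... */ };

    // Tiny one-hidden-layer evaluation: tanh hidden units, linear output.
    static double Evaluate(double[] inputs)
    {
        double output = 0;
        for (int h = 0; h < OutputWeights.Length; h++)
        {
            double sum = 0;
            for (int i = 0; i < inputs.Length; i++)
                sum += HiddenWeights[h * inputs.Length + i] * inputs[i];
            output += OutputWeights[h] * Math.Tanh(sum);
        }
        return output;
    }
}
```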

Firestorm-253 commented 1 year ago

> You can train your NN outside, then copy paste the weights in MyBot.cs

Even then, I don't think you can do it. Prove me wrong.

MohammadHomsee commented 1 year ago

Imagine we have a neural network with 64 inputs, 10 hidden neurons, and one output; the weight count for this will be 64*10 + 10 = 650 numbers. Even if I store this in one string, and even if I manage to keep it within the token limit, the funny part is that after all of this hard work, I won't be able to fit my code logic in the script. I think the only solution for this is to allow an external text file for training data, but now I see that if he allowed it, what would prevent somebody from copy-pasting an already trained model's data?

I just understood why Sebastian prevented nameof: one method that came to my mind was to use a very long variable name and store everything I need in that name. Remember, a variable name is considered to be one token, so by doing that I could get the name as a string and extract the data I want. But Seb noticed this and prevented it; I like how smart he is.
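
A hypothetical illustration of the (now disallowed) trick being described, just to make it concrete; the identifier and encoding are made up:

```csharp
class NameofSmugglingSketch
{
    // The data lives in the identifier itself, which the tokenizer counts as one token.
    static int weights_1A2B3C4D5E6F7081 = 0;

    static string Recover()
    {
        // nameof turns the identifier back into a string at compile time,
        // so its characters could then be parsed back into numbers.
        return nameof(weights_1A2B3C4D5E6F7081).Substring("weights_".Length);
    }
}
```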

Odin7094 commented 1 year ago

To everyone thinking about strings: if I recall correctly, every character counts as one token, so just remember to check that before implementing the entire thing.

Pds314 commented 1 year ago

You can theoretically smoosh a lot of information into 64-bit uint64s or 128-bit decimals, which are primitive types, and a literal of either takes 1 token (the same as a single 16-bit UTF-16 character). How many weights or biases you can put in a decimal or uint64 depends on how microscopic you're willing to make them.
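
For the ulong case, a minimal sketch, assuming 8-bit weights so that eight of them fit in each 64-bit literal (the constant is just an example value):

```csharp
class UlongPackingSketch
{
    // Eight 8-bit weights per 64-bit literal; each literal costs one token.
    static readonly ulong[] Packed = { 0x1A2B3C4D5E6F7081UL /* , ... */ };

    static sbyte[] Unpack(ulong[] packed)
    {
        var weights = new sbyte[packed.Length * 8];
        for (int i = 0; i < packed.Length; i++)
            for (int b = 0; b < 8; b++)
                weights[i * 8 + b] = (sbyte)(packed[i] >> (8 * b)); // low byte first
        return weights;
    }
}
```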

System.Numerics.BigInteger does not work: it is not a primitive type, so there doesn't appear to be a way to cram information into it without involving a string literal or a smaller numeric type.

Decimals are definitely bigger than ulongs (they're 128 bits), but they are also highly annoying to deal with. The practical upshot is that you can fairly easily squeeze 96 bits out of them, and you might be able to squeeze a little more out of the sign and the decimal point location if it's worth it, but don't expect to ACTUALLY store 128 bits in a decimal. You might get 100 if you really push it to the limit, and you can't use hexadecimal literals that make it easy to visually see the bits either. I think that comfortably storing 50% more information is probably worth it for compressing a large neural network or even just a bunch of piece-square tables, but don't imagine you're getting 2 ulongs' worth or that the decoding process will be as smooth as it is for ulongs.
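
Concretely, the 96 usable bits are the decimal's integer mantissa: offline you can build a decimal from three packed ints and print it, paste the resulting `m` literal into the bot, and recover the bits at runtime with decimal.GetBits. A sketch, with placeholder values:

```csharp
class DecimalPackingSketch
{
    // Offline step: build a decimal whose mantissa is three packed 32-bit ints,
    // then print it and paste the resulting literal into MyBot.cs.
    static decimal Pack(int lo, int mid, int hi) =>
        new decimal(lo, mid, hi, false, 0); // sign and scale left unused here

    // In-bot step: recover the three ints from the pasted literal.
    static int[] Unpack(decimal d)
    {
        int[] parts = decimal.GetBits(d); // { lo, mid, hi, flags }
        return new[] { parts[0], parts[1], parts[2] };
    }
}
```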

Pds314 commented 1 year ago

> You can train your NN outside, then copy paste the weights in MyBot.cs

You can theoretically do it, yes. The question is whether you can actually come up with an efficient, powerful, and trainable architecture that can be stored and then decompressed in 1024 tokens. If we assume 768 tokens are used to store the weights and biases using uint64s and the other 256 tokens are needed to unpack, run, and search, that gives us 49152 bits to play with.

Pds314 commented 1 year ago

You could use a relatively sparse network structure and have each connection store a sort of relative address of the node to connect to and train by neuroevolution.
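
One way to picture that, as a sketch: each connection is one packed byte, with a few bits giving the relative index of the source node and the rest a tiny weight (the bit layout here is invented for illustration):

```csharp
using System;

class SparseConnectionSketch
{
    // One connection = 5-bit relative source offset + 3-bit signed weight.
    static float Activate(float[] prevLayer, int nodeIndex, byte[] connections)
    {
        float sum = 0;
        foreach (byte c in connections)
        {
            int offset = c >> 3;          // relative address of the source node
            int weight = (c & 0b111) - 4; // tiny weight in [-4, 3]
            int src = (nodeIndex + offset) % prevLayer.Length;
            sum += weight * prevLayer[src];
        }
        return Math.Max(0f, sum);         // ReLU
    }
}
```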

Pds314 commented 1 year ago

Honestly I think if I go the machine learning route, what I'm gonna do is:

  1. Use decimals in production but ulongs in testing for compacting the information.
  2. Use very very low fidelity weights and biases on the connections. Maybe even single bit.
  3. Don't overconnect the network. For instance, if you have 256 inputs and 256 fully-connected nodes on the next hidden layer, you already have 65536 connections, which is roughly 2/3 of your total budget even using decimals, even with single-bit weights.
  4. Instead, we want to have more layers and more nodes but fewer connections. For example, you might have those 256 nodes in the input layer connect to a 256-node second layer, but with literally 2 inputs per node, so that's 512 connections. Then make 7 more hidden layers of that size so that by the 8th hidden layer, every input node affects every node in the layer. You have now achieved a similar result to the previous 65536 connections with just 4096 and some clever bit masking. At this point you can begin tightening down the network and increasing the connectivity of the layers: perhaps 8 more layers sized 128, 64, 32, 16, 8, 4, 2, 1, with 512 connections per layer using the same strategic bitmasking approach but increasing the number of connections per node, and once we've necked down from 32 to 16, the network can be fully connected beyond that. This structure results in 12*512 + 170 = 6314 connections that need weights. If those weights are 8-bit numbers, that's about 50512 bits of information, which can be stored easily in 527 decimal tokens without trying to use the sign bits of the decimals (see the sketch after this list).
  5. Possibly train using a method that doesn't require high-fidelity gradient descent. For example, a genetic algorithm with crossbreeding (randomly combining the bits of a player who won as black and a player who won as white; I think it makes sense to do it that way). If I do that I'm probably going to use single-bit or 2-bit weights and a much larger network, at least if training it doesn't prove impossible.
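
A very rough sketch of what points 1-4 could look like together (all sizes, strides, and names are guesses for illustration, not a working bot):

```csharp
using System;
using System.Collections.Generic;

class SparseNetSketch
{
    // 1. Weights stored as decimal literals (96 usable bits each) in production.
    static readonly decimal[] Packed = { /* ~527 pasted decimal literals */ };

    // 2. Unpack into low-fidelity 8-bit weights.
    static sbyte[] Unpack(decimal[] packed)
    {
        var weights = new List<sbyte>();
        foreach (decimal d in packed)
        {
            int[] parts = decimal.GetBits(d); // { lo, mid, hi, flags }
            for (int p = 0; p < 3; p++)
                for (int b = 0; b < 4; b++)
                    weights.Add((sbyte)(parts[p] >> (8 * b)));
        }
        return weights.ToArray();
    }

    // 3-4. Each node reads only a couple of strided offsets in the previous layer,
    // so stacking layers (rather than full connectivity) spreads information across nodes.
    static float[] SparseLayer(float[] prev, sbyte[] w, ref int wi, int connectionsPerNode)
    {
        var next = new float[prev.Length];
        for (int n = 0; n < next.Length; n++)
        {
            float sum = 0;
            for (int c = 0; c < connectionsPerNode; c++)
                sum += w[wi++] * prev[(n + (1 << c)) % prev.Length];
            next[n] = Math.Max(0f, sum / 64f); // ReLU with a crude rescale for 8-bit weights
        }
        return next;
    }
}
```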

NegaScout commented 1 year ago

Neural networks are not even a good solution to a chess game (if used alone). Reinforcement learning is much better for this use. I would recommend studying the AlphaZero architecture (which internally uses a NN, but it is not the main point), which is based on MCTS and UCT rollouts.
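
For reference, the UCT score used to pick which node to roll out is typically win-rate plus an exploration bonus; a generic sketch (not tied to this repo's API):

```csharp
using System;

static class UctSketch
{
    // Standard UCT: average result (exploitation) + bonus for rarely visited children (exploration).
    static double Score(double wins, int visits, int parentVisits, double c = 1.414)
    {
        if (visits == 0) return double.PositiveInfinity; // always expand unvisited moves first
        return wins / visits + c * Math.Sqrt(Math.Log(parentVisits) / visits);
    }
}
```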