SebLague / Chess-Challenge

Create your own tiny chess bot!
https://www.youtube.com/watch?v=Ne40a5LkK6A
MIT License

Each character in string counts as token #477

Open · TiagoFINO opened this issue 1 year ago

TiagoFINO commented 1 year ago

I tried to compress an array of floats into a string so it would use fewer tokens, but I found out that it actually uses more tokens, and that the bigger the string, the more tokens it uses. Is that supposed to happen? If it is, why?

ryanheath commented 1 year ago

Yes, that's expected. The longer the string, the more space (and thus tokens) is allocated.

Also, one char uses 2 bytes, and thus two tokens.

You might want to store your string as an array of (ASCII) bytes instead. That will save you the two-tokens-per-char penalty.

I do wonder, though, how you save space by converting your array of floats into a string.

// Ryan
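
A minimal sketch of that byte-array idea (my own illustration, not code from the thread; the packed data and helper name are hypothetical): the float table is serialised to raw little-endian bytes once, offline, and the bot rebuilds the floats at runtime with BitConverter instead of keeping a long string literal.

```csharp
using System;

class ByteArrayWeights
{
    // Hypothetical packed data: 4 little-endian bytes per float (here 1.0f and 2.0f).
    static readonly byte[] Packed = { 0, 0, 128, 63, 0, 0, 0, 64 };

    // Rebuild the float table at runtime (assumes a little-endian platform,
    // which is what BitConverter sees on typical x86/ARM).
    static float[] UnpackFloats(byte[] bytes)
    {
        float[] floats = new float[bytes.Length / 4];
        for (int i = 0; i < floats.Length; i++)
            floats[i] = BitConverter.ToSingle(bytes, i * 4);
        return floats;
    }

    static void Main()
    {
        foreach (float f in UnpackFloats(Packed))
            Console.WriteLine(f); // prints 1 and 2
    }
}
```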


TiagoFINO commented 1 year ago

Well, I thought that one string would count as one token; that is why I had the idea of using a string. Thank you.

mcthouacbb commented 1 year ago

> I tried to compress an array of floats into a string so it would use fewer tokens, but I found out that it actually uses more tokens, and that the bigger the string, the more tokens it uses. Is that supposed to happen? If it is, why?

It's to discourage people from just copy-pasting well-known implementations like Stockfish, and also to make it impossible to fit unlimited amounts of data within the 1024-token limit.

The current best way to compress data is to pack it into the 96-bit mantissa of a decimal.
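
A minimal sketch of that decimal-packing idea (my own illustration; Pack and Unpack are hypothetical helpers, not part of the challenge API): each decimal stores 12 arbitrary bytes in its 96-bit mantissa via the (lo, mid, hi, isNegative, scale) constructor, and decimal.GetBits recovers them at runtime.

```csharp
using System;

class DecimalPacking
{
    // Offline helper: squeeze 12 bytes into the 96-bit mantissa of a decimal.
    static decimal Pack(byte[] twelveBytes)
    {
        int lo  = BitConverter.ToInt32(twelveBytes, 0);
        int mid = BitConverter.ToInt32(twelveBytes, 4);
        int hi  = BitConverter.ToInt32(twelveBytes, 8);
        return new decimal(lo, mid, hi, false, 0); // sign and scale left at 0
    }

    // Runtime side: decimal.GetBits returns [lo, mid, hi, flags];
    // copying the first 12 bytes gives back the original data.
    static byte[] Unpack(decimal packed)
    {
        byte[] bytes = new byte[12];
        Buffer.BlockCopy(decimal.GetBits(packed), 0, bytes, 0, 12);
        return bytes;
    }

    static void Main()
    {
        byte[] data = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
        decimal packed = Pack(data); // paste the printed value into your bot as a decimal literal
        Console.WriteLine(packed);
        Console.WriteLine(string.Join(",", Unpack(packed))); // 1,2,...,12
    }
}
```

The Pack step would run offline to generate the literals; only the Unpack side needs to live inside the bot itself.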