Leap years are not yet calculated correctly

hackerb9 commented 2 years ago

Both 02/29/2024 and 03/01/2024 choose the 60th word in WL2024. The 366th word is only available by setting the date to 12/32/2024.
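For reference, the expected date-to-word mapping can be sketched as follows. This is Python rather than the program's BASIC, and the function names are made up for illustration; it is not the code from the repository.

```python
def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def day_of_year(month: int, day: int, year: int) -> int:
    """Return the 1-based ordinal day, which doubles as the word index."""
    february = 29 if is_leap_year(year) else 28
    days_in_month = [31, february, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    # Sum the full months before this one, then add the day within the month.
    return sum(days_in_month[:month - 1]) + day
```

With this rule, 02/29/2024 maps to word 60, 03/01/2024 to word 61, and 12/31/2024 to word 366.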

bgri commented 2 years ago

Ah, interesting. I'll dig into this, thanks! It may take a little while, as I've recently moved and some of my gear is still packed :)

hackerb9 commented 2 years ago

I've fixed this and several other bugs in one of my branches. I'll send a pull request once I'm happy with it. https://github.com/bgri/m100LE/compare/main...hackerb9:M100LE:aesthetics. The main thing it is missing right now is slowly flipping over the letters when you win to give a sense of anticipation before the word "CONGRATS!!"

bgri commented 2 years ago

Very nice! Thanks for looking at this!

Heh, yeah, I think once I get settled in and get my gear out of packing boxes I'm going to dig into this code again. I like the idea of 'more anticipation' for the 'CONGRATS' message.

Also, just saw a video on Run Length Encoding and was wondering if there's a way to save space on the WordList files. If I can make them smaller, I can fit more 'years' in a data file, or just reduce the size so the unit can have more things on it :) . Something to think about...

-- -- Brad Grier

hackerb9 commented 2 years ago

I did not get the anticipation code done that I had intended — perhaps you'll get a chance to tackle it once you unpack — but I did at least make it show that the word matches before it says CONGRATS and pauses. I'll file a pull request.

RLE won't work for encoding English — the fact that you've seen one letter doesn't make it more likely that you'll see more of the same letter immediately following — but English does have a lot of redundancy. Just to see what the limit is on compression, I tried using gzip and found the WL files went from 24KB to 7.2KB.
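To illustrate why run-length encoding does not help here, a naive RLE can be sketched in Python. The sample words are arbitrary examples, not taken from the actual WL files.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Naive run-length encoding: each run of a repeated character becomes <count><char>."""
    return "".join(f"{len(list(run))}{ch}" for ch, run in groupby(text))
```

A word with no repeated letters, such as "CRANE", doubles in size to "1C1R1A1N1E", and even a doubled letter, as in "SLEEP", still yields the longer "1S1L2E1P". RLE only pays off for long runs, which five-letter English words do not have.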

There are a lot of techniques you could implement on a Model 100. The most basic would be a "dictionary"-based scheme, which represents common sequences (like "th" and "ea") using character codes of 128 and above.
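A minimal sketch of such a dictionary scheme, in Python for readability. The digraph table here is a guess for illustration only; a real table would be chosen by counting the most common pairs in the actual word lists.

```python
# Hypothetical digraph table (an assumption, not derived from the WL files).
DIGRAPHS = ["th", "ea", "er", "in"]
ENCODE = {pair: chr(128 + i) for i, pair in enumerate(DIGRAPHS)}
DECODE = {code: pair for pair, code in ENCODE.items()}


def compress(word: str) -> str:
    """Replace known letter pairs with a single character code >= 128."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ENCODE:
            out.append(ENCODE[pair])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


def expand(packed: str) -> str:
    """Reverse compress(): codes >= 128 expand back to their letter pair."""
    return "".join(DECODE.get(ch, ch) for ch in packed)
```

With this table, "earth" shrinks from five characters to three. The trade-off is that entries are no longer a fixed length, so each one still needs a delimiter or a length marker.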

One obvious waste in the file is the CR and NL at the end of each word. Since you know that every word is exactly 5 characters, you could omit them and have a file that's 28% smaller. I do not know how to seek in binary files on the Model 100, but it may be possible to.
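The saving comes from storing 5 bytes per word instead of 7 (five letters plus CR and LF), which is 2/7, or roughly 28%. A conversion step could look like the following Python sketch, run on a modern machine before copying the file over; the function name is hypothetical.

```python
def pack_fixed_width(text: str, width: int = 5) -> bytes:
    """Drop line endings and concatenate words into fixed-width records."""
    words = text.split()  # splits on CR, LF, and any other whitespace
    for word in words:
        if len(word) != width:
            raise ValueError(f"expected {width} letters, got {word!r}")
    return "".join(words).encode("ascii")
```

The length check matters: a single word of the wrong size would shift every later record.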

Another space saving could be had by noticing that every word draws from an alphabet of only 26 letters, each of which can be represented using only 5 bits. Since there are five letters, each word requires only 25 bits, not the 40 bits used by five bytes.
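The 5-bit idea can be sketched like this in Python, assuming uppercase A to Z only; the function names are illustrative.

```python
def pack_word(word: str) -> int:
    """Pack five letters (A-Z) into one integer, 5 bits per letter."""
    value = 0
    for ch in word.upper():
        value = (value << 5) | (ord(ch) - ord("A"))
    return value


def unpack_word(value: int, length: int = 5) -> str:
    """Recover the letters from a packed integer, lowest 5 bits last."""
    letters = []
    for _ in range(length):
        letters.append(chr((value & 0b11111) + ord("A")))
        value >>= 5
    return "".join(reversed(letters))
```

Because 25 bits do not fall on a byte boundary, the simplest layout stores each word in 4 bytes (32 bits). Getting the full saving means packing 8 words into 25 bytes, which makes lookup more fiddly.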

The furthest you may want to go on this is Huffman coding (https://en.wikipedia.org/wiki/Huffman_coding), which creates a variable-length code where more frequent letters need fewer bits. Note that while a tree data structure is required for generating the code, the Model 100 will not need anything so fancy to decode it. However, because the words would no longer have a fixed size, it would have to process every single bit to count characters and find the right word, which would likely be unusably slow on a Model 100.
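As a rough sketch of the idea in Python: the code table is built once from letter frequencies (using a heap in place of an explicit tree), while decoding needs only a flat lookup table. Bits are kept as a string of "0" and "1" for clarity; real storage would pack them into bytes.

```python
import heapq
from collections import Counter


def build_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table {symbol: bitstring} from symbol frequencies."""
    counts = Counter(text)
    if not counts:
        return {}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest groups, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text: str, codes: dict[str, str]) -> str:
    """Concatenate the bitstring for each symbol."""
    return "".join(codes[ch] for ch in text)


def decode(bits: str, codes: dict[str, str]) -> str:
    """Decode bit by bit with a flat lookup table; no tree is needed."""
    lookup = {code: sym for sym, code in codes.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)
```

The decoder shows the cost mentioned above: finding the Nth word means walking every bit before it, since there is no way to jump straight to a word boundary.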

There's a lot that one could do, but I actually like that the Word Lists are simple text files. And having an entire year's worth of games per file isn't too shabby.

bgri commented 2 years ago

WHOA! You've been busy, thanks! I found the boxes that have the m100 and 8201s. I'll likely get them out this week and give your code a run and merge the changes. Then copy that over to my DEV stream and see what I can do with CONGRATS.

> One obvious waste in the file is the CR and NL at the end of each word. Since you know that every word is exactly 5 characters, you could omit them and have a file that's 28% smaller. I do not know how to seek in binary files on the Model 100, but it may be possible to.

I think I looked at that and disregarded it for some reason. I'll have to check my notes. Maybe the time to scan 5-character chunks was too long to be practical? I forget, but it's worth revisiting just to find out what I thought at the time.

And yeah, as long as you're running a REX or some such it's trivial to keep m100le available without impacting other programs.

Thanks again for your work on this, it's appreciated!

--Brad

hackerb9 commented 2 years ago

> > Since you know that every word is exactly 5 characters, you could omit them and have a file that's 28% smaller.
>
> I think I looked at that and disregarded it for some reason. I'll have to check my notes. Maybe time to scan 5 chr. chunks was too long to be practical?

I don't know if M100 BASIC can do it, but I was imagining something like fseek() to skip directly to the correct entry: Today is the 206th day of the year, so you'd seek to byte number 1025 (205×5).
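On a system that does support seeking, the lookup would be something like this Python sketch. It assumes a file of fixed five-byte records with no line endings, and the names are hypothetical. Whether M100 BASIC offers an equivalent is still the open question.

```python
import io

WORD_LEN = 5  # every record is exactly five letters, no CR/LF


def byte_offset(day_of_year: int) -> int:
    """Zero-based byte offset of the record for a 1-based day number."""
    return (day_of_year - 1) * WORD_LEN


def word_for_day(f, day_of_year: int) -> str:
    """Jump straight to the day's record and read it."""
    f.seek(byte_offset(day_of_year))
    return f.read(WORD_LEN).decode("ascii")


# Usage with an in-memory stand-in for the word file:
demo = io.BytesIO(b"CRANESLATEABOUT")
second_word = word_for_day(demo, 2)  # reads bytes 5 through 9
```

For day 206 the offset works out to 205 × 5 = 1025, counting the first byte of the file as byte 0.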

But, as I think more about it, I actually like that the word lists are just plain text files. It makes the program easier for other people in the future to understand and extend. And there is a beauty to the fact that the lists are easily editable on a Model 100.

hackerb9 commented 2 years ago

Since this bug will be closed once you merge the pull request, I'm going to create new issues for the topics discussed here.

bgri commented 2 years ago

Good idea.
