EmbroidePy / pyembroidery

pyembroidery library for reading and writing a variety of embroidery formats.
MIT License

Hus Writer #80

Open tatarize opened 4 years ago

tatarize commented 4 years ago

I wrote the HUS reader after reimplementing what is basically ARJ compression. There are likely a few cheats for performing the compression, the easiest of which is probably to declare that the first 256 elements are 8 bits long and the last 256 characters are never used. Then every 8 bits decode to the character equal to that byte, so we can just dump the uncompressed data and it will pass through the decompressor unscathed. The alternative is actually searching the 1k window, building the actual frequency trees, and so on. Having studied the code to build mine up from nothing, I expect this to work without issues, since the decompression routine doesn't care how the data was produced and only checks that the trees are valid.

tatarize commented 4 years ago

For the compression I don't want to do the massive legwork of writing a real scheme, so I've developed a special way to cheat. We write a fake table of 256 entries, each 8 bits long. Codes of equal length are numbered lowest symbol first, so each code maps exactly to the character itself.
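To see why the tie-breaking matters, here is a minimal sketch of canonical Huffman code assignment. It assumes the decoder numbers codes shortest first and, within one length, lowest symbol first, which is what the comment above describes. The function name canonical_codes is made up for illustration and is not part of pyembroidery.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from a list of per-symbol bit lengths.

    Assumption: shorter codes come first, and symbols that share a length
    are numbered in ascending symbol order. Length 0 means "unused".
    """
    order = sorted(
        (length, symbol) for symbol, length in enumerate(lengths) if length > 0
    )
    codes = {}
    code = 0
    prev_length = order[0][0] if order else 0
    for length, symbol in order:
        # Moving to a longer length appends zero bits to the running code.
        code <<= length - prev_length
        codes[symbol] = code
        code += 1
        prev_length = length
    return codes


# The fake table: 256 symbols, every one of them 8 bits long.
codes = canonical_codes([8] * 256)
identity = all(codes[byte] == byte for byte in range(256))
print(identity)
```

With all 256 lengths tied at 8, the codes come out as 0 through 255 in symbol order, so each byte of raw data is already its own code.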

Block writing.

16 bits: filesize (the block size)

Distance Huffman (we are never using this):
5 bits: 00000, no entries.
5 bits: 00000, any value; it doesn't matter since we're never using it.
However!

16 + 10 + 9 + 10 = 45 bits, and 45 % 8 = 5, so the header would stop 3 bits short of a byte boundary and we'd have to bit-shift all of our bytes to compensate.

So we need a Huffman table header whose bit length is an exact multiple of 8:
5 bits: 00001, one distance Huffman entry.
3 bits: 111, a variable-length value of 7 or more.
5 bits: 11110, which adds 4 to the 7 for 11.
The value itself is irrelevant; really we just want the padding bits.

(16) + (10) + (9) + (5 + 3 + 5) = 48 bits, or 6 bytes: 2 bytes of filesize, followed by 4 bytes:
0b00000010101000000000000111111110
This is storable in a single integer: 0x02A001FE
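The bit arithmetic can be checked by packing the fields MSB-first and comparing against the constant. This is only a sketch: pack_bits and HEADER_FIELDS are illustrative names. The last three fields are the ones spelled out above. The first three are read back off the constant itself, and the labels on them are my assumption about how an ARJ-style header is laid out, not something stated in this thread.

```python
def pack_bits(fields):
    """Concatenate (value, width) pairs MSB-first into one integer."""
    value = 0
    total_bits = 0
    for field_value, width in fields:
        if not 0 <= field_value < (1 << width):
            raise ValueError("field does not fit in its width")
        value = (value << width) | field_value
        total_bits += width
    return value, total_bits


HEADER_FIELDS = [
    (0, 5),        # assumed label: character-length table, no entries
    (10, 5),       # assumed label: default value for that table
    (256, 9),      # assumed label: character table, 256 entries
    (0b00001, 5),  # distance Huffman: one entry
    (0b111, 3),    # variable-length value, 7 or more
    (0b11110, 5),  # adds 4 to the 7, used purely as padding
]

header, header_bits = pack_bits(HEADER_FIELDS)
print(hex(header), header_bits)

# Including the 16-bit size, the whole prefix ends on a byte boundary.
aligned = (16 + header_bits) % 8 == 0
```

The 10 + 9 + 13 table bits fill exactly 32 bits, which is why the raw bytes that follow need no shifting.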

So, to perform Greenleaf or ARJ compression:

2-byte-block-size + 0x02A001FE + uncompressed_stream

If you need more bytes than a 16-bit block size can hold (2^16 - 1 = 65535), you'll need more blocks. You make them the same way.
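Put together, the whole scheme could look roughly like the sketch below. The name fake_compress is hypothetical and this is not necessarily how pyembroidery ends up doing it. One loud assumption: the 2-byte block size is written big-endian to match the MSB-first layout of the 4 table bytes. The thread does not state the byte order of the size field, so check it against the reader before relying on this.

```python
import struct

TABLE_HEADER = 0x02A001FE  # the 4 fake-table bytes derived above
MAX_BLOCK = 0xFFFF         # largest value a 16-bit block size can hold


def fake_compress(data):
    """Wrap raw bytes in store-only blocks: size + fake tables + payload.

    Assumption: the block size is big-endian (">H"). The table constant is
    written as the bytes 02 A0 01 FE, in that order.
    """
    out = bytearray()
    for start in range(0, len(data), MAX_BLOCK):
        block = data[start:start + MAX_BLOCK]
        out += struct.pack(">HI", len(block), TABLE_HEADER)
        out += block
    return bytes(out)


sample = fake_compress(b"abc")
print(sample.hex())
```

Each block costs 6 bytes of overhead, so the output is always slightly larger than the input; the point is only that a decompressor will accept it.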