byronknoll / cmix

cmix is a lossless data compression program aimed at optimizing compression ratio at the cost of high CPU/memory usage.
http://www.byronknoll.com/cmix.html
GNU General Public License v3.0

1-billion-word-language-modeling-benchmark-r13output.tar fails on Windows and WSL #45

Closed jabowery closed 4 years ago

jabowery commented 4 years ago

UPDATE 1/18/2020: The -n option seems to work, although after running for a day, progress is only at 0.77%. END UPDATE

The failure may be due to the 4,231,823,360-byte length of the tar file.

Here's the output from the WSL binary under an Ubuntu image:

$ ../../cmix/cmix -c 1-billion-word-language-modeling-benchmark-r13output.tar out.cmix
4231823360 bytes -> 6 bytes in 5.83 s.
cross entropy: 0.000

byronknoll commented 4 years ago

Unfortunately, I think the maximum input size cmix currently supports is 2147483648 bytes (2^31). I also suspect your -n workaround won't have the desired behavior: it probably won't compress the entire file.

Thanks for the bug report - I will try to fix this issue when I have time.
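
For readers following along, here is a rough arithmetic check of the numbers in this thread. It is only a sketch: the thread does not say how cmix stores the input size, so the signed 32-bit interpretation below is an assumption, and the helper name as_int32 is made up for illustration.

```python
# Sketch only: the thread does not say how cmix stores the input size.
# Assumption: a signed 32-bit integer, which would match the quoted limit.

INT32_MAX = 2**31 - 1  # 2147483647, largest signed 32-bit value


def as_int32(n: int) -> int:
    """Reinterpret the low 32 bits of n as a signed 32-bit integer."""
    low = n & 0xFFFFFFFF
    return low - 2**32 if low >= 2**31 else low


tar_size = 4231823360      # file size reported in the issue
quoted_limit = 2147483648  # limit quoted in the reply (2**31)

print(tar_size > INT32_MAX)     # True: the file is past the signed 32-bit range
print(tar_size / quoted_limit)  # roughly 1.97 times the quoted limit
print(as_int32(tar_size))       # -63143936 under the 32-bit assumption
```

If the size really were held in a signed 32-bit value, it would wrap to a negative number, which would be consistent with a near-empty output file. That is a guess about the cause, not something the thread confirms.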

byronknoll commented 4 years ago

OK, the latest commit should fix this issue.