SafeteeWoW / LibDeflate

Pure Lua compressor and decompressor with high compression ratio using DEFLATE/zlib format.
https://safeteewow.github.io/LibDeflate/
zlib License

Added gzip compression/decompression #2

Closed: MCJack123 closed this pull request 3 years ago.

MCJack123 commented 5 years ago

I added a basic LibDeflate:DecompressGzip function, tested with gzip 1.6 and Lua 5.1. It uses some bundled bit-manipulation functions, which may not match the project's code style.
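
For context, a gzip member (RFC 1952) wraps a raw DEFLATE stream in a fixed 10-byte header (magic bytes 0x1f 0x8b, compression method 8 = deflate, a flag byte, mtime, extra flags, OS) and an 8-byte trailer holding the CRC-32 and the uncompressed size modulo 2^32, both little-endian. The following is a minimal sketch that only inspects that header and trailer. `inspect_gzip` and `read_u32le` are hypothetical helpers, not the code in this PR, and the optional header fields selected by the flag byte are not parsed.

```lua
-- Read a little-endian unsigned 32-bit integer from `s` starting at byte `pos`.
local function read_u32le(s, pos)
  local b1, b2, b3, b4 = s:byte(pos, pos + 3)
  return b1 + b2 * 256 + b3 * 65536 + b4 * 16777216
end

-- Check the fixed 10-byte gzip header and read the 8-byte trailer (RFC 1952).
-- Optional header fields signalled by the flag byte are NOT parsed here.
-- Returns a table on success, or nil plus an error message.
local function inspect_gzip(s)
  if #s < 18 then
    return nil, "input too short for a gzip header and trailer"
  end
  local id1, id2, cm, flg = s:byte(1, 4)
  if id1 ~= 0x1f or id2 ~= 0x8b then
    return nil, "bad magic bytes"
  end
  if cm ~= 8 then
    return nil, "compression method is not deflate"
  end
  return {
    flags = flg,                    -- FTEXT=1, FHCRC=2, FEXTRA=4, FNAME=8, FCOMMENT=16
    crc32 = read_u32le(s, #s - 7),  -- CRC-32 of the uncompressed data
    isize = read_u32le(s, #s - 3),  -- uncompressed size modulo 2^32
  }
end
```

For example, a gzip member for empty input (header, the empty fixed-Huffman block `0x03 0x00`, then zero CRC and size) is reported with `crc32 = 0` and `isize = 0`.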

SafeteeWoW commented 5 years ago

@MCJack123 Thanks. I have reviewed your code. Thank you for your contribution.

It is okay to introduce dependencies, as long as they are only needed for the gzip functionality. Could you work through some items on the following checklist when you have time? My work schedule is busy right now; I will help during my vacation in early May.

Thanks for your contributions.

MCJack123 commented 5 years ago

I've added a compressor and fixed compliance, but something is causing an error in the tests, so I'm going to need to spend a bit more time debugging that.

SafeteeWoW commented 5 years ago

As long as most tests pass, it's okay. I'll fix the failing tests. You can search for "Fail" in the CI log to find out which test is failing. Some of the tests I wrote are probably not needed; some are workarounds to reach 100% test coverage.

SafeteeWoW commented 5 years ago

Don't assume the "os" and "io" modules exist. LibDeflate was originally written for Lua embedded in games, which strips away the os and io modules.
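
A minimal sketch of guarding such accesses, assuming only that `_G` itself is available. `optional_library` and `read_file_if_possible` are hypothetical helpers for illustration, not LibDeflate code; `rawget` is used so that "strict" global-access guards do not raise errors for missing names.

```lua
-- Look up an optional standard library table (e.g. "io", "os") without
-- assuming it exists. Returns nil when the embedding host removed it.
local function optional_library(name)
  local lib = rawget(_G, name)
  if type(lib) == "table" then
    return lib
  end
  return nil
end

local io_lib = optional_library("io")  -- nil in embedded Lua without io
local os_lib = optional_library("os")  -- nil in embedded Lua without os

-- Example: only touch the filesystem when io is actually available.
local function read_file_if_possible(path)
  if not io_lib then
    return nil, "io library not available"
  end
  local f, err = io_lib.open(path, "rb")
  if not f then
    return nil, err
  end
  local data = f:read("*a")
  f:close()
  return data
end
```

The core compression code would then never reference `io` or `os` directly; only optional convenience paths consult these lookups.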

SafeteeWoW commented 5 years ago

Can you give instructions on how to work with ComputerCraft, so I can test it?

SafeteeWoW commented 5 years ago

What is the size of your big files? LibDeflate is slow when working with large files, though it should still work: I have tested it with 50 MB of input data. In any case, prefer a compressor written in C with a Lua binding when working with big files.

SafeteeWoW commented 5 years ago

LibDeflate's gzip functionality should be tested against the "gzip" command-line tool. See tests/Test.lua:444.

Windows CI on AppVeyor may not have "gzip" installed; "choco install gzip" may need to be added to .appveyor.yml.

SafeteeWoW commented 5 years ago

I will work on your PR on May 1st, hopefully get a Beta version on May 2nd.

MCJack123 commented 5 years ago

> What is the size of your big files?

I need to be able to compress files of 700 KB or more in ComputerCraft, which runs pure Lua. ComputerCraft halts a Lua computer that runs for 7 seconds without calling os.pullEvent(). This prevents infinite loops in programs, but it can cause problems for programs that require long processing.

SafeteeWoW commented 5 years ago

Compressing a 700 KB file, even at compression level 9, shouldn't take 7 s. If the long processing time is the problem, you can split the file data into multiple parts, call LibDeflate:Compress on each part, and then concatenate the results.
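
The split-into-parts approach above can be sketched with a plain Lua coroutine that yields after each part, which is where a ComputerCraft program would call os.pullEvent(). `split_into_parts` and `compress_in_parts` are hypothetical names, and `compress` is a caller-supplied function (for example a wrapper around LibDeflate's compression call, which is an assumption here). If the parts are concatenated as suggested, the decompressing side needs to know where each part ends, because each call presumably produces a complete, independently terminated stream; this sketch therefore keeps the results in a list.

```lua
-- Split `data` into consecutive parts of at most `part_size` bytes.
local function split_into_parts(data, part_size)
  local parts = {}
  for i = 1, #data, part_size do
    parts[#parts + 1] = data:sub(i, i + part_size - 1)
  end
  return parts
end

-- Compress `data` part by part inside a coroutine. After each part the
-- coroutine yields false so the caller can do other work; the final resume
-- yields true plus a list with one compressed result per part.
local function compress_in_parts(data, part_size, compress)
  return coroutine.wrap(function()
    local results = {}
    for _, part in ipairs(split_into_parts(data, part_size)) do
      results[#results + 1] = compress(part)
      coroutine.yield(false)
    end
    coroutine.yield(true, results)
  end)
end

-- Usage with a stand-in "compressor" (string.upper) so the sketch runs anywhere.
local step = compress_in_parts("abcdefghij", 4, string.upper)
local done, results
repeat
  done, results = step()
  -- The caller could hand control to its host environment here.
until done
-- results is now { "ABCD", "EFGH", "IJ" }
```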

SafeteeWoW commented 5 years ago

By the way, your code fails the static analyzer check, which runs at the beginning of CI. I have also replied to some of your comments above.

SafeteeWoW commented 5 years ago

I have pushed several changes into the develop branch of this repository.

SafeteeWoW commented 5 years ago

I made a mistake. ComputerCraft uses LuaJ as its interpreter, not LuaJIT. CI will be added for LuaJ and for the ComputerCraft fork of LuaJ.

SafeteeWoW commented 5 years ago
  1. I will test more Lua interpreter implementations and add workarounds where possible. There is a bug in LuaJ 2.0.3 that makes modifying a table during traversal impossible. I will add a workaround for it.

  2. I decided not to merge code that contains ComputerCraft-specific APIs, because I have never played Minecraft and cannot test those features in CI. Such code also increases the risk of naming conflicts with other Lua interpreters, because ComputerCraft does not use distinctive names for its APIs. Besides, ComputerCraft users can write that code outside of LibDeflate without modifying LibDeflate.lua.

  3. My current plan is to add tests for five more Lua interpreters in CI on the current develop branch and, once they pass, start merging your code.
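
The Lua 5.1 reference manual allows clearing existing fields during a `next`/`pairs` traversal, but given the LuaJ 2.0.3 bug mentioned in item 1, a portable pattern is to collect the keys first and modify the table afterwards. This is a minimal sketch of that pattern; `remove_if` is a hypothetical helper, not the workaround actually added to LibDeflate.

```lua
-- Remove every entry of `t` for which predicate(key, value) is true.
-- Keys are collected in a first pass and cleared in a second pass, so the
-- table is never modified while a pairs() traversal is in progress.
local function remove_if(t, predicate)
  local doomed = {}
  for k, v in pairs(t) do
    if predicate(k, v) then
      doomed[#doomed + 1] = k
    end
  end
  for _, k in ipairs(doomed) do
    t[k] = nil
  end
  return t
end

local scores = { alice = 3, bob = 4, carol = 5, dave = 6 }
remove_if(scores, function(_, v) return v % 2 == 0 end)
-- scores now holds only alice = 3 and carol = 5
```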

MCJack123 commented 5 years ago

I added gzip -t -v to the gzip tests to print information about the compressed gzip output. Should I keep this for debugging, or remove it?

SafeteeWoW commented 5 years ago

gzip -t -v should be okay, unless it makes the test log bigger than the log size limit of Travis CI or AppVeyor.

SafeteeWoW commented 5 years ago

You can replace math.floor(x) with (x - x % 1); then LibDeflate will no longer require a Lua interpreter that implements the "math" module.
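
Why the replacement works: Lua defines `a % b` as `a - math.floor(a/b)*b`, so `x % 1` is `x - floor(x)` and subtracting it leaves `floor(x)`, including for negative numbers. A minimal sketch; `floor_without_math` is a hypothetical name for illustration.

```lua
-- Floor without the math library: with Lua's floored modulo, x % 1 is the
-- non-negative fractional part of x, so subtracting it rounds toward -infinity.
local function floor_without_math(x)
  return x - x % 1
end

-- floor_without_math(3.7) == 3, floor_without_math(-2.5) == -3
```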

SafeteeWoW commented 5 years ago

Have you passed all tests locally? I suggest passing the tests locally first, in LuaJIT.

SafeteeWoW commented 5 years ago

CI is at least 5 times slower than on the current develop branch. I guess this is caused by the CRC32 computation. The current develop branch runs CI in 13 min, while this PR exceeds Travis CI's 50 min limit.

One solution is to disable the gzip tests in the original Lua interpreter and only enable them in LuaJIT.
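
One way to express that gate is to detect LuaJIT through its global `jit` table, which the reference interpreter does not define. A minimal sketch; `should_run_gzip_tests` is a hypothetical helper, not part of tests/Test.lua.

```lua
-- LuaJIT provides a global `jit` table; the reference interpreter does not.
local is_luajit = type(rawget(_G, "jit")) == "table"

-- Hypothetical test gate: run the slow gzip tests only under LuaJIT.
local function should_run_gzip_tests()
  return is_luajit
end
```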

SafeteeWoW commented 5 years ago

I am trying to find out whether there is any room to optimize CRC32 without installing any "bit" module.

SafeteeWoW commented 5 years ago

I made a faster pure Lua implementation of CRC32 that uses no external library. It is faster than level 1 LibDeflate compression on input of the same size.

There is still room to optimize. But I now think it's safe to remove the optional "bit" or "bit32" dependencies, because they only give a 2x speedup, and only for CRC32.

Generating the lookup table takes 200 ms in the classic Lua 5.1 interpreter, though, because it uses a 64K-entry CRC32 lookup table instead of a 256-entry one. NOTE: In LibDeflate, the lookup table should only be generated when LibDeflate:Crc32 is called for the first time, not when LibDeflate is imported.

This should also solve the ComputerCraft-specific problem that compressing a 700 KB file takes more than 7 s.

https://gist.github.com/SafeteeWoW/080e784e5ebfda42cad486c58e6d26e4

Also, bit.bxor returns a signed int32 value instead of a uint32. I think that is why CI reports an invalid CRC32. To convert to uint32, take the result modulo 4294967296 (2^32).
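
The conversion is a single modulo, because Lua's `%` returns a non-negative result for a positive divisor. A minimal sketch; `to_uint32` is a hypothetical name.

```lua
-- Map a signed 32-bit result (e.g. from bit.bxor as described above) to the
-- equivalent unsigned value: negative inputs wrap around by 2^32.
local function to_uint32(x)
  return x % 4294967296
end

-- to_uint32(-1) == 4294967295
-- to_uint32(-306674912) == 3988292384  (0xEDB88320, the reflected CRC-32 polynomial)
```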

SafeteeWoW commented 5 years ago

This project uses tabs for indentation, not 4 spaces. This follows the Blizzard style, because the project was originally written as an addon for World of Warcraft.

SafeteeWoW commented 5 years ago

The Lua used in ComputerCraft, LuaJ 2.0.3, is more than 10 times slower than the classic Lua interpreter. I suggest using LibDeflate only with files smaller than 200 KB; compressing a 100 KB file already takes over 10 s.

codecov[bot] commented 5 years ago

Codecov Report

Merging #2 into master will decrease coverage by 1.29%. The diff coverage is 87.09%.


@@           Coverage Diff            @@
##           master      #2     +/-   ##
========================================
- Coverage     100%   98.7%   -1.3%     
========================================
  Files           1       1             
  Lines        1709    1857    +148     
========================================
+ Hits         1709    1833    +124     
- Misses          0      24     +24

Impacted Files    Coverage Δ
LibDeflate.lua    98.7% <87.09%> (-1.3%) ↓

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 07f9d36...b053929.

SafeteeWoW commented 5 years ago

If you want to increase code coverage:

Because running Lua tests with code coverage tracking enabled is very slow, code coverage is only measured with LuaJIT, and only the subset of tests that can improve code coverage is enabled in the coverage run.

See the AddToCoverageTest() function in tests/Test.lua.

SafeteeWoW commented 5 years ago

https://gist.github.com/SafeteeWoW/080e784e5ebfda42cad486c58e6d26e4

My version of CRC32 has been updated. I switched back to a 256-element CRC32 table to avoid the long table-generation time.
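
A minimal sketch of the two ideas discussed in this thread: a 256-entry CRC-32 table that is built on the first call rather than at load time, and XOR computed with plain arithmetic so no "bit" or "bit32" module is needed. This is not the gist's code; in particular the bit-by-bit `bxor32` helper is written for clarity, not speed, and all names here are hypothetical.

```lua
-- 256-entry CRC-32 lookup table; built on the first call, not at load time.
local crc_table = nil

-- XOR of two unsigned 32-bit integers using only arithmetic (no bit library).
local function bxor32(a, b)
  local result, place = 0, 1
  for _ = 1, 32 do
    local abit, bbit = a % 2, b % 2
    if abit ~= bbit then
      result = result + place
    end
    a = (a - abit) / 2
    b = (b - bbit) / 2
    place = place * 2
  end
  return result
end

-- Build the table for the reflected CRC-32 polynomial 0xEDB88320.
local function build_crc_table()
  local t = {}
  for i = 0, 255 do
    local c = i
    for _ = 1, 8 do
      if c % 2 == 1 then
        c = bxor32((c - 1) / 2, 0xEDB88320)  -- (c >> 1) xor polynomial
      else
        c = c / 2                            -- c >> 1
      end
    end
    t[i] = c
  end
  return t
end

-- CRC-32 of `str`. Pass a previous result as `crc` to continue a checksum
-- across several chunks.
local function crc32(str, crc)
  crc_table = crc_table or build_crc_table()
  crc = bxor32(crc or 0, 0xFFFFFFFF)
  for i = 1, #str do
    local index = bxor32(crc, str:byte(i)) % 256
    crc = bxor32(crc_table[index], (crc - crc % 256) / 256)
  end
  return bxor32(crc, 0xFFFFFFFF)
end

-- crc32("123456789") == 0xCBF43926 (the standard CRC-32 check value)
```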

I'm going to start merging your code now. I'm currently working on the "test_more_interpreter" branch of this repo and will merge your code there.

SafeteeWoW commented 5 years ago

Your changes have been merged into test_more_interpreter.

However, all code that works around ComputerCraft / LuaJ 2.0.3 issues has been removed. You are welcome to maintain a fork of LibDeflate that works on ComputerCraft.

This is due to the following reasons:

  1. LibDeflate running on LuaJ 2.0.3 is more than 10 times slower than on the classic Lua 5.1.5 interpreter, which slows down CI and local testing.
  2. Its Lua table implementation is buggy. I don't want to work around that when most Lua interpreter implementations I have tried recently implement Lua tables correctly.
  3. I have never played Minecraft/ComputerCraft, so I cannot verify that the workarounds actually make LibDeflate work on ComputerCraft.

SafeteeWoW commented 5 years ago

Making some relatively big changes, including code style fixes, on the test_more_interpreter branch.

SafeteeWoW commented 5 years ago

I am considering adding a streaming interface similar to zlib's. It will let you compress big files over multiple separate calls to LibDeflate, each with an arbitrarily sized small chunk of input data, so you can call os.pullEvent() or give control to other programs between these calls. Decompression will also provide a similar streaming interface.

It basically looks like this:

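```lua
-- Proposed API sketch: DeflateInit, Deflate and DeflateEnd are planned names,
-- not functions that exist in LibDeflate yet.
local state = DeflateInit(configs)
Deflate(input_chunk0, state)
-- Do whatever you want between these function calls.
Deflate(input_chunk1, state)
Deflate(input_last_chunk, state)
local compressed_data = DeflateEnd(state)
```
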
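Since the proposed streaming API does not exist yet, the runnable sketch below only illustrates the init/update/end state pattern it describes, using the Adler-32 checksum (the checksum carried in the zlib format's trailer) as a simple stand-in for a compressor state. The function names are hypothetical; the point is that feeding input in chunks gives the same result as one call on the whole input.

```lua
local ADLER_MOD = 65521  -- largest prime below 2^16

-- Create the running state (analogous to DeflateInit(configs)).
local function adler32_init()
  return { a = 1, b = 0 }
end

-- Feed one chunk of input (analogous to Deflate(input_chunk, state)).
local function adler32_update(state, chunk)
  local a, b = state.a, state.b
  for i = 1, #chunk do
    a = (a + chunk:byte(i)) % ADLER_MOD
    b = (b + a) % ADLER_MOD
  end
  state.a, state.b = a, b
end

-- Produce the final value (analogous to DeflateEnd(state)).
local function adler32_end(state)
  return state.b * 65536 + state.a
end

local state = adler32_init()
adler32_update(state, "Wiki")
-- Other work can happen between calls.
adler32_update(state, "pedia")
local checksum = adler32_end(state)  -- 300286872 (0x11E60398), same as one call on "Wikipedia"
```
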
SafeteeWoW commented 3 years ago

Closing for now. I may add gzip functionality later.