coroutine.yield() is not the preferred solution.
I would like a DecompressBegin/DecompressContinue/DecompressEnd style API, so the decompression can be done chunk by chunk.
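For illustration only, such an API might look like the following sketch (hypothetical: none of these Begin/Continue/End functions exist in LibDeflate, and `read_next_chunk` is a stand-in for the caller's input source):

```lua
-- Hypothetical chunked-decompression API shape (not a real LibDeflate API).
local state = LibDeflate:DecompressBegin()
local chunk = read_next_chunk()          -- hypothetical source of input chunks
while chunk do
    LibDeflate:DecompressContinue(state, chunk)
    chunk = read_next_chunk()
end
local decompressed = LibDeflate:DecompressEnd(state)
```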
I agree a coroutine is an easy way. If you can provide an example of a WoW addon that really needs this feature, add some tests in tests/Test.lua that cover the new code in LibDeflate.lua, pass the CI, and avoid affecting backward compatibility, I may merge this.
For completeness, add coroutine support for all Compress/Decompress APIs
I guess this is for WeakAuras during Import? How much framerate improvement does this feature add?
> I guess this is for WeakAuras during Import?
WeakAuras might use it but I primarily made these changes for a WagoLib I'm making to easily let any addon add Wago.io support.
> How much framerate improvement does this feature add?
This isn't exactly comprehensive, but here are in-game results from a fairly large 3.5 MB table test:

- Serialize time: 0.931 seconds
- Compress time: 14.345 seconds
- Average 7.7 FPS over 15.276 seconds

It all runs in one frame, so GetTime() does not update, but with a stopwatch I timed the combined serialize+compress process at 15.17 seconds. The game is locked for those 15.17 seconds.
@oratory If you want to measure the performance of the synchronous version, take a look at the debugprofilestart() and debugprofilestop() APIs.
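For example (a minimal sketch; `serialized` is an illustrative input, and debugprofilestop() returns the milliseconds elapsed since the last debugprofilestart()):

```lua
-- Time the synchronous compress call with WoW's profiling APIs.
debugprofilestart()
local compressed = LibDeflate:CompressDeflate(serialized)
local elapsed_ms = debugprofilestop()
print(("Compress took %.1f ms"):format(elapsed_ms))
```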
@oratory I plan to make a new version this week to merge your change. I need to add some tests for code coverage.
Great, I'll make some updates and add the tests today or tomorrow.
I've cleaned up the lib and removed the WoW-specific stuff. I also added a handful of async tests, and they all pass.
Please help test whether the "async" option significantly decreases compression/decompression speed in WoW. If that's the case, you may need to add "chunk_size" to the config. I am unable to test in-game at the moment.
I will change the tab indentation to 4-space indentation later, but for this PR, please remove all format-only lines of changes.

P.S. This lib uses tab indentation because WoW's official Lua code used tabs when this lib was first released.
If you add a new public API, modify TestExported in Test.lua. This is why the CI fails.
> Please help test whether the "async" option significantly decreases compression/decompression speed in WoW. If that's the case, you may need to add "chunk_size" to the config.
Will do. I think a "max_time_spent_per_chunk" option might be better in this case, but I suppose I could add both options.
I don't like max_time_spent_per_chunk; it involves time measurement, which is not stable.
Squash your commits into a single commit, and use git to remove all format-only lines of changes.
I changed things around a bit. Instead of a typical async function, I'm calling it Chunks Mode: it returns a handler that gets called repeatedly until processing is finished.
Some rough benchmarks:

**Compression**

| Small table: 115 Bytes | Standard | Chunks Mode |
|---|---|---|
| Serialize | 0.21 ms | 1.19 ms |
| Compress | 1.30 ms | 10.59 ms |
| Total | 1.51 ms | 11.78 ms |

| Medium table: 163584 Bytes | Standard | Chunks Mode |
|---|---|---|
| Serialize | 10 ms | 12 ms |
| Compress | 579 ms | 674 ms |
| Total | 589 ms | 686 ms |

| Large table: 1750073 Bytes | Standard | Chunks Mode |
|---|---|---|
| Serialize | 560 ms | 609 ms |
| Compress | 7578 ms | 8323 ms |
| Total | 8128 ms | 8932 ms |

| X-Large table: 3511073 Bytes | Standard | Chunks Mode |
|---|---|---|
| Serialize | 1130 ms | 1223 ms |
| Compress | 16489 ms | 16700 ms |
| Total | 17589 ms | 17923 ms |

**Decompression**

| Small table: 115 Bytes | Standard | Chunks Mode |
|---|---|---|
| Decompress | 0.24 ms | 9 ms |
| Deserialize | 0.10 ms | 11 ms |
| Total | 0.34 ms | 20 ms |

| Medium table: 163584 Bytes | Standard | Chunks Mode |
|---|---|---|
| Decompress | 192 ms | 264 ms |
| Deserialize | 16 ms | 27 ms |
| Total | 208 ms | 291 ms |

| Large table: 1750073 Bytes | Standard | Chunks Mode |
|---|---|---|
| Decompress | 3094 ms | 3668 ms |
| Deserialize | 574 ms | 772 ms |
| Total | 3668 ms | 4440 ms |

| X-Large table: 3511073 Bytes | Standard | Chunks Mode |
|---|---|---|
| Decompress | 6308 ms | 7465 ms |
| Deserialize | 1074 ms | 1536 ms |
| Total | 7382 ms | 9001 ms |
Chunks Mode does have a small increase in processing time, but it has the benefit (in WoW and likely other environments) of not locking up the game and making the OS think it's unresponsive when processing large data.
I made similar changes on my LibSerialize PR: https://github.com/rossnichols/LibSerialize/pull/4
Thanks. I have vacation on May 1-5 and will merge your PR during that period.
I would keep the chunk_size option, but remove the chunk_time option. It's better for the user to measure time outside of this library.
> I would keep the chunk_size option, but remove the chunk_time option. It's better for the user to measure time outside of this library.
I had asked him to add it for LibSerialize - conceptually, yielding by size/count doesn't achieve the goal of the feature, which is to minimize impact on frame rate. With size/count, a given operation will always be divided into the same number of chunks, whereas a time-based approach allows for fluctuation based on CPU performance.
Maybe they can be combined into a single option, though? The user supplies a function shouldYield(count) and can choose to implement it as "count > threshold" or "elapsed time > threshold". The library resets the count and yields when the function returns true.
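For example, both strategies could be expressed through that single callback (a sketch; `shouldYield` is just the name proposed above, not an existing API, and the thresholds are illustrative):

```lua
-- Count-based: yield after a fixed number of processed units.
local function shouldYieldByCount(count)
    return count > 4096
end

-- Time-based: yield once enough time has elapsed since the last yield.
local last_yield = os.clock()
local function shouldYieldByTime(count)
    if os.clock() - last_yield > 1.0 / 120 then
        last_yield = os.clock()
        return true
    end
    return false
end
```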
I do not want to measure time in this library; time measurement is not stable.
I don't think the library needs to measure the time. The user can do this. Here is the example from the README of this PR:
```lua
-- In WoW, coroutine processing can be handled on frame updates to reduce loss of framerate.
local processing = CreateFrame('Frame')
local WoW_compress_co = LibDeflate:CompressDeflate(example_input, {level=9, chunksMode=true})
processing:SetScript('OnUpdate', function()
    local ongoing, WoW_compressed = WoW_compress_co()
    if not ongoing then
        -- do something with `WoW_compressed`
        processing:SetScript('OnUpdate', nil)
    end
end)
```
If time is a concern, change the code to:
```lua
local processing = CreateFrame('Frame')
local WoW_compress_co = LibDeflate:CompressDeflate(example_input, {level=9, chunksMode=true})
local time_threshold = 1.0/120
processing:SetScript('OnUpdate', function()
    local ongoing, WoW_compressed
    local start = os.clock()
    -- Keep pumping the coroutine until it finishes or the frame's time budget is spent.
    repeat
        ongoing, WoW_compressed = WoW_compress_co()
    until (not ongoing) or (os.clock() - start >= time_threshold)
    if not ongoing then
        -- do something with `WoW_compressed`
        processing:SetScript('OnUpdate', nil)
    end
end)
```
A time-based chunk option will only be provided if it is much better than the alternative of a size-based chunk in a while loop, as shown above.
I will decide whether to add the time option after I have completed the code and done a benchmark.
A handler function would be better than a time option:
```lua
configs = {
    chunk_size,                   -- integer
    chunk_yield_handler,          -- function to control whether to yield; nil to use the default
    chunk_yield_handler_userdata, -- user-provided data passed to the handler
}
```
The prototype of chunk_yield_handler is:
```lua
function handler(userdata, libdata)
    -- userdata is the user-provided data in configs.chunk_yield_handler_userdata
    -- libdata is a table; currently it only contains one field: num_bytes_processed
    -- return true to yield
    -- return false to not yield
end
```
The default handler returns true for every chunk_size bytes processed.
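Under that proposed prototype, a time-based handler could then live entirely in user code (a sketch; the config field names are the proposal above, not a shipped API):

```lua
-- Hypothetical time-based yield handler built on the proposed prototype.
local userdata = { threshold = 1.0 / 120, last_yield = os.clock() }

local function time_handler(userdata, libdata)
    if os.clock() - userdata.last_yield >= userdata.threshold then
        userdata.last_yield = os.clock()
        return true   -- yield
    end
    return false      -- keep processing
end

-- Passed via the proposed config:
-- configs = {
--     chunk_yield_handler = time_handler,
--     chunk_yield_handler_userdata = userdata,
-- }
```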
I can see that coroutine yield may be costly, so I will consider the handler approach.
I wasn't suggesting that the library itself measure time - that's why I said the user-supplied handler is the entity that measures elapsed time. Your example code above doesn't actually help time measurement, because it's WoW_compress_co() that needs its execution to be time-gated, not the code that calls it (unless you specify a super small chunk size and yield much more often, returning control to the outer handler, but that'll increase the coroutine overhead).

I agree that a handler-based approach seems the most flexible and allows the library code to be generic. I don't necessarily see the value in returning nil to use the "default" - rather, just have the library provide a default handler that yields e.g. based on the number of bytes. Then you can specify config to either use the default handler or supply your own.
I've been very busy recently and have had no time to work on this. Hopefully I can work on it this weekend.
Currently not maintaining this. Will pick it up if I would like to.
This adds an `async=true` config option to call `coroutine.yield()` in the compress and decompress functions. This removes or reduces framerate loss while processing large strings.
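For reference, a minimal sketch of how a caller might drive the option (assuming, per this PR, that `async=true` makes the function yield periodically; `example_input` and `step` are illustrative names):

```lua
-- Sketch: driving an async compression from a coroutine (assumes this PR's
-- async=true option makes CompressDeflate call coroutine.yield() periodically).
local co = coroutine.create(function()
    return LibDeflate:CompressDeflate(example_input, {async = true})
end)

local function step()
    local ok, result = coroutine.resume(co)
    assert(ok, result)
    if coroutine.status(co) == "dead" then
        return result  -- finished; result is the compressed string
    end
    return nil         -- still running; call step() again later (e.g. next frame)
end
```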