coroutine.yield() is not the preferred solution.
I would like a DecompressBegin/DecompressContinue/DecompressEnd style API, so the decompression can be done chunk by chunk.
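For illustration only, such an API might look like the following sketch (hypothetical: none of these Begin/Continue/End functions exist in LibDeflate, and `read_next_chunk` is a stand-in for the caller's input source):

```lua
-- Hypothetical chunked-decompression API shape (not a real LibDeflate API).
local state = LibDeflate:DecompressBegin()
local chunk = read_next_chunk()          -- hypothetical source of input chunks
while chunk do
    LibDeflate:DecompressContinue(state, chunk)
    chunk = read_next_chunk()
end
local decompressed = LibDeflate:DecompressEnd(state)
```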
I agree a coroutine is an easy way. If you can provide an example of a WoW addon that really needs this feature, add some tests in tests/Test.lua that cover the new code in LibDeflate.lua, pass the CI, and avoid affecting backward compatibility, I may merge this.
For completeness, add coroutine support for all Compress/Decompress APIs
I guess this is for WeakAuras during Import? How much framerate improvement does this feature add?
> I guess this is for WeakAuras during Import?
WeakAuras might use it but I primarily made these changes for a WagoLib I'm making to easily let any addon add Wago.io support.
> How much framerate improvement does this feature add?
This isn't exactly comprehensive, but here are in-game results from a fairly large 3.5 MB table test:

- Serialize time: 0.931 seconds
- Compress time: 14.345 seconds
- Average 7.7 FPS over 15.276 seconds

It all runs in one frame, so GetTime() does not update, but with a stopwatch I timed the combined serialize+compress process at 15.17 seconds. The game is locked for those 15.17 seconds.
@oratory If you want to measure the performance of the synchronous version, take a look at the debugprofilestart() and debugprofilestop() APIs.
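For example (a minimal sketch; `serialized` is an illustrative input, and debugprofilestop() returns the milliseconds elapsed since the last debugprofilestart()):

```lua
-- Time the synchronous compress call with WoW's profiling APIs.
debugprofilestart()
local compressed = LibDeflate:CompressDeflate(serialized)
local elapsed_ms = debugprofilestop()
print(("Compress took %.1f ms"):format(elapsed_ms))
```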
@oratory I plan to make a new version this week to merge your change. I need to add some tests for code coverage.
Great, I'll make some updates and add the tests today or tomorrow.
I've cleaned up the lib and removed the WoW-specific stuff. I also added a handful of async tests, and they all pass.
Please help test whether the "async" option significantly decreases compression/decompression speed in WoW. If that's the case, you may need to add "chunk_size" to the config. I am unable to test in-game at the moment.
I will change the tab indentation to 4-space indentation later, but for this PR, please remove all format-only lines of changes.

P.S. This lib uses tab indentation because WoW's official Lua code used tabs when this lib was first released.
If you add a new public API, modify TestExported in Test.lua. This is why the CI fails.
> Please help test whether the "async" option significantly decreases compression/decompression speed in WoW. If that's the case, you may need to add "chunk_size" to the config.
Will do. I think a "max_time_spent_per_chunk" option might be better in this case, but I suppose I could add both options.
I don't like max_time_spent_per_chunk; it involves time measurement, which is not stable.
Squash your commits into a single commit, and use git to remove all format-only lines of changes.
I changed things around a bit. Instead of a typical async function, I'm calling it Chunks Mode: it returns a handler that gets called repeatedly until processing is finished.
Some rough benchmarks:

**Compression**

| Small table: 115 Bytes | Standard | Chunks Mode |
|---|---|---|
| Serialize | 0.21 ms | 1.19 ms |
| Compress | 1.30 ms | 10.59 ms |
| Total | 1.51 ms | 11.78 ms |

| Medium table: 163584 Bytes | Standard | Chunks Mode |
|---|---|---|
| Serialize | 10 ms | 12 ms |
| Compress | 579 ms | 674 ms |
| Total | 589 ms | 686 ms |

| Large table: 1750073 Bytes | Standard | Chunks Mode |
|---|---|---|
| Serialize | 560 ms | 609 ms |
| Compress | 7578 ms | 8323 ms |
| Total | 8128 ms | 8932 ms |

| X-Large table: 3511073 Bytes | Standard | Chunks Mode |
|---|---|---|
| Serialize | 1130 ms | 1223 ms |
| Compress | 16489 ms | 16700 ms |
| Total | 17589 ms | 17923 ms |

**Decompression**

| Small table: 115 Bytes | Standard | Chunks Mode |
|---|---|---|
| Decompress | 0.24 ms | 9 ms |
| Deserialize | 0.10 ms | 11 ms |
| Total | 0.34 ms | 20 ms |

| Medium table: 163584 Bytes | Standard | Chunks Mode |
|---|---|---|
| Decompress | 192 ms | 264 ms |
| Deserialize | 16 ms | 27 ms |
| Total | 208 ms | 291 ms |

| Large table: 1750073 Bytes | Standard | Chunks Mode |
|---|---|---|
| Decompress | 3094 ms | 3668 ms |
| Deserialize | 574 ms | 772 ms |
| Total | 3668 ms | 4440 ms |

| X-Large table: 3511073 Bytes | Standard | Chunks Mode |
|---|---|---|
| Decompress | 6308 ms | 7465 ms |
| Deserialize | 1074 ms | 1536 ms |
| Total | 7382 ms | 9001 ms |
Chunks Mode does have a small increase in processing time, but it has the benefit (in WoW and likely other environments) of not locking up the game and making the OS think it's unresponsive when processing large data.
I made similar changes on my LibSerialize PR: https://github.com/rossnichols/LibSerialize/pull/4
Thanks. I have vacation on May 1-5 and will merge your PR during that period.
I would keep the chunk_size option, but remove the chunk_time option. It's better for the user to measure time outside of this library.
> I would keep the chunk_size option, but remove the chunk_time option. It's better for the user to measure time outside of this library.
I had asked him to add it for LibSerialize - conceptually, yielding by size/count doesn't achieve the goal of the feature, which is to minimize impact on frame rate. With size/count, a given operation will always be divided into the same number of chunks, whereas a time-based approach allows for fluctuation based on CPU performance.
Maybe they can be combined into a single option, though? The user supplies a function shouldYield(count) and can choose to implement it as "count > threshold" or "elapsed time > threshold". The library resets the count and yields when the function returns true.
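For example, both strategies could be expressed through that single callback (a sketch; `shouldYield` is just the name proposed above, not an existing API, and the thresholds are illustrative):

```lua
-- Count-based: yield after a fixed number of processed units.
local function shouldYieldByCount(count)
    return count > 4096
end

-- Time-based: yield once enough time has elapsed since the last yield.
local last_yield = os.clock()
local function shouldYieldByTime(count)
    if os.clock() - last_yield > 1.0 / 120 then
        last_yield = os.clock()
        return true
    end
    return false
end
```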
I do not want to measure time in this library; time measurement is not stable.
I don't think the library needs to measure the time. The user can do this. Here is the example from the README of this PR:
```lua
-- In WoW, coroutine processing can be handled on frame updates to reduce loss of framerate.
local processing = CreateFrame('Frame')
local WoW_compress_co = LibDeflate:CompressDeflate(example_input, {level=9, chunksMode=true})
processing:SetScript('OnUpdate', function()
    local ongoing, WoW_compressed = WoW_compress_co()
    if not ongoing then
        -- do something with `WoW_compressed`
        processing:SetScript('OnUpdate', nil)
    end
end)
```
If time is a concern, change the code to:
```lua
local processing = CreateFrame('Frame')
local WoW_compress_co = LibDeflate:CompressDeflate(example_input, {level=9, chunksMode=true})
local time_threshold = 1.0/120
processing:SetScript('OnUpdate', function()
    local ongoing, WoW_compressed
    local start = os.clock()
    -- Keep pumping the coroutine until it finishes or the frame's time budget is spent.
    repeat
        ongoing, WoW_compressed = WoW_compress_co()
    until (not ongoing) or (os.clock() - start >= time_threshold)
    if not ongoing then
        -- do something with `WoW_compressed`
        processing:SetScript('OnUpdate', nil)
    end
end)
```
A time-based chunk option will only be provided if it is much better than the alternative of a size-based chunk in a while loop, as shown above.
I will decide whether to add the time option after I have completed the code and done a benchmark.
A handler function would be better than a time option:
```lua
configs = {
    chunk_size,                   -- integer
    chunk_yield_handler,          -- function to control whether to yield; nil to use the default
    chunk_yield_handler_userdata, -- user-provided data passed to the handler
}
```
The prototype of chunk_yield_handler is:
```lua
function handler(userdata, libdata)
    -- userdata is the user-provided data in configs.chunk_yield_handler_userdata
    -- libdata is a table; currently it only contains one field: num_bytes_processed
    -- return true to yield
    -- return false to not yield
end
```
The default handler returns true for every chunk_size bytes processed.
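Under that proposed prototype, a time-based handler could then live entirely in user code (a sketch; the config field names are the proposal above, not a shipped API):

```lua
-- Hypothetical time-based yield handler built on the proposed prototype.
local userdata = { threshold = 1.0 / 120, last_yield = os.clock() }

local function time_handler(userdata, libdata)
    if os.clock() - userdata.last_yield >= userdata.threshold then
        userdata.last_yield = os.clock()
        return true   -- yield
    end
    return false      -- keep processing
end

-- Passed via the proposed config:
-- configs = {
--     chunk_yield_handler = time_handler,
--     chunk_yield_handler_userdata = userdata,
-- }
```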
I can see that coroutine yield may be costly, so I will consider the handler approach.
I wasn't suggesting that the library itself measure time - that's why I said the user-supplied handler is the entity that measures elapsed time. Your example code above doesn't actually help time measurement, because it's WoW_compress_co() that needs its execution to be time-gated, not the code that calls it (unless you specify a super small chunk size and yield much more often, returning control to the outer handler, but that'll increase the coroutine overhead).

I agree that a handler-based approach seems the most flexible and allows the library code to be generic. I don't necessarily see the value in returning nil to use the "default" - rather, just have the library provide a default handler that yields e.g. based on the number of bytes. Then you can specify config to either use the default handler or supply your own.
I've been very busy recently and have had no time to work on this. Hopefully I can work on it this weekend.
Currently not maintaining this. Will pick it up if I would like to.
This adds an `async=true` config option to call `coroutine.yield()` in the compress and decompress functions. This removes or reduces framerate loss while processing large strings.
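For reference, a minimal sketch of how a caller might drive the option (assuming, per this PR, that `async=true` makes the function yield periodically; `example_input` and `step` are illustrative names):

```lua
-- Sketch: driving an async compression from a coroutine (assumes this PR's
-- async=true option makes CompressDeflate call coroutine.yield() periodically).
local co = coroutine.create(function()
    return LibDeflate:CompressDeflate(example_input, {async = true})
end)

local function step()
    local ok, result = coroutine.resume(co)
    assert(ok, result)
    if coroutine.status(co) == "dead" then
        return result  -- finished; result is the compressed string
    end
    return nil         -- still running; call step() again later (e.g. next frame)
end
```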