Open andriy-gerasika opened 6 years ago
Thanks for the suggestion. I haven't looked at that implementation yet, but I think it makes sense. We would just have to add a small hunter package for the iostream_bz code. We previously used boost iostreams (boost + iostreams took 400K by itself) before moving to the lightweight cereal library, and I had a TODO in #534 (and a few other notes) to identify a similar lightweight C++ replacement for iostreams, or to write one. The one you mentioned looks like it could be a good match. The current models use a half precision floating point type for a relatively easy factor of 2 reduction, but most compression schemes struggle with floating point data. I think moving to a fixed point representation would also improve the final compression rate significantly.
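To make the fixed point idea concrete, here is a minimal sketch of uniform 16-bit quantization. It is only an illustration: `QuantizedBlock`, `quantize`, and `dequantize` are hypothetical names, not drishti or cereal types, and the choice of a single min/max range per block is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not a drishti type): 16-bit codes plus the
// offset and step size needed to map them back to floats.
struct QuantizedBlock {
    float lo = 0.0f;    // smallest original value
    float step = 0.0f;  // distance between two adjacent levels
    std::vector<std::uint16_t> codes;
};

// Map each value to the nearest of 65536 evenly spaced levels in [min, max].
QuantizedBlock quantize(const std::vector<float>& values) {
    assert(!values.empty());
    const auto range = std::minmax_element(values.begin(), values.end());
    QuantizedBlock q;
    q.lo = *range.first;
    q.step = (*range.second - *range.first) / 65535.0f;
    q.codes.reserve(values.size());
    for (const float v : values) {
        // A constant input has step == 0, so every code is simply 0.
        const float t = (q.step > 0.0f) ? (v - q.lo) / q.step : 0.0f;
        const long code = std::min(65535L, std::max(0L, std::lround(t)));
        q.codes.push_back(static_cast<std::uint16_t>(code));
    }
    return q;
}

// Reconstruct approximate floats from the integer codes.
std::vector<float> dequantize(const QuantizedBlock& q) {
    std::vector<float> values;
    values.reserve(q.codes.size());
    for (const std::uint16_t c : q.codes) {
        values.push_back(q.lo + q.step * static_cast<float>(c));
    }
    return values;
}
```

The integer codes take the same 2 bytes per value as half floats, but neighbouring values share high-order bits more often, which is the property a general-purpose compressor such as bzip2 can exploit. Whether that actually helps for the drishti models would need to be measured.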
I think there is a lot of potential for further reduction through custom compression schemes and maybe some kind of sparse coding. I made a quick note about it here https://github.com/elucideye/drishti/issues/203#issue-194384218. The current PCA scheme is more or less looking to minimize the "dictionary" size, but I think we can probably get smaller models with a larger "dictionary" that supports more compact model representations. If it worked well, that might be a good general purpose addition for dlib's shape_predictor.
I found this post to be an interesting read on the somewhat related problem of time series compression. For the fixed depth multi-variate gradient boosting trees, we could do something like this:
I ran a quick, hacky POC using lossy and lossless audio codecs with a scheme like this to see what compression rates it would yield (not necessarily well suited to the problem, but easy to try):
PCA regression model size w/ compression
INPUT 4.0M /tmp/points_pca.cpb
436K /tmp/pca.mp4
644K /tmp/pca.ogg
1.1M /tmp/pca.flac
2.0M /tmp/pca.wav
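% Load the PCA model values from a text dump.
points = load('points_pca.txt');
% Rescale all values to [-1, 1], the range audiowrite expects for floating point input.
points2 = ((points-min(points(:)))/(max(points(:))-min(points(:))))*2.0-1.0;
% Write the same data with several codecs at 44.1 kHz to compare file sizes.
audiowrite('pca.mp4', points2, 44100);
audiowrite('pca.ogg', points2, 44100);
audiowrite('pca.flac', points2, 44100);
audiowrite('pca.wav', points2, 44100);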
I didn't actually test the impact on the models.
A good read on floating point time series compression: http://blog.omega-prime.co.uk/?p=184
[I found this to be surprising]
One general observation is that delta encoding is very rarely the best choice, and when it is the best, the gains are usually marginal when compared to literal encoding. This is interesting because Fabian Giesen came to exactly the same conclusion (that delta encoding is redundant when you can do transposition) in the excellent presentation that I linked to earlier.
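The "transposition" mentioned in the quote can be sketched as follows: instead of storing each float's bytes together, store all first bytes, then all second bytes, and so on, so that the slowly varying sign/exponent bytes end up next to each other. This is a minimal illustration with hypothetical function names, assuming 4-byte `float` values; it is not code from the linked post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Split each float into byte planes: all byte 0s, then all byte 1s, etc.
std::vector<std::uint8_t> transpose_bytes(const std::vector<float>& values) {
    const std::size_t n = values.size();
    std::vector<std::uint8_t> planes(n * sizeof(float));
    for (std::size_t i = 0; i < n; ++i) {
        std::uint8_t raw[sizeof(float)];
        std::memcpy(raw, &values[i], sizeof(float));
        for (std::size_t b = 0; b < sizeof(float); ++b) {
            planes[b * n + i] = raw[b];
        }
    }
    return planes;
}

// Inverse of transpose_bytes: gather one byte per plane back into each float.
std::vector<float> untranspose_bytes(const std::vector<std::uint8_t>& planes) {
    assert(planes.size() % sizeof(float) == 0);
    const std::size_t n = planes.size() / sizeof(float);
    std::vector<float> values(n);
    for (std::size_t i = 0; i < n; ++i) {
        std::uint8_t raw[sizeof(float)];
        for (std::size_t b = 0; b < sizeof(float); ++b) {
            raw[b] = planes[b * n + i];
        }
        std::memcpy(&values[i], raw, sizeof(float));
    }
    return values;
}
```

The transposed buffer would be handed to the compressor in place of the raw float array, and `untranspose_bytes` applied after decompression. The transform is lossless, so it can be tried without affecting model accuracy.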
I guess tree pruning could be another huge win, but it might interfere with the compression schemes above.
In any event, bzip2 should be an easy place to start. Are you planning to work on it?
Sorry, no -- I am still struggling to make my app run fast on mobile platforms. (iostream_bz.h and iostream_bz.cpp are working code from my project, so they should work as-is.)
re half float/float16 compression: have a look at https://www.intelnervana.com/flexpoint-numerical-innovation-underlying-intel-nervana-neural-network-processor/ -- basically Intel "invented" float16 with a fixed exponent, i.e. if all drishti floating point values use roughly the same exponent, it is OK to drop the exponent bits and use more bits for the mantissa. This won't decrease the size of the resources, but it will increase the precision of the floating point values at the expense of code complexity :)
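A rough sketch of the shared exponent idea, under the assumption that one exponent is stored per block of values and each value keeps a 16-bit signed mantissa. The names are hypothetical and this is not Intel's Flexpoint format, only an illustration of the principle.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical block format: one exponent shared by all 16-bit mantissas.
struct SharedExponentBlock {
    int exponent = 0;
    std::vector<std::int16_t> mantissas;
};

SharedExponentBlock encode_shared_exponent(const std::vector<float>& values) {
    float max_abs = 0.0f;
    for (const float v : values) {
        max_abs = std::max(max_abs, std::fabs(v));
    }
    SharedExponentBlock block;
    // frexp writes e such that max_abs = f * 2^e with 0.5 <= f < 1 (e = 0 for zero).
    std::frexp(max_abs, &block.exponent);
    block.mantissas.reserve(values.size());
    for (const float v : values) {
        // Scale so that |v| < 2^15, then round; clamp in case rounding hits 2^15.
        const long m = std::lround(std::ldexp(v, 15 - block.exponent));
        block.mantissas.push_back(
            static_cast<std::int16_t>(std::min(32767L, std::max(-32767L, m))));
    }
    return block;
}

std::vector<float> decode_shared_exponent(const SharedExponentBlock& block) {
    std::vector<float> values;
    values.reserve(block.mantissas.size());
    for (const std::int16_t m : block.mantissas) {
        values.push_back(std::ldexp(static_cast<float>(m), block.exponent - 15));
    }
    return values;
}
```

Values close to the block maximum keep about 15 bits of precision, compared with 11 significant bits in IEEE half precision, while values much smaller than the maximum lose precision. That trade-off only pays off if the exponents really are similar across the block.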
I added a git + CMake mirror of the codeproject.com sources from here: https://www.codeproject.com/Articles/4457/zipstream-bzip-stream-iostream-wrappers-for-the-zl and added explicit GTest assertions for the provided test file.
There seems to be an issue with very small differences using double types (at least on OS X + xcode 8.3.3). I haven't tried it with bzip2 directly or attempted to debug it in general. It could be something simple or a configuration/build problem. Have you encountered any problems with it? Note that the default GTest print precision isn't sufficient to show the difference.
git clone https://github.com/headupinclouds/bzip2stream
cd bzip2stream
polly.py --toolchain xcode --verbose --config Release --install --fwd BZIP2STREAM_BUILD_TEST=ON --test
[/private/tmp/bzip2stream/_builds/xcode]> "ctest" "-C" "Release" "-VV"
UpdateCTestConfiguration from :/tmp/bzip2stream/_builds/xcode/DartConfiguration.tcl
UpdateCTestConfiguration from :/tmp/bzip2stream/_builds/xcode/DartConfiguration.tcl
Test project /tmp/bzip2stream/_builds/xcode
Constructing a list of tests
Done constructing a list of tests
Updating test list for fixtures
Added 0 tests to meet fixture requirements
Checking test dependency graph...
Checking test dependency graph end
test 1
Start 1: bzip2stream-test
1: Test command: /tmp/bzip2stream/_builds/xcode/test/Release/bzip2stream-test
1: Test timeout computed to be: 9.99988e+06
1: Running main() from gtest_main.cc
1: [==========] Running 5 tests from 1 test case.
1: [----------] Global test environment set-up.
1: [----------] 5 tests from bzip2stream
1: [ RUN ] bzip2stream.test_buffer_to_buffer
1: [ OK ] bzip2stream.test_buffer_to_buffer (0 ms)
1: [ RUN ] bzip2stream.test_wbuffer_to_wbuffer
1: [ OK ] bzip2stream.test_wbuffer_to_wbuffer (0 ms)
1: [ RUN ] bzip2stream.test_string_string
1: /tmp/bzip2stream/test/bzip2_stream_test.cpp:162: Failure
1: Expected equality of these values:
1: d
1: Which is: 3.14159
1: d_r
1: Which is: 3.14159
1: [ FAILED ] bzip2stream.test_string_string (0 ms)
1: [ RUN ] bzip2stream.test_wstring_wstring
1: /tmp/bzip2stream/test/bzip2_stream_test.cpp:208: Failure
1: Expected equality of these values:
1: d
1: Which is: 3.14159
1: d_r
1: Which is: 3.14159
1: [ FAILED ] bzip2stream.test_wstring_wstring (1 ms)
1: [ RUN ] bzip2stream.test_file_file
1: /tmp/bzip2stream/test/bzip2_stream_test.cpp:263: Failure
1: Expected equality of these values:
1: d
1: Which is: 3.14159
1: d_r
1: Which is: 3.14159
1: [ FAILED ] bzip2stream.test_file_file (0 ms)
1: [----------] 5 tests from bzip2stream (1 ms total)
1:
1: [----------] Global test environment tear-down
1: [==========] 5 tests from 1 test case ran. (1 ms total)
1: [ PASSED ] 2 tests.
1: [ FAILED ] 3 tests, listed below:
1: [ FAILED ] bzip2stream.test_string_string
1: [ FAILED ] bzip2stream.test_wstring_wstring
1: [ FAILED ] bzip2stream.test_file_file
1:
1: 3 FAILED TESTS
1/1 Test #1: bzip2stream-test .................***Failed 0.01 sec
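One possible explanation for the failures above, not confirmed against the bzip2stream sources: if the test writes the double through `operator<<` as text, the default stream precision of 6 significant digits would drop the low-order digits before compression, so the value read back would differ slightly from the original. The sketch below reproduces that effect with a plain `std::stringstream`; `round_trip_text` is a hypothetical helper, not part of bzip2stream.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>

// Write a double as text with the given precision and read it back.
double round_trip_text(double value, int precision) {
    assert(precision > 0);
    std::stringstream stream;
    stream << std::setprecision(precision) << value;
    double back = 0.0;
    stream >> back;
    return back;
}
```

With `precision = 6` a value such as 3.141592653589793 comes back as 3.14159, while `std::numeric_limits<double>::max_digits10` digits are enough to recover the same double. If this is the cause, either setting the precision on the output stream or comparing with `EXPECT_DOUBLE_EQ`/`EXPECT_NEAR` in the test would be worth trying.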
@andriy-gerasika: I'm seeing some internal test failures for double types in bzip2stream. Any ideas?
Hello, what do you think about compressing resources with gz/bzip2 and deserializing them using an iostream/(gz|bzip2) wrapper?
I mean, the main page of drishti says that the goal of the project is "SDK size <= 1 MB and combined resources (object detection + regression models) <= 4 MB". If resources are compressed using gz/bzip2 with compression level 9, this would ease the requirement on resource size (at the cost of increased code size) and allow the use of more complex models (until the compressed models exceed 4 MB).
repo for iostream/gz wrapper: https://github.com/geromueller/zstream-cpp
iostream/bzip2 wrapper: http://www.gerixsoft.com/tmp/iostream_bz.h http://www.gerixsoft.com/tmp/iostream_bz.cpp
I would recommend using bzip2, since it compresses better (of course bzip2 decompression takes longer than gzip/zlib, but since resources are loaded in threads, I guess this does not matter much). Plus, there is already a hunter package for bzip2...
p.s. I am already using this approach in my project (.cpbz files), so this is a suggestion for general use.