dropbox / lepton

Lepton is a tool and file format for losslessly compressing JPEGs by an average of 22%.
https://blogs.dropbox.com/tech/2016/07/lepton-image-compression-saving-22-losslessly-from-images-at-15mbs/
Apache License 2.0
5.01k stars 354 forks source link

Lepton as a shared library #35

Open onyxmaster opened 8 years ago

onyxmaster commented 8 years ago

I'd like to suggest supporting a shared library build with an simple ABI-neutral API. It would be very useful for creating wrappers (for example, I'd like to create a C# P/Invoke-based wrapper for .NET on Windows and a Go one to aid creating a simple http+jpeg<->GridFS+lepton proxy). Implementing ABI-neutral stream-based API might be challenging, since different systems/CRTs use different stream implementations (hence trying to provide external implementations of IOUtil::FileReader/IOUtil::FileWriter might quickly become to complex), so maybe the first version should only support byte* input and byte* output (along with a function to free the result).

So, the simplest prototype seems to be this one:

int32_t process_data(const uint8_t* input_data, uint32_t input_length, const uint8_t** output_data, uint32_t* output_length);

The return code can used as a success (0) or failure (non-zero) indicator for now.

I might take a look at it myself and maybe even submit a PR, but would like to discuss this with authors -- you probably had some thoughts going in this direction and might already have a few brighter ideas.

danielrh commented 8 years ago

I really would like this idea! There are some caveats: right now the system uses a few global variables in places, but it does have a fork/exec() (or CreateProcess on windows) system used for verification (to make sure state leaks never can cause success to be reported if failure--and for jails) so the approach could be to make the very minimalistic verify() function do all the library stuff...and the global variables are self-contained in those one-shot processes. This will make it easier to jail windows as well.

Does that sound like a plan? It's basically going to be a small wrapper around the read/write situation.

I would recommend starting with a wrapper around compress, since that uses the verification framework already...then afterwards it should be fairly easy to go after a decompression wrapper too.

One thing that is a bit odd is that we override the new() syscall and demand a massive upfront allocation... this may mean poking allocators through everywhere... so that we don't need to override new globally (though maybe that kind of thing stays within a C++ library and doesn't escape it)

onyxmaster commented 8 years ago

Thanks for the heads-up on the overloaded operator new(), this might indeed pose a problem when the shared library is loaded into another process that uses the same shared libstdc++/msvcprt. I'm not sure how different systems handle operator overloads and if there are any inconsistencies in different CRTs implementing this feature. Also, I believe that the USE_STANDARD_MEMORY_ALLOCATORS can be turned on for shared library targets and debugged/tested from there (making memory allocation issues a host application problem).

onyxmaster commented 8 years ago

As for the validation (separate process isolation if I understand the code correctly), the Windows case will probably be a bit slow (since CreateProcess is designed that way and fork is not available [except for WSL, but that's not a general case]). But, I do not care that much about perf on Windows (most of the time will still be spent reading/writing/encoding/decoding).

The seccomp should be turned off as well for the library itself in this case I guess, leaving it enabled only for the forked process.

danielrh commented 8 years ago

yes that sounds good...

it should be possible to simply specify the Sirikata::JpegAllocator() as a template parameter to all the C++ containers being used. Each usage of malloc is already tagged that way (as custom_malloc) --so it just is a matter of tracking down each std::vector and poking through the JpegAllocator https://github.com/dropbox/lepton/blob/a34ee2f4b0a6454eff8ebe334750dd008d57de35/src/io/Allocator.hh since JpegAllocator calls custom_malloc directly and custom_malloc/custom_free are compatible with seccomp

Ya perf on windows is secondary at the moment--the convenience library would be hugely valuable to the community

danielrh commented 8 years ago

SECCOMP is only turned on for the subprocess, since the main process never examines the hostile user data--so that won't need to change. and I'd recommend starting with a lib that only needs to encode... since that is already separated in multiprocess once the encode is in place it should be pretty easy to wrap decode in a similar manner.

danielrh commented 8 years ago

Any luck with this?

onyxmaster commented 8 years ago

Oh, not yet, Daniel, have lots of things on my plate right now.

dlemstra commented 8 years ago

Have you made any progress with this @onyxmaster? I tried to create a proof of concept C# wrapper yesterday night but I was unable to get it working by hacking into the code of the jpgcoder. I can also give it a try @danielrh but I would need some more pointers on where/how to start.

Bulat-Ziganshin commented 6 years ago

I'm going to finally implement that: https://encode.ru/threads/2552-Lepton-image-compression?p=58615&viewfull=1#post58615