Open onyxmaster opened 8 years ago
I really would like this idea! There are some caveats: right now the system uses a few global variables in places, but it does have a fork/exec() (or CreateProcess on windows) system used for verification (to make sure state leaks never can cause success to be reported if failure--and for jails) so the approach could be to make the very minimalistic verify() function do all the library stuff...and the global variables are self-contained in those one-shot processes. This will make it easier to jail windows as well.
Does that sound like a plan? It's basically going to be a small wrapper around the read/write situation.
I would recommend starting with a wrapper around compress, since that uses the verification framework already...then afterwards it should be fairly easy to go after a decompression wrapper too.
One thing that is a bit odd is that we override the new() syscall and demand a massive upfront allocation... this may mean poking allocators through everywhere... so that we don't need to override new globally (though maybe that kind of thing stays within a C++ library and doesn't escape it)
Thanks for the heads-up on the overloaded operator new(), this might indeed pose a problem when the shared library is loaded into another process that uses the same shared libstdc++/msvcprt. I'm not sure how different systems handle operator overloads and if there are any inconsistencies in different CRTs implementing this feature. Also, I believe that the USE_STANDARD_MEMORY_ALLOCATORS can be turned on for shared library targets and debugged/tested from there (making memory allocation issues a host application problem).
As for the validation (separate process isolation if I understand the code correctly), the Windows case will probably be a bit slow (since CreateProcess is designed that way and fork is not available [except for WSL, but that's not a general case]). But, I do not care that much about perf on Windows (most of the time will still be spent reading/writing/encoding/decoding).
The seccomp should be turned off as well for the library itself in this case I guess, leaving it enabled only for the forked process.
yes that sounds good...
it should be possible to simply specify the Sirikata::JpegAllocator() as a template parameter to all the C++ containers being used. Each usage of malloc is already tagged that way (as custom_malloc) --so it just is a matter of tracking down each std::vector and poking through the JpegAllocator https://github.com/dropbox/lepton/blob/a34ee2f4b0a6454eff8ebe334750dd008d57de35/src/io/Allocator.hh since JpegAllocator calls custom_malloc directly and custom_malloc/custom_free are compatible with seccomp
Ya perf on windows is secondary at the moment--the convenience library would be hugely valuable to the community
SECCOMP is only turned on for the subprocess, since the main process never examines the hostile user data--so that won't need to change. and I'd recommend starting with a lib that only needs to encode... since that is already separated in multiprocess once the encode is in place it should be pretty easy to wrap decode in a similar manner.
Any luck with this?
Oh, not yet, Daniel, have lots of things on my plate right now.
Have you made any progress with this @onyxmaster? I tried to create a proof of concept C# wrapper yesterday night but I was unable to get it working by hacking into the code of the jpgcoder. I can also give it a try @danielrh but I would need some more pointers on where/how to start.
I'm going to finally implement that: https://encode.ru/threads/2552-Lepton-image-compression?p=58615&viewfull=1#post58615
I'd like to suggest supporting a shared library build with an simple ABI-neutral API. It would be very useful for creating wrappers (for example, I'd like to create a C# P/Invoke-based wrapper for .NET on Windows and a Go one to aid creating a simple http+jpeg<->GridFS+lepton proxy). Implementing ABI-neutral stream-based API might be challenging, since different systems/CRTs use different stream implementations (hence trying to provide external implementations of IOUtil::FileReader/IOUtil::FileWriter might quickly become to complex), so maybe the first version should only support byte* input and byte* output (along with a function to free the result).
So, the simplest prototype seems to be this one:
The return code can used as a success (0) or failure (non-zero) indicator for now.
I might take a look at it myself and maybe even submit a PR, but would like to discuss this with authors -- you probably had some thoughts going in this direction and might already have a few brighter ideas.