Blosc / c-blosc

A blocking, shuffling and loss-less compression library that can be faster than `memcpy()`.
https://www.blosc.org

Compiling with WebAssembly #238

Open jakirkham opened 6 years ago

jakirkham commented 6 years ago

Would be very interesting to have access to Blosc in the browser. It's looking like WebAssembly would be the best way to do that using something like Emscripten with LLVM or Binaryen to compile C to WebAssembly. Though there is probably more to this that I haven't thought of yet.

wolfv commented 5 years ago

I am very interested in this, as I would like to test Blosc in the jupyter-ros widgets as a means to compress point cloud data before sending it to the browser and then decompress it in the browser. Has there been any effort to set up Emscripten for Blosc since this issue was opened?

wolfv commented 5 years ago

As you can see in the screenshot, I managed to compile c-blosc to WebAssembly and get the simple example to run! :)

[Screenshot from 2019-07-22 12-20-25]

What I did is:

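# Build the simple.c example as an Emscripten executable
add_executable(simple simple.c)

include_directories(../blosc)
# Name the output simple.wasm.js so the HTML page can load it with a <script> tag
set(CMAKE_EXECUTABLE_SUFFIX ".wasm.js")
target_link_libraries(simple blosc_static)
# Emscripten link flags: WebAssembly output, pthreads, 160 MiB of memory, export main()
set_target_properties(simple PROPERTIES LINK_FLAGS "-s WASM=1 -s USE_PTHREADS=1 -s TOTAL_MEMORY=167772160 -s BINARYEN_METHOD='native-wasm' -s EXPORTED_FUNCTIONS='[_main]'")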

And a HTML file that contains this:

<!DOCTYPE html>
<html>
    <head></head>
    <body>
        <script src="simple.wasm.js?2"></script>
    </body>
</html>

Run a simple Python HTTP server to serve the JS and HTML:

python3 -m http.server

FrancescAlted commented 5 years ago

Hey Wolf, that's pretty cool! It is unfortunate that WASM does not support SIMD instructions yet, but hopefully this will allow for better adoption of Blosc in the cloud. If you can point to an application that you are working on, please share it, as I may want to mention it in my forthcoming talks (starting with EuroSciPy 2019).

wolfv commented 5 years ago

Hi Francesc! I am evaluating blosc for compressing point clouds as part of the jupyter-ros effort here (https://github.com/RoboStack/jupyter-ros). We need fast on-the-fly compression of point clouds (RGBD data).

I was talking to @esc at the SciPy conference in Austin about Blosc and he thought it might be a good idea...

If this actually proves useful, I think it might be interesting to push blosc as a compression mechanism for binary buffers as part of jupyter widgets.

Do you think that makes sense?

Maybe we'll have the chance to meet at EuroSciPy btw.

FrancescAlted commented 5 years ago

Sure, it makes total sense. I was just pointing out that the lack of SSE2/AVX2 support in WASM is going to hurt performance, but probably this is not really important when you are trying to download data from the network, where the bottleneck is the bandwidth, not decompression time.
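
The bandwidth argument above can be made concrete with a rough back-of-the-envelope model. This is only a sketch: the function name `estimateSeconds` and every number in the usage lines are hypothetical placeholders, not measurements of Blosc or of any WASM build.

```javascript
// Rough model: total time = time to download the (possibly compressed) buffer
// + time to decompress it on the client. All inputs are assumptions.
//   rawBytes              size of the uncompressed buffer
//   ratio                 compression ratio (raw size / compressed size); 1 = no compression
//   bandwidthBytesPerSec  network throughput
//   decompressBytesPerSec decompression speed, in bytes of output per second
//                         (pass Infinity when nothing has to be decompressed)
function estimateSeconds(rawBytes, ratio, bandwidthBytesPerSec, decompressBytesPerSec) {
  const download = rawBytes / ratio / bandwidthBytesPerSec;
  const decompress = rawBytes / decompressBytesPerSec;
  return { download, decompress, total: download + decompress };
}

// Hypothetical scenario: a 100 MB buffer over a 100 Mbit/s link (12.5 MB/s),
// compressed 2x, with a deliberately pessimistic non-SIMD decompressor at 500 MB/s.
const uncompressedCase = estimateSeconds(100e6, 1, 12.5e6, Infinity);
const compressedCase = estimateSeconds(100e6, 2, 12.5e6, 500e6);
console.log('uncompressed:', uncompressedCase.total.toFixed(2), 's');
console.log('compressed:  ', compressedCase.total.toFixed(2), 's');
```

With these placeholder numbers the download term dominates, so even a slow decompressor adds little to the total; real figures would have to come from benchmarks.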

OTOH, I am curious why you needed to add the gzip headers; in theory, all the zlib headers come with Blosc itself precisely to avoid this.

Finally, yes, if you are going to be at EuroSciPy, it would be cool to meet; there will be a sprint on Caterva/Blosc2 on September 6th, and you are invited to come. Also, I have seen that you are based in Zurich, and it happens that I'm going to be there this week. Feel free to send me a message offline if you want to meet in Zurich too.

wolfv commented 5 years ago

We should definitely run some benchmarks on this though :) Yes, everything appears to be included. I needed to add some includes to the bundled zlib headers so that other headers such as unistd.h are pulled in correctly (it seems platform detection with Emscripten doesn't work correctly right now).

I am not going to make it to Zurich in time (currently travelling to Berlin), but we'll probably meet at EuroSciPy! Cheers!

esc commented 5 years ago

Haha, I am in Berlin, let me know if you want to come by the Anaconda office for coffee. 😄

wolfv commented 5 years ago

Yeah, will do for sure! Do you have a recommendation for a spontaneous coworking space? @jtpio and I are looking for a table for today and tomorrow right now. Also, I was planning to come to the Python meetup tonight; I think you're signed up as well...

esc commented 5 years ago

@wolfv you could try: https://x-hain.de/de/ - I won't make it to the meetup tonight though.

manzt commented 4 years ago

Hi folks, thanks to all the suggestions here, I was able to port a Blosc codec for use in zarr.js. We still can't benefit from SIMD, but at least there is now an npm module for Blosc via numcodecs.js. If a more flexible use of Blosc is necessary, take a look at the code in codecs/blosc/blosc_codec.cpp and example.html.

$ npm install numcodecs
import { Blosc } from 'numcodecs';
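
For the point cloud use case discussed earlier in the thread, a round trip might look roughly like the sketch below. The helper `roundTrip` is hypothetical, and it assumes the codec object exposes async `encode` and `decode` methods that take and return a `Uint8Array`, which is my understanding of the numcodecs.js README rather than something confirmed in this thread. The codec is passed in as a parameter so the helper itself does not depend on the package.

```javascript
// Hypothetical helper: compress a Float32Array (e.g. XYZ point coordinates)
// with any codec that offers async encode/decode on Uint8Array, then restore it.
async function roundTrip(codec, values) {
  // View the float data as raw bytes without copying.
  const bytes = new Uint8Array(values.buffer, values.byteOffset, values.byteLength);
  const encoded = await codec.encode(bytes);
  const decoded = await codec.decode(encoded);
  // Copy into a fresh, aligned buffer before reinterpreting the bytes as floats.
  const restored = new Float32Array(decoded.slice().buffer);
  return { encodedBytes: encoded.byteLength, restored };
}

// Assumed usage with the package above (not verified here):
//   import { Blosc } from 'numcodecs';
//   const points = new Float32Array([0.1, 0.2, 0.3, 1.1, 1.2, 1.3]);
//   const { encodedBytes, restored } = await roundTrip(new Blosc(), points);
```

Comparing `encodedBytes` with the input's `byteLength` gives a quick view of the compression ratio for a particular buffer.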

jakirkham commented 4 years ago

Wow! That's fantastic! Thanks @manzt 😄