cloudflare / jpegtran

jpegtran fork with significant performance improvements

Question: could this merge with libjpeg-turbo and/or friends? #3

Open lovell opened 8 years ago

lovell commented 8 years ago

Impressive results, well done Vlad.

Am I correct in thinking this improvement could apply not only to the jpegtran command line tool but also any compression made via the libjpeg shared library?

If so, are you (or Cloudflare) in a position to be able to contribute the benefits back to one of the libjpeg forks?

libjpeg-turbo (and the mozjpeg fork of it) already uses SIMD on the decompression side so would make a suitable candidate for SIMD on the compression side also.

vkrasnov commented 8 years ago

You are correct. In fact I tested the code with turbo as well. Will they accept intrinsics code? I didn't see any.

denji commented 8 years ago
vkrasnov commented 8 years ago

@denji I am not sure how your post relates here.

lovell commented 8 years ago

@vkrasnov I believe the use of the more recent AVX2 intrinsics was trialled in libjpeg-turbo as part of https://github.com/libjpeg-turbo/libjpeg-turbo/issues/2

SSE4.2 is more widely adopted than AVX2, so I think it's worth us asking, if you're up for the challenge :)

kornelski commented 8 years ago

:+1: It'd be great to have this improvement in libjpeg-turbo, or at least a libjpeg-turbo compatible patch that could be used in libjpeg-turbo forks.

In terms of compression, the best version of jpegtran currently is in MozJPEG, which is based on libjpeg-turbo. So this patch in libjpeg-turbo would enable a version of jpegtran that is both faster and generates the smallest files.

kornelski commented 8 years ago

The vulnerabilities #4 and #5 are also fixed in libjpeg-turbo.

lovell commented 8 years ago

https://github.com/libjpeg-turbo/libjpeg-turbo/pull/25 - thanks @vkrasnov!

dcommander commented 8 years ago

Hi, guys. I can confirm that this speeds up progressive encoding by almost 2X. This technology will eventually make it into libjpeg-turbo, but I'm trying to secure research funding in order to enhance it. SSE 4.2 and especially AVX2 intrinsics are tricky, because they require newer compilers than a lot of "active" O/S distributions have. Also, as currently implemented, the choice of whether to use SSE 4.2 or AVX2 has to be made at compile time. What I want to do is integrate this with the existing libjpeg-turbo SIMD framework, thus allowing run-time detection and execution of the best available Huffman SIMD kernel for a given architecture. This will involve porting the various flavors of the code to NASM assembly (easy) and introducing a run-time selection mechanism. I also want to investigate whether a pure SSE2 version can be developed, perhaps without as much speedup but still probably significant enough to make it worthwhile. Finally, I want to investigate whether these same techniques could be applied to baseline encoding.
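
For readers unfamiliar with the distinction drawn above between compile-time and run-time kernel selection, here is a minimal sketch of run-time dispatch in C. It is not libjpeg-turbo's SIMD framework and not the code from this fork. The kernel (`count_nonzero`), its variants, and the selector are all hypothetical names, and the "SSE4.2" and "AVX2" variants are scalar placeholders standing in for real SIMD code. The only real API used is the GCC/Clang builtin pair `__builtin_cpu_init` / `__builtin_cpu_supports`, guarded so the file still compiles on other compilers and architectures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel signature: count the non-zero coefficients in a block. */
typedef size_t (*count_nonzero_fn)(const int16_t *coefs, size_t n);

/* Portable fallback, always available. */
static size_t count_nonzero_scalar(const int16_t *coefs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (coefs[i] != 0);
    return count;
}

/* Placeholders (assumption): real builds would put SSE4.2 / AVX2 code here,
 * compiled with the matching target flags. They forward to the scalar
 * version so this sketch runs on any machine. */
static size_t count_nonzero_sse42(const int16_t *coefs, size_t n)
{
    return count_nonzero_scalar(coefs, n);
}

static size_t count_nonzero_avx2(const int16_t *coefs, size_t n)
{
    return count_nonzero_scalar(coefs, n);
}

/* Pick the best kernel the running CPU supports, checked at run time
 * instead of with #ifdef __AVX2__ at compile time. */
static count_nonzero_fn select_count_nonzero(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return count_nonzero_avx2;
    if (__builtin_cpu_supports("sse4.2"))
        return count_nonzero_sse42;
#endif
    return count_nonzero_scalar;
}

/* Public entry point: resolves the kernel on first use, then reuses it.
 * The lazy initialisation here is not thread-safe; a real library would
 * resolve it once during its own initialisation. */
size_t count_nonzero(const int16_t *coefs, size_t n)
{
    static count_nonzero_fn impl = NULL;
    if (impl == NULL)
        impl = select_count_nonzero();
    return impl(coefs, n);
}
```

With this shape, a single binary can ship all kernel variants and callers only ever see `count_nonzero`. Hand-written NASM kernels, as proposed above, would plug into the same function-pointer table without requiring a compiler that understands the newer intrinsics.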