lilith opened 2 months ago
Hi, apologies for the delay in response.
> Robust and safe for use on web-servers, in process

We align in priorities there. We don't use fallible allocations yet, but that can be added.
> Be correct: especially about color profiles, linear-light image resampling, image scaling weight calculation, border pixels, minimize jpeg decoding/encoding artifacts, etc

Priorities align, but most of these aren't in place yet. E.g. you can already write a pipeline that does linear-light image resampling by stitching the components together (see the sketch just below), and minimizing jpeg decoding artifacts is almost complete; it only needs a switch to a more accurate color upsampler.
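To make "stitching the components together" concrete, here is a hedged sketch of a linear-light resize pipeline; every name in it is illustrative rather than zune-image's actual API, and a trivial 2:1 box filter stands in for a real resampler. The only point is the ordering: transfer-decode, resample in linear light, transfer-encode.

```rust
// Illustrative only; not zune-image's API.
fn srgb_to_linear(v: f32) -> f32 {
    if v <= 0.04045 { v / 12.92 } else { ((v + 0.055) / 1.055).powf(2.4) }
}

fn linear_to_srgb(v: f32) -> f32 {
    if v <= 0.0031308 { v * 12.92 } else { 1.055 * v.powf(1.0 / 2.4) - 0.055 }
}

// Placeholder resampler: a 2:1 box filter on one channel, standing in
// for a real scaler. Averaging happens on linear values, which is the
// whole point of the pipeline.
fn halve(channel: &[f32]) -> Vec<f32> {
    channel
        .chunks(2)
        .map(|c| c.iter().sum::<f32>() / c.len() as f32)
        .collect()
}

fn resize_linear_light(srgb_channel: &[f32]) -> Vec<f32> {
    let linear: Vec<f32> = srgb_channel.iter().map(|&v| srgb_to_linear(v)).collect();
    halve(&linear).into_iter().map(linear_to_srgb).collect()
}
```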
> Optimize image compression for web use

Aligned. The problem is I haven't really written a complex compressor (PNG, maybe, but it's still not ready). I have plans, possibly for the future, for a better jxl encoder, a webp one, and maybe (far-fetched) avif.
> Allow for end-to-end optimizations that aren't possible with an imperative API

Such optimizations increase complexity, so I tend to deal with them on a case-by-case basis.
> Sustainability

Maybe look into ways to get paid for maintenance, e.g. something like https://www.sovereigntechfund.de/ or https://nlnet.nl/project/libvips/. GitHub Sponsors is another option, but that would be more viable if the library gains traction.
> Robust ABI & bindings for multiple languages (including C#, Node, C)

The architecture of zune-image makes ABI bindings easy, since filters are implemented as consumers of an image and not extenders, i.e. `Brighten(new_value).execute(image)?` instead of `image.brighten(new_value)`, which means fewer breaks in the core API. A sketch of the pattern follows.
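Here is a minimal sketch of that consumer-style pattern; the trait and type names are illustrative, not zune-image's exact definitions. Because filters borrow the image rather than being methods on it, new filters can be added without touching the core `Image` type or its ABI.

```rust
// Illustrative types; zune-image's real API differs.
struct Image {
    planes: Vec<Vec<u8>>, // planar layout: one Vec per channel
}

#[derive(Debug)]
struct OpError;

trait Operation {
    fn execute(&self, image: &mut Image) -> Result<(), OpError>;
}

struct Brighten(u8);

impl Operation for Brighten {
    fn execute(&self, image: &mut Image) -> Result<(), OpError> {
        for plane in image.planes.iter_mut() {
            for px in plane.iter_mut() {
                *px = px.saturating_add(self.0);
            }
        }
        Ok(())
    }
}

// Call style matches the comment above:
// Brighten(new_value).execute(&mut image)?;
```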
> Cross-platform, cross-architecture support. I have to support Windows, Linux, and macOS on x86_64 and aarch64, in that order, but also 32-bit x86 on Windows (for now, hopefully 32-bit dies soon). WASM is also a goal, but not yet something Imageflow targets in CI.

My recommendation would be to depend on the core image libraries and write the glue your own way. There are quite large differences in how zune-image does processing: e.g. I chose to process images in planar layout (RRR, GGG, BBB) instead of interleaved (RGB, RGB, RGB) to make operations easier, but this means we have to decode the whole image before processing. If end-to-end latency is of utmost priority, an interleaved architecture may work better, feeding pixels into the pipeline as they are decoded. Operations can easily be ported to support that, since they are written as functions that work on one channel (sketched below), and this also means you can include only the operations Imageflow supports.
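A sketch of that per-channel operation style (the names are illustrative, not zune-image's actual signatures): because one function works on a single channel, the same kernel runs unchanged over each plane of a planar image, and porting to an interleaved pipeline mostly means changing how data is fed to it.

```rust
// Illustrative per-channel kernel; not zune-image's actual signature.
fn invert_channel(channel: &mut [u8]) {
    for px in channel.iter_mut() {
        *px = 255 - *px;
    }
}

// Planar driver: apply the same kernel to each plane independently.
fn invert_planar(planes: &mut [Vec<u8>]) {
    for plane in planes.iter_mut() {
        invert_channel(plane);
    }
}
```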
It's great to hear how much overlap there is in our goals!
Imageflow resolves the operation graph to imperative instructions, so there's no need for underlying functions to be graph-based.
And it's not like I can't use my own encoding logic.
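For context on the graph-to-imperative point above, here is a toy illustration (not Imageflow's internals) of resolving an operation graph into a flat program: topologically sort the nodes, then run the resulting list in order. It assumes an acyclic graph, and all names are made up for the example.

```rust
// Toy sketch; not Imageflow's actual internals.
#[derive(Clone, Debug)]
enum Op {
    Decode,
    Resize { w: u32, h: u32 },
    Encode,
}

/// `deps[i]` lists the node indices that must run before node `i`.
fn lower_to_program(ops: &[Op], deps: &[Vec<usize>]) -> Vec<Op> {
    let mut done = vec![false; ops.len()];
    let mut program = Vec::new();
    // Naive repeated sweep; fine for the tiny graphs in this sketch.
    while program.len() < ops.len() {
        for i in 0..ops.len() {
            if !done[i] && deps[i].iter().all(|&d| done[d]) {
                done[i] = true;
                program.push(ops[i].clone());
            }
        }
    }
    program
}
```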
A couple questions on performance:
Downscaling during IDCT is really useful when you can verify the signal loss is insignificant and you're doing a sufficiently large downscale later anyway. I generated C code for a bunch of kernels and brute-force tested them for DSSIM impact, then injected them into the jpeg decoder. 8x8->nxn SIMD kernels are really fast. Have you looked into doing anything like that?
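For readers unfamiliar with the technique, here is a hedged sketch of the simplest case, an 8x8 -> 1x1 kernel: for the orthonormal 2D IDCT, a DC-only block is constant with value dc / 8, so the 8x-downscaled pixel falls out without running the full transform. Larger kernels (8x8 -> 4x4, 8x8 -> 2x2) keep the top-left coefficients and use shorter IDCTs, which is what makes them so SIMD-friendly. `dequantized_dc` is a hypothetical input assumed to already include the quantization multiplier.

```rust
// 8x8 -> 1x1 "downscale during IDCT": a DC-only block's IDCT is the
// constant dc / 8, so the scaled output pixel needs no full transform.
fn dc_only_pixel(dequantized_dc: i32) -> u8 {
    // +128 undoes JPEG's level shift on the decoded sample.
    let v = ((dequantized_dc as f32) / 8.0).round() as i32 + 128;
    v.clamp(0, 255) as u8
}
```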
Premultiplication of the alpha channel is essential prior to downsampling, or you can get extraordinary artifacts. Have you noticed a penalty for planar memory layout when premultiplying and reversing it? I haven't run benchmarks, and would love to know what the impact is when doing compositing/resampling in planar mode.
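To make the question concrete, a minimal sketch of premultiply/unpremultiply over planar channels (hypothetical helpers, not zune-image's API): in planar layout each color plane is a separate pass that re-reads the alpha plane, versus one pass over a single interleaved buffer, which is exactly the cache-behavior tradeoff being asked about.

```rust
// Hypothetical helpers; not zune-image's API.
fn premultiply_plane(color: &mut [f32], alpha: &[f32]) {
    for (c, &a) in color.iter_mut().zip(alpha) {
        *c *= a;
    }
}

fn unpremultiply_plane(color: &mut [f32], alpha: &[f32]) {
    for (c, &a) in color.iter_mut().zip(alpha) {
        if a > 0.0 {
            *c /= a;
        }
    }
}
```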
Imageflow currently works on entire image frames and doesn't support streaming or region-tiled operations. Admittedly, that limits the upper bound of image dimensions and is really problematic in constrained-memory situations. I initially made this choice for the speed/cache benefits, for predictability (I don't want to stall out indefinitely for I/O reasons with big buffers in play), and because of how complex the ABI/FFI interface gets if you want to support async across language barriers. That said, libvips is proof that you can have both speed and low memory impact, something relevant for serverless function hosting limits.
What I haven't tested is how broadly image encoders are affected by streaming vs whole-image input. Final file size is king, and some optimizations need to see all the data first.
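The two interface shapes under comparison might look like the following illustrative traits (not any real crate's API): a whole-frame encoder can scan every pixel before committing to, say, a palette or filter strategy, while a streaming encoder must make those choices as scanlines arrive.

```rust
// Illustrative trait shapes only; not any real crate's API.
trait WholeFrameEncoder {
    fn encode(&mut self, frame: &[u8], width: usize, height: usize) -> Vec<u8>;
}

trait StreamingEncoder {
    fn push_scanline(&mut self, row: &[u8]);
    fn finish(self) -> Vec<u8>;
}
```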
I'm the author of imageflow. It's a (much older) project with similar goals (secure, correct, fast, in that order), and in an ideal world we would find ways to share code, effort, and collaboration. With the prevalence of gain maps, avif, jxl, and HDR, I'm looking at a rewrite soon.

Along those lines, I'm wondering what the goals and non-goals for this project might be. Here are some criteria I had for Imageflow.

Knowing where our goals align would be pretty great; if there are areas where I can contribute Imageflow's unique advantages to zune-image and then merge a crate or two, it would let me write more useful features. Fast AI 4x upscaling, AI auto-enhancement, salience detection, etc. are all things I'd love to make happen, but I can't allocate that kind of effort while also juggling all the other aspects of my end-to-end solution.