ImageOptim / libimagequant

Palette quantization library that powers pngquant and other PNG optimizers
https://pngquant.org/lib
Other
780 stars 132 forks source link

Flexibilize QuantizationResult API to allow different memory sources #47

Closed AlexTMjugador closed 3 years ago

AlexTMjugador commented 3 years ago

Currently, under the Rust bindings, using the remapped or palette methods allocate Vecs to hold the desired data. However, in certain use cases, these extra allocations are unnecessary and may introduce undesirable overhead.

Fix that situation by allowing the caller to provide buffers that will be populated by this crate, and allowing getting slices of palette data, in order to allow avoiding most allocations involved.

AlexTMjugador commented 3 years ago

I've been testing the changes and tried my best to keep the new methods safe and sound. I've also updated the tests so they use the new methods, and they pass. Please let me know if I still should change something. Thanks!

kornelski commented 3 years ago

Thanks for this PR.

I would rather not expose PalettePtr as-is. For the borrowed palette, why not return a slice?

Lifetimes on structs, like QuantizationResult<'a>, mean the struct have borrowed data that is outside of them, and that data outlives it (has existed before this struct, and may exist after). That is not true, the palette is in the result, so this lifetime is incorrect.

kornelski commented 3 years ago

For writing to a buffer, AsRef<[u8]> is problematic, as it requires the caller to initialize the memory. It's UB not to.

How about accepting &mut Vec<u8>? Do you need to support other containers?

AlexTMjugador commented 3 years ago

For writing to a buffer, AsRef<[u8]> is problematic, as it requires the caller to initialize the memory. It's UB not to.

How about accepting &mut Vec<u8>? Do you need to support other containers?

I think it'd be desirable to support other containers, yes. I'm writing a Tokio decoder that works with a reference to a BytesMut struct and this change makes it possible to use that buffer with this library. The std::io::Read trait accepts a &mut [u8] and warns that calling read with an uninitialized buffer may lead to undefined behavior, so I guess it seems reasonable to ask client code to initialize the buffer, and document the pitfalls of not doing so. Anyway, it's probably a better idea to avoid the generic parameter altogether and just accept a &mut [u8], so I'll change that.

Lifetimes on structs, like QuantizationResult<'a>, mean the struct have borrowed data that is outside of them, and that data outlives it (has existed before this struct, and may exist after). That is not true, the palette is in the result, so this lifetime is incorrect.

QuantizationResult now has a PhantomData field whose purpose is to tell the Rust compiler to act as if the struct contained a reference to a liq_palette. From my point of view, this makes sense and is a good model of the relationship between the result and the palette. On the other hand, if that reference must be valid for some lifetime, called 'a in the code, then the borrow checker knows that no reference to the struct can outlive 'a, so the struct at least lives as long as 'a, which I think is what we want. I also think that methods like palette_ptr return a reference to a liq_palette with a lifetime bounded by this constraint, so it's fine there. I think you had in mind the arguably more common case that QuantizationResult accepted a pointer to a liq_palette from the user, as in that case 'a would indeed be at least as long as the lifetime of the struct, but the Rust compiler is smart enough to handle variance, contravariance and so on.

In any case, if what I stated is wrong (by no means I'm a Rust expert), I'd be happy to see some code snippet that exposes a problem and fix it.

I would rather not expose PalettePtr as-is. For the borrowed palette, why not return a slice?

That's a good idea, I'll try to apply it. Thanks!

kornelski commented 3 years ago

Struct<'a> is for specifying a lifetime that is different from lifetime of the struct itself. It's only for data that is borrowed from outside of the struct. PhantomData is just an implementation detail of this, for cases where something, like FFI, prevents from expressing this relationship directly.

In this case the result owns the palette. The palette is not borrowed by the result object. Therefore PhantomData and the lifetime that say the palette is borrowed by the result are just incorrect. They don't properly describe what is actually happening on the C side.

This code is harmless and happens to work, but only because you have a second mistake that masks the first. The '_ lifetime syntax uses elision, which makes the compiler choose the defaults. The '_ on the impl is different than '_ on the PalettePtr<'_>. The defaults for lifetimes happen to be correct here, because lifetime on the PalettePtr is taken from &self borrow, so it's the same as a reference to the result itself, not some external reference borrowed by the result.

The "not-mine" lifetime on the QuantizationResult struct happens to be taken from &mut self of .quantize() call, which is not correct (Attributes/Histogram objects don't contain the palette, and the result is independent of them). It's not unsafe, because it happens to be too restrictive. But it's incorrect.

AlexTMjugador commented 3 years ago

Thank you, that was insightful.

The "not-mine" lifetime on the QuantizationResult struct happens to be taken from &mut self of .quantize() call, which is not correct (Attributes/Histogram objects don't contain the palette, and the result is independent of them). It's not unsafe, because it happens to be too restrictive. But it's incorrect.

So that explains why I had to move an Attributes variable out of the scope it was before so the code compiled.

Still, I think it'd be great if the code is both safe and correct. Following what I understood of your comment and the examples of the Nomicon, would the correct solution involve a new *const liq_palette field, with a PhantomData<liq_palette> marker field?

Edit: even though this was already merged, it's definitely not finished, for the reasons stated above. What's your plan on addressing these concerns? Should I open a new PR?

kornelski commented 3 years ago

Thanks for the PR. I've merged it, and fixed the lifetime issues myself.

I've exposed the buffer as [MaybeUninit<u8>], which allows use even with uninitialized data. The caller may need to do some unsafe dance to cast slice pointers, but OTOH it's the most efficient primitive.

kornelski commented 3 years ago

So the thing is, you don't need any PhantomData at all. A function like:

fn palette(&self) -> &[Color]

means:

fn palette<'a>(&'a self) -> &'a [Color]

which makes lifetime of the slice limited to lifetime of the borrow of self. So self doesn't have to contain any phantomdata, or reference, or anything like that. The caller borrows self, and then everything returned is related to that borrow by default. The borrow checker checks against interfaces, so unsafe code inside can just cast pointers to make it happen.

PhantomData<liq_palette> is unnecessary, because it wouldn't really affect compiler's reasoning. It's generally for PhantomData<T> where T is generic, and the caller could put in there references, thread-unsafe types and other things that alter behavior of the struct as a whole. Just a bunch of palette bytes doesn't add new restrictions.

You'd use

impl Result<'x> {
   fn palette<'a>(&'a self) -> &'x [Color]
}

only if 'x is unrelated to self, and it'd be valid to keep using [Color] slice after drop(result) (for example, Iterator in Rust works like that).