RazrFalcon / tiny-skia

A tiny Skia subset ported to Rust
BSD 3-Clause "New" or "Revised" License

Change internal representation of ColorU8 from u32 to [u8; 4] #86

Closed: e00E closed this 1 year ago

e00E commented 1 year ago

This representation is more natural and leads to simpler code. It also fixes alignment issues when casting Pixmap::data into PremultipliedColorU8, which used to have an alignment requirement of 4 due to the u32 but now has an alignment requirement of 1.

I removed the get functions because I see no reason to make the internal representation part of the public API. The only guarantee we need to make is that a bytemuck cast yields the bytes in RGBA order. Whether that comes from u32 or [u8; 4] is irrelevant.
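A minimal sketch of that point, using only the standard library. `PackedRgba` and `ArrayRgba` are hypothetical stand-ins, not tiny-skia's actual types, and `to_bytes` stands in for what a bytemuck cast would expose. The u32 variant packs with `u32::from_ne_bytes`, which is one way (assumed here, not confirmed by the PR) to keep the in-memory order R, G, B, A on any CPU.

```rust
/// Hypothetical u32-backed color (illustrative stand-in for the old layout).
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
pub struct PackedRgba(u32);

impl PackedRgba {
    pub fn from_rgba(r: u8, g: u8, b: u8, a: u8) -> Self {
        // Native-endian packing keeps the bytes in memory as R, G, B, A.
        PackedRgba(u32::from_ne_bytes([r, g, b, a]))
    }
    pub fn red(self) -> u8 { self.0.to_ne_bytes()[0] }
    pub fn alpha(self) -> u8 { self.0.to_ne_bytes()[3] }
    /// The bytes a byte-level cast of this value would observe.
    pub fn to_bytes(self) -> [u8; 4] { self.0.to_ne_bytes() }
}

/// Hypothetical array-backed color (illustrative stand-in for the new layout).
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
pub struct ArrayRgba([u8; 4]);

impl ArrayRgba {
    pub fn from_rgba(r: u8, g: u8, b: u8, a: u8) -> Self {
        // No packing step: the channels are stored as they are given.
        ArrayRgba([r, g, b, a])
    }
    pub fn red(self) -> u8 { self.0[0] }
    pub fn alpha(self) -> u8 { self.0[3] }
    /// The bytes a byte-level cast of this value would observe.
    pub fn to_bytes(self) -> [u8; 4] { self.0 }
}
```

Both types expose the same accessors and the same byte order, so callers that only go through the accessors or a byte cast cannot tell which storage is used.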

I've collected the following benchmarks on my machine (Linux, x86-64, AMD Ryzen 7 3700X):

master

```
test blend::clear_tiny_skia ... bench: 47,395 ns/iter (+/- 658)
test blend::color_burn_tiny_skia ... bench: 841,829 ns/iter (+/- 10,009)
test blend::color_dodge_tiny_skia ... bench: 793,163 ns/iter (+/- 4,360)
test blend::color_tiny_skia ... bench: 1,278,893 ns/iter (+/- 18,811)
test blend::darken_tiny_skia ... bench: 774,695 ns/iter (+/- 102,116)
test blend::destination_atop_tiny_skia ... bench: 745,111 ns/iter (+/- 8,978)
test blend::destination_in_tiny_skia ... bench: 706,915 ns/iter (+/- 36,712)
test blend::destination_out_tiny_skia ... bench: 705,427 ns/iter (+/- 92,086)
test blend::destination_over_tiny_skia ... bench: 767,649 ns/iter (+/- 6,412)
test blend::destination_tiny_skia ... bench: 10 ns/iter (+/- 1)
test blend::difference_tiny_skia ... bench: 777,142 ns/iter (+/- 4,641)
test blend::exclusion_tiny_skia ... bench: 762,880 ns/iter (+/- 248,448)
test blend::hard_light_tiny_skia ... bench: 814,048 ns/iter (+/- 75,306)
test blend::hue_tiny_skia ... bench: 1,566,292 ns/iter (+/- 38,887)
test blend::lighten_tiny_skia ... bench: 763,686 ns/iter (+/- 13,570)
test blend::luminosity_tiny_skia ... bench: 1,225,186 ns/iter (+/- 17,558)
test blend::modulate_tiny_skia ... bench: 702,411 ns/iter (+/- 6,456)
test blend::multiply_tiny_skia ... bench: 767,768 ns/iter (+/- 10,769)
test blend::overlay_tiny_skia ... bench: 812,880 ns/iter (+/- 8,114)
test blend::plus_tiny_skia ... bench: 690,784 ns/iter (+/- 5,756)
test blend::saturation_tiny_skia ... bench: 1,521,044 ns/iter (+/- 9,729)
test blend::screen_tiny_skia ... bench: 714,920 ns/iter (+/- 12,635)
test blend::soft_light_tiny_skia ... bench: 1,136,971 ns/iter (+/- 23,689)
test blend::source_atop_tiny_skia ... bench: 764,121 ns/iter (+/- 11,158)
test blend::source_in_tiny_skia ... bench: 743,836 ns/iter (+/- 4,109)
test blend::source_out_tiny_skia ... bench: 746,962 ns/iter (+/- 6,216)
test blend::source_over_tiny_skia ... bench: 639,697 ns/iter (+/- 8,237)
test blend::source_tiny_skia ... bench: 46,318 ns/iter (+/- 625)
test blend::xor_tiny_skia ... bench: 758,935 ns/iter (+/- 5,307)
test clip::aa_tiny_skia ... bench: 2,401,207 ns/iter (+/- 15,759)
test clip::tiny_skia ... bench: 2,227,737 ns/iter (+/- 14,952)
test fill::all_tiny_skia ... bench: 63,006 ns/iter (+/- 298)
test fill::opaque_tiny_skia ... bench: 44,203 ns/iter (+/- 885)
test fill::path_aa_tiny_skia ... bench: 784,148 ns/iter (+/- 6,711)
test fill::rect_aa_tiny_skia ... bench: 949,118 ns/iter (+/- 12,614)
test fill::rect_aa_ts_tiny_skia ... bench: 457,502 ns/iter (+/- 3,864)
test fill::rect_tiny_skia ... bench: 897,307 ns/iter (+/- 10,880)
test fill::source_tiny_skia ... bench: 44,151 ns/iter (+/- 1,160)
test gradients::simple_radial_tiny_skia ... bench: 2,779,578 ns/iter (+/- 18,908)
test gradients::three_stops_linear_even_tiny_skia ... bench: 2,652,559 ns/iter (+/- 18,716)
test gradients::three_stops_linear_even_tiny_skia_hq ... bench: 1,762,598 ns/iter (+/- 13,241)
test gradients::three_stops_linear_uneven_tiny_skia ... bench: 2,647,164 ns/iter (+/- 25,342)
test gradients::three_stops_linear_uneven_tiny_skia_hq ... bench: 1,762,510 ns/iter (+/- 23,290)
test gradients::two_point_radial_tiny_skia ... bench: 2,155,603 ns/iter (+/- 28,844)
test gradients::two_stops_linear_pad_tiny_skia ... bench: 1,967,302 ns/iter (+/- 10,452)
test gradients::two_stops_linear_pad_tiny_skia_hq ... bench: 1,256,187 ns/iter (+/- 15,603)
test gradients::two_stops_linear_reflect_tiny_skia ... bench: 2,129,474 ns/iter (+/- 13,214)
test gradients::two_stops_linear_reflect_tiny_skia_hq ... bench: 1,583,874 ns/iter (+/- 21,021)
test gradients::two_stops_linear_repeat_tiny_skia ... bench: 2,107,308 ns/iter (+/- 524,551)
test gradients::two_stops_linear_repeat_tiny_skia_hq ... bench: 1,437,999 ns/iter (+/- 27,691)
test hairline::aa_tiny_skia ... bench: 3,210,783 ns/iter (+/- 24,299)
test hairline::tiny_skia ... bench: 1,441,731 ns/iter (+/- 14,204)
test patterns::hq_tiny_skia ... bench: 12,979,780 ns/iter (+/- 37,808)
test patterns::lq_tiny_skia ... bench: 4,251,167 ns/iter (+/- 19,757)
test patterns::plain_tiny_skia ... bench: 1,855,304 ns/iter (+/- 13,727)
test png_io::decode_raw_rgb ... bench: 49,201 ns/iter (+/- 597)
test png_io::decode_raw_rgba ... bench: 64,804 ns/iter (+/- 662)
test png_io::decode_rgb ... bench: 108,838 ns/iter (+/- 916)
test png_io::decode_rgba ... bench: 85,209 ns/iter (+/- 625)
test png_io::encode_raw_rgba ... bench: 236,760 ns/iter (+/- 1,476)
test png_io::encode_rgba ... bench: 279,090 ns/iter (+/- 2,447)
test spiral::tiny_skia ... bench: 1,777,500 ns/iter (+/- 13,612)
```

my branch

```
test blend::clear_tiny_skia ... bench: 48,129 ns/iter (+/- 443)
test blend::color_burn_tiny_skia ... bench: 832,963 ns/iter (+/- 43,900)
test blend::color_dodge_tiny_skia ... bench: 798,502 ns/iter (+/- 8,852)
test blend::color_tiny_skia ... bench: 1,270,511 ns/iter (+/- 6,265)
test blend::darken_tiny_skia ... bench: 756,790 ns/iter (+/- 11,823)
test blend::destination_atop_tiny_skia ... bench: 722,887 ns/iter (+/- 8,798)
test blend::destination_in_tiny_skia ... bench: 678,354 ns/iter (+/- 9,308)
test blend::destination_out_tiny_skia ... bench: 678,561 ns/iter (+/- 9,349)
test blend::destination_over_tiny_skia ... bench: 756,359 ns/iter (+/- 10,428)
test blend::destination_tiny_skia ... bench: 10 ns/iter (+/- 1)
test blend::difference_tiny_skia ... bench: 763,730 ns/iter (+/- 10,570)
test blend::exclusion_tiny_skia ... bench: 696,425 ns/iter (+/- 6,309)
test blend::hard_light_tiny_skia ... bench: 785,740 ns/iter (+/- 11,348)
test blend::hue_tiny_skia ... bench: 1,543,312 ns/iter (+/- 14,419)
test blend::lighten_tiny_skia ... bench: 757,481 ns/iter (+/- 12,097)
test blend::luminosity_tiny_skia ... bench: 1,208,416 ns/iter (+/- 7,717)
test blend::modulate_tiny_skia ... bench: 682,061 ns/iter (+/- 8,748)
test blend::multiply_tiny_skia ... bench: 747,239 ns/iter (+/- 11,744)
test blend::overlay_tiny_skia ... bench: 794,790 ns/iter (+/- 230,655)
test blend::plus_tiny_skia ... bench: 673,664 ns/iter (+/- 7,683)
test blend::saturation_tiny_skia ... bench: 1,509,173 ns/iter (+/- 20,769)
test blend::screen_tiny_skia ... bench: 695,368 ns/iter (+/- 29,101)
test blend::soft_light_tiny_skia ... bench: 1,109,744 ns/iter (+/- 6,441)
test blend::source_atop_tiny_skia ... bench: 743,801 ns/iter (+/- 6,739)
test blend::source_in_tiny_skia ... bench: 731,308 ns/iter (+/- 14,909)
test blend::source_out_tiny_skia ... bench: 741,518 ns/iter (+/- 9,834)
test blend::source_over_tiny_skia ... bench: 641,504 ns/iter (+/- 6,509)
test blend::source_tiny_skia ... bench: 45,955 ns/iter (+/- 389)
test blend::xor_tiny_skia ... bench: 736,530 ns/iter (+/- 5,528)
test clip::aa_tiny_skia ... bench: 2,391,598 ns/iter (+/- 18,095)
test clip::tiny_skia ... bench: 2,210,578 ns/iter (+/- 15,919)
test fill::all_tiny_skia ... bench: 64,967 ns/iter (+/- 239)
test fill::opaque_tiny_skia ... bench: 45,983 ns/iter (+/- 1,142)
test fill::path_aa_tiny_skia ... bench: 785,691 ns/iter (+/- 11,360)
test fill::rect_aa_tiny_skia ... bench: 959,141 ns/iter (+/- 9,402)
test fill::rect_aa_ts_tiny_skia ... bench: 454,118 ns/iter (+/- 2,588)
test fill::rect_tiny_skia ... bench: 891,035 ns/iter (+/- 9,266)
test fill::source_tiny_skia ... bench: 43,363 ns/iter (+/- 327)
test gradients::simple_radial_tiny_skia ... bench: 2,758,142 ns/iter (+/- 29,744)
test gradients::three_stops_linear_even_tiny_skia ... bench: 2,626,101 ns/iter (+/- 22,764)
test gradients::three_stops_linear_even_tiny_skia_hq ... bench: 1,743,108 ns/iter (+/- 6,058)
test gradients::three_stops_linear_uneven_tiny_skia ... bench: 2,670,348 ns/iter (+/- 123,823)
test gradients::three_stops_linear_uneven_tiny_skia_hq ... bench: 1,752,872 ns/iter (+/- 9,877)
test gradients::two_point_radial_tiny_skia ... bench: 2,152,241 ns/iter (+/- 17,679)
test gradients::two_stops_linear_pad_tiny_skia ... bench: 1,965,929 ns/iter (+/- 11,325)
test gradients::two_stops_linear_pad_tiny_skia_hq ... bench: 1,235,473 ns/iter (+/- 13,458)
test gradients::two_stops_linear_reflect_tiny_skia ... bench: 2,115,559 ns/iter (+/- 20,704)
test gradients::two_stops_linear_reflect_tiny_skia_hq ... bench: 1,566,854 ns/iter (+/- 12,605)
test gradients::two_stops_linear_repeat_tiny_skia ... bench: 2,031,556 ns/iter (+/- 19,023)
test gradients::two_stops_linear_repeat_tiny_skia_hq ... bench: 1,441,551 ns/iter (+/- 9,629)
test hairline::aa_tiny_skia ... bench: 3,254,202 ns/iter (+/- 28,056)
test hairline::tiny_skia ... bench: 1,436,676 ns/iter (+/- 17,690)
test patterns::hq_tiny_skia ... bench: 13,012,513 ns/iter (+/- 75,875)
test patterns::lq_tiny_skia ... bench: 4,292,308 ns/iter (+/- 44,548)
test patterns::plain_tiny_skia ... bench: 1,874,341 ns/iter (+/- 9,112)
test png_io::decode_raw_rgb ... bench: 51,834 ns/iter (+/- 494)
test png_io::decode_raw_rgba ... bench: 69,228 ns/iter (+/- 787)
test png_io::decode_rgb ... bench: 111,547 ns/iter (+/- 532)
test png_io::decode_rgba ... bench: 89,610 ns/iter (+/- 1,480)
test png_io::encode_raw_rgba ... bench: 243,012 ns/iter (+/- 5,623)
test png_io::encode_rgba ... bench: 284,810 ns/iter (+/- 2,298)
test spiral::tiny_skia ... bench: 1,776,032 ns/iter (+/- 22,215)
```

There is no change in performance based on eyeballing it.

Fixes https://github.com/RazrFalcon/tiny-skia/issues/85 . Fixes https://github.com/RazrFalcon/tiny-skia/issues/70 . I assume #70 is fixed because of the alignment change mentioned above, but I couldn't run Miri locally to confirm.

e00E commented 1 year ago

In addition to this, the Pixmap types should store [PremultipliedColorU8] instead of [u8] for more type safety. I didn't implement it here because it doesn't matter for correctness. It mostly involves changing some plumbing with bytemuck::cast_slice and deciding whether Pixmap should still have functions that return &[u8] or whether those should change to &[PremultipliedColorU8] making the user do the conversion with bytemuck.
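A rough sketch of what such typed storage could look like, under stated assumptions: `Pixel` and `TypedPixmap` are hypothetical names, not tiny-skia's API, and the hand-written `unsafe` byte view stands in for what `bytemuck::cast_slice` would do, so that the example depends only on the standard library.

```rust
/// Hypothetical pixel type; `repr(transparent)` gives it the layout of [u8; 4].
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(transparent)]
pub struct Pixel(pub [u8; 4]);

/// Hypothetical pixmap that stores typed pixels instead of raw bytes.
pub struct TypedPixmap {
    width: usize,
    pixels: Vec<Pixel>,
}

impl TypedPixmap {
    /// Creates a pixmap filled with transparent black.
    pub fn new(width: usize, height: usize) -> Self {
        TypedPixmap { width, pixels: vec![Pixel([0; 4]); width * height] }
    }

    pub fn width(&self) -> usize { self.width }

    /// Typed view: callers get pixels, not bytes.
    pub fn pixels(&self) -> &[Pixel] { &self.pixels }

    pub fn pixels_mut(&mut self) -> &mut [Pixel] { &mut self.pixels }

    /// Byte view kept for callers that still want `&[u8]`.
    pub fn data(&self) -> &[u8] {
        let len = self.pixels.len() * 4;
        // SAFETY: `Pixel` is `repr(transparent)` over `[u8; 4]`, so it has
        // size 4, alignment 1, and no padding, and any byte is a valid `u8`.
        unsafe { std::slice::from_raw_parts(self.pixels.as_ptr().cast::<u8>(), len) }
    }
}
```

Keeping `data()` alongside `pixels()` is one of the two options described above; dropping it would push the conversion onto the caller.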

RazrFalcon commented 1 year ago

So to clarify once more, this patch changes ColorU8 and PremultipliedColorU8 storage from u32 to [u8; 4]. Which resolves u32 byte-order ambiguity? No need to pack and unpack bytes. No need to worry about CPU byte-order. It's always RGBA now. Right?

And I do not think it fixes #70, because this one is about the Pixmap storage, aka Vec<u8>.

CryZe commented 1 year ago

I believe it solves #70 because from what I'm seeing the Vec<u8> never gets turned into the equivalent of a &[u32] anymore, only &[[u8; 4]] which is perfectly fine. (Unless there's some code path somewhere that still needs u32).

e00E commented 1 year ago

#70 is about casting from [u8] to [u32] being UB. The reason this is UB is that the types have different alignment requirements. [u32] has stricter alignment requirements, which a cast isn't guaranteed to uphold. Now that ColorU8 is backed by [u8; 4], the alignment requirement is the same as for [u8], so the cast can't be UB.
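The difference can be sketched with two checked casts. The function names and the `Aligned` helper are made up for this illustration. Turning bytes into `u32` words needs both an alignment check and a length check, while turning them into `[u8; 4]` pixels needs only the length check.

```rust
use std::mem::{align_of, size_of};

/// Views bytes as u32 words, but only when the pointer is suitably aligned
/// and the length is a whole number of words.
pub fn try_as_words(bytes: &[u8]) -> Option<&[u32]> {
    let aligned = bytes.as_ptr() as usize % align_of::<u32>() == 0;
    let whole = bytes.len() % size_of::<u32>() == 0;
    if aligned && whole {
        let len = bytes.len() / size_of::<u32>();
        // SAFETY: alignment and length were checked above, and every bit
        // pattern is a valid u32.
        Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<u32>(), len) })
    } else {
        None
    }
}

/// Views bytes as [u8; 4] pixels. Only the length can be wrong here,
/// because [u8; 4] has the same alignment as u8.
pub fn try_as_pixels(bytes: &[u8]) -> Option<&[[u8; 4]]> {
    if bytes.len() % 4 == 0 {
        // SAFETY: [u8; 4] has alignment 1, so any byte pointer is aligned,
        // and the length is a whole number of pixels.
        Some(unsafe { std::slice::from_raw_parts(bytes.as_ptr().cast::<[u8; 4]>(), bytes.len() / 4) })
    } else {
        None
    }
}

/// Helper for experiments: a byte buffer that starts on a 4-byte boundary.
#[repr(align(4))]
pub struct Aligned(pub [u8; 12]);
```

With an `Aligned` buffer, slicing from offset 1 gives a byte slice that `try_as_words` rejects on platforms where `u32` is 4-byte aligned, while `try_as_pixels` still accepts it.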

e00E commented 1 year ago

> Which resolves u32 byte-order ambiguity?

I was wrong about this part. The byte order was always correct. I misunderstood how the packing worked. When I tested I realized it was fine. Let me close that issue immediately. This PR is still useful with regard to that, because the new code is simpler. There is no reason to pack into u32.

RazrFalcon commented 1 year ago

@CryZe @e00E Wait, &[u32] is UB, but &[[u8; 4]] isn't?! I guess I misunderstood #70. I thought it was about casting to a "wider" type.

@e00E I like the new code more, since it's simpler. The fact that the old code confused you initially makes the change worthwhile.

CryZe commented 1 year ago

It's because [u8; 4] has an alignment of 1, whereas u32 has an alignment of 4 (on most platforms). So &[u8] and &[[u8; 4]] both have the same alignment of 1.
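These numbers can be checked directly with `std::mem::align_of`. The snippet below is a small sketch in which `Pixel` and `layout_report` are illustrative names. An array has the alignment of its element type, and a `repr(transparent)` wrapper has the alignment of its single field.

```rust
use std::mem::{align_of, size_of};

/// Illustrative wrapper with the same layout as [u8; 4].
#[repr(transparent)]
pub struct Pixel(pub [u8; 4]);

/// Returns (type name, size in bytes, alignment in bytes) for each type.
pub fn layout_report() -> [(&'static str, usize, usize); 4] {
    [
        ("u8", size_of::<u8>(), align_of::<u8>()),
        // Same size as u32, but the alignment of its u8 elements.
        ("[u8; 4]", size_of::<[u8; 4]>(), align_of::<[u8; 4]>()),
        ("Pixel", size_of::<Pixel>(), align_of::<Pixel>()),
        // The alignment of u32 is platform-dependent (typically 4).
        ("u32", size_of::<u32>(), align_of::<u32>()),
    ]
}
```

Printing the entries of `layout_report()` shows that the size of `[u8; 4]` matches `u32` while its alignment matches `u8`.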

RazrFalcon commented 1 year ago

Yes, this is what confused me. I thought that &[[u8; 4]] had an alignment of 4 as well... Not my area of expertise.