SixLabors / ImageSharp

:camera: A modern, cross-platform, 2D Graphics library for .NET
https://sixlabors.com/products/imagesharp/
Other
7.31k stars 846 forks source link

Add support for HEIF based images #2633

Open ynse01 opened 7 months ago

ynse01 commented 7 months ago

Prerequisites

Description

This PR is an attempt to start with #1320

Implemented decoding and encoding of images based on a HEIF (ISO/IEC 23008-12:2022) container. These include (amongst others): HEIC, HIF and AVIF.

Please note this PR does NOT add any new compression algorithm to ImageSharp. For now it will take a JPEG thumbnail as its pixel source. Needless to say, much more work is needed after this PR to reach the goal. I'm try to help there also.

Please do let me know any comments, also on whether to support the HEIF file format in the first place.

JimBobSquarePants commented 7 months ago

Oh wow! Thanks! 🙏 I’ll have a good dig through asap!!

dlemstra commented 7 months ago

I noticed that you used the x265 encoder. This is licensed under the GPL and cannot be used in this project.

ynse01 commented 7 months ago

I noticed that you used the x265 encoder. This is licensed under the GPL and cannot be used in this project.

True, and that is why I used the Nokia repository as reference for my implementation, which has a license which I guess is compatible with ImageSharp. I did not start on writing an HVC Codec, due to patent concerns.

Nokia also includes a Patent waiver in its license. AVIF is supposed to be patent free, even its Codec. For AVIF I used SVT-AV1 as a reference, which has a 3-clause BSD license.

With all this, I hope the reference implementations that are mentioned now, can be used to create an AVIF Codec implementation in ImageSharp.

ynse01 commented 5 months ago

Not sure why one of the RowIntervalTests is failing, didn't touched that part of the code...

ynse01 commented 4 months ago

@JimBobSquarePants Let me know if you still want to go ahead with this PR.

JimBobSquarePants commented 4 months ago

@JimBobSquarePants Let me know if you still want to go ahead with this PR.

Hi @ynse01 I absolutely do want to continue but we'll need to get AV1 format together before I can merge the code. Thanks!

ynse01 commented 4 months ago

@JimBobSquarePants Let me know if you still want to go ahead with this PR.

Hi @ynse01 I absolutely do want to continue but we'll need to get AV1 format together before I can merge the code. Thanks!

AV1 is quite complex with all its options, especially for implementing an efficient Encoder.

Would it be easier for you, if we focus first on getting the pure HEIF container support in, including all metadata (ICC, Exif and XMP) working ? AV1 compression could than be a separate next step.

JimBobSquarePants commented 4 months ago

My concern here would be merging an incomplete format. We’ll have to make all the parts internal if you want to do things in that order.

ynse01 commented 4 months ago

I think I understand what you're trying to tell me: From a user perspective, parsing only metadata out of an image is not considered supporting a format. Users expect pixels out of an image. Which would mean I guess, that the minimal viable PR to be merged in would be something like a first image round-trip, using 1 chosen set of parameters. That would allow for adding the other (many) variations relatively quickly and keep up with user's expectations.

The downside I see with this approach, is that the internal design of the implementation doesn't get any feedback until the very end, making design changes a bit more painful. Not to mention the size of the first PR ...

Now that I know you would like to have the AVIF format supported, I'm motivated again to continue to implement it.

@JimBobSquarePants Let me know your thoughts on the approach, order and intermediate goals of AVIF support.

JimBobSquarePants commented 4 months ago

I think I understand what you're trying to tell me: From a user perspective, parsing only metadata out of an image is not considered supporting a format. Users expect pixels out of an image. Which would mean I guess, that the minimal viable PR to be merged in would be something like a first image round-trip, using 1 chosen set of parameters. That would allow for adding the other (many) variations relatively quickly and keep up with user's expectations.

The downside I see with this approach, is that the internal design of the implementation doesn't get any feedback until the very end, making design changes a bit more painful. Not to mention the size of the first PR ...

Now that I know you would like to have the AVIF format supported, I'm motivated again to continue to implement it.

@JimBobSquarePants Let me know your thoughts on the approach, order and intermediate goals of AVIF support.

Yes exactly. If we ship with support for a format, it must support the full decode/encode process for one compression format. Our users expect that now as that's how we've always worked.

I would implement a single compression (the most common if you know that) algorithm then iterate once we know that is complete. Addition compression algorithms can be separate PRs.

Regarding reviewing the code. This is why it's good to have the PR open early so we can talk through changes as they're written. I try to read changes as soon as they appear in PRs but if there's ever anything you explicitly want me to look at just tag me.

I must say though, the quality of code you are delivering is very high!!

ynse01 commented 4 months ago

Sounds like a plan !

The user community of AVIF encoders seems to be quite scattered, can't identify a dominant force there. Feel free to provide real-life images, so I can direct the support towards these.

For now, I plan for the following initial limitations to AV1 compression:

I do plan to take both lossy and lossless into account from the start.

@JimBobSquarePants In the meantime, you could help my local workflow, by adding the relevant file extensions to GIT LFS in Shared-infrastructure please:

*.heic              filter=lfs diff=lfs merge=lfs -text
*.hif               filter=lfs diff=lfs merge=lfs -text
*.avif              filter=lfs diff=lfs merge=lfs -text
JimBobSquarePants commented 4 months ago

Sounds like a plan !

The user community of AVIF encoders seems to be quite scattered, can't identify a dominant force there. Feel free to provide real-life images, so I can direct the support towards these.

For now, I plan for the following initial limitations to AV1 compression:

  • 8 bit depth only
  • Color pixels only
  • Base Profile for now
  • Square block sizes
  • Split partition type only, to keep the blocks square
  • DCT transform type only
  • Limited selection of Prediction modes
  • Limited subsampling options
  • No filters
  • No alpha
  • No sequences / animation

I do plan to take both lossy and lossless into account from the start.

@JimBobSquarePants In the meantime, you could help my local workflow, by adding the relevant file extensions to GIT LFS in Shared-infrastructure please:

*.heic              filter=lfs diff=lfs merge=lfs -text
*.hif               filter=lfs diff=lfs merge=lfs -text
*.avif              filter=lfs diff=lfs merge=lfs -text

That all sounds great. Sorry about the delay, I've been very swamped.

If you use the following command in your fork you will get those changes.

git submodule foreach git pull origin main
brianpopow commented 2 months ago

@ynse01 The Av1BitStreamReader seems to work as it should, I was able to parse Av1CodecConfiguration with it d8e7078b. Maybe I can help with this PR, but I still in the process of reading the spec, there is a lot to grasp.

ynse01 commented 2 months ago

@ynse01 The Av1BitStreamReader seems to work as it should, I was able to parse Av1CodecConfiguration with it d8e7078. Maybe I can help with this PR, but I still in the process of reading the spec, there is a lot to grasp.

There sure is a lot of grasp for AVIF.

The parsing of ObuSequenceHeader seems to work OK indeed, but somewhere in either ObuFrameHeader or in the handling of TileGroups there is a bug. The way AVIF (or technically Av1) intertwines BitStream coding with Symbol coding is pretty hard to debug. Merged in my latest changes even though some tests are still failing, as I did some minor refactoring.

ynse01 commented 2 months ago

I fixed the bug :-) Tests are passing again.

@brianpopow There are a few TODO items in the Heif container area, if you like you could have a look at:

I'll dive deeper into the Symbol coding and parsing the coefficients of the TileInfoBlocks.

brianpopow commented 2 months ago

I fixed the bug :-) Tests are passing again.

@brianpopow There are a few TODO items in the Heif container area, if you like you could have a look at: ... I'll dive deeper into the Symbol coding and parsing the coefficients of the TileInfoBlocks.

Good job in finding the bug! I had some time on the weekend too look into your code. It's impressive howmuch is already done with the OBU reader. I thought maybe I could start with something small, maybe implementing parsing 5.9.14. Segmentation params as a start?

ynse01 commented 2 months ago

I fixed the bug :-) Tests are passing again. @brianpopow There are a few TODO items in the Heif container area, if you like you could have a look at: ... I'll dive deeper into the Symbol coding and parsing the coefficients of the TileInfoBlocks.

Good job in finding the bug! I had some time on the weekend too look into your code. It's impressive howmuch is already done with the OBU reader. I thought maybe I could start with something small, maybe implementing parsing 5.9.14. Segmentation params as a start?

Great ! I see the implement-another-spec-section-virus has gotten to you also. If you like more of these, feel free to do so. As a goal, we could support images that do not use the reduced still picture header feature and explicitly contain every OBU module out there. The test image _IrvineCA.avif is such an example, in ObuFrameHeaderTest you can uncomment it and work your way through the the OBU headers.

ynse01 commented 1 month ago

@brianpopow I'm following the code path of the SVT-AV1 library, which seems to deviate quite a bit in the latest Syntax's. From Residual and down, I don't think the spec and the code map 1-to-1. As the SVT-AV1 library has extensive intrinsic parts, I would like to leverage those and stick close to their implementation.

brianpopow commented 1 month ago

@brianpopow I'm following the code path of the SVT-AV1 library, which seems to deviate quite a bit in the latest Syntax's. From Residual and down, I don't think the spec and the code map 1-to-1. As the SVT-AV1 library has extensive intrinsic parts, I would like to leverage those and stick close to their implementation.

OK, thanks for the update, I was following the spec very closely. I will take a look into SVT-AV1 library. I have seen that I have implemented GetPlaneResidualSize() in 13173a39, which you already have done in a another place, that was a waste of time, I have overlooked that.

brianpopow commented 1 month ago

@brianpopow I'm following the code path of the SVT-AV1 library, which seems to deviate quite a bit in the latest Syntax's.

I tried to look into the SVT-AV1 repository, but I have some understanding issues, @ynse01 maybe you can help me understand what you are using as the reference code base. Just for clarification, when you say SVT-AV1 library you mean this repo: https://gitlab.com/AOMediaCodec/SVT-AV1 right? If so, I only see an encoder here, where is the decoder part?

ynse01 commented 1 month ago

@brianpopow I'm following the code path of the SVT-AV1 library, which seems to deviate quite a bit in the latest Syntax's.

I tried to look into the SVT-AV1 repository, but I have some understanding issues, @ynse01 maybe you can help me understand what you are using as the reference code base. Just for clarification, when you say SVT-AV1 library you mean this repo: https://gitlab.com/AOMediaCodec/SVT-AV1 right? If so, I only see an encoder here, where is the decoder part?

Indeed, that is the repository that I've used as reference. The decoder has been removed recently, in this commit. As I've used an earlier version of that exact decoder as the basis for my code here. This is a bit of a worry from a maintenance point of view.

The SVT-AV1 library is still one of the more efficient encoders out there and its license is permissive. Hopefully we can learn a couple of tricks from them on the encoder.

brianpopow commented 1 month ago

Hi @ynse01, I still have some problems understanding howto decode a single image with the SVT-AV1 library.

I am using now the tagged version v2.1.0. I can see now there is an SvtAv1DecApp, but that seems to be for video files, if I am not mistaken. Is it not possible to decode just a single avif file? This would make comparing our implementation with the reference much easier, but I struggle to understand how this can be done.

ynse01 commented 1 month ago

@brianpopow Running the decoder app on an AVIF file will not work. I don't think SVT-AV1 has the HEIF container parsing capabilities. It seems to use the IVF file format, but it supports more, see here. It appears to me that IVF can contain an AV1 bitstream although its meant for VP8. And yes, I know this whole container / bitstream and their naming is pretty confusing.

To summarize the terminology: Heif is the ISO base container (also known as ISOBMFF), where AV1 is just one of the supported bitstream formats. AVIF is to be interpreted here as an AV1 bitstream inside a Heif container, where the bitstream contains exactly one intra frame. With the same terminology, Section 1 of AVIF spec also touched on this briefly.

The workflow I used to "debug" SVT-AV1 is considerably less user friendly from what you suggest. I mainly used modified unit tests from their own test suite, to get the information that I need. Alternatively, when you convert the AVIF file into IVF, you could get it visualized in a debugger.

brianpopow commented 1 month ago

@ynse01 I have noticed an issue in the bitstream reader in ReadLiteral(). When you read to the end of the stream the byte position would be advanced, resulting in an Index out of range issue. I have added a test demonstrating the issue: 62728b6f

Also when there is only 4 bytes of data available, the initialization would also result in an index out of range issue, because the AdvanceToNextWord would be called twice in the constructor. I did noticed this when decoding example image jpeg444_xnconvert.avif when av1CodecConfiguration was parsed in HeifDecoderCore -> ParsePropertyContainer()

I have tried to adjust the ReadLiteral() implementation to fix that, but I did not want to override your code without asking for your opinion first. I have added a new Av1BitStreamReader2 to demonstrate this. It's adapted from libgav1, because I found the implementation of Svt-AV1 too complicated.

All Bitstream tests pass with that implementation.

ynse01 commented 1 month ago

@brianpopow I like your new Av1BitStreamReader, for 3 reasons:

So, replaced the implementation. I did rework your suggested code a bit and implemented switching to Symbol reading again.

JimBobSquarePants commented 3 weeks ago

@ynse01 Just a heads up. I'm going to be merging PRs #2751 #2762 very soon which contain some breaking changes to our codec APIs. If you have any questions about how to correctly implement the changes following merge, please let me know.