AOMediaCodec / libavif

libavif - Library for encoding and decoding .avif files
Other
1.57k stars 202 forks source link

png/jpg input on stdin is not supported #493

Open dbquarrel opened 3 years ago

dbquarrel commented 3 years ago

My use case is converting archived files in JPG2000 lossless format into various resolutions and image types for the web.

A lot of heavy lifting is done by imagemagick but it is slow and sometimes crufty output. Its use in the toolchain is to decode the JPG2000, crop and scale images and provide intermediate output for other tools to create the final images.

I'm just adding avifenc and am by no means expert in using this tool so I am potentially making an error.

You can pipeline all of the tools above, imagemagick, mozcjpeg, cwebp, without much issue and it solves having to create, write, read and delete temporary files during this process.

I saw avifenc has a --stdin flag and this doesn't work for png files. There are a few issues with this:

  1. documentation should be made unambiguously clear that JPG, PNG, etc. will not be processed at stdin with this flag, so that the user won't waste time trying to get it to work. It mentions y4m will be handled but doesn't mention that all other properly formed input will not.
  2. it would be nice if it DID work with this flag, instead of complaining that the input type cannot be determined (all of these images have headers that make it easy to determine what they are, or a flag could be added for the user to describe input type)

Also in general just guessing input type based on the filename isn't correct.

Initial byte sequences to identify JPG:

Initial byte sequences to identify PNG:

These are probably available in the jpg and png libraries to check, I am not familiar with them though and not really so good with git to submit my own fixes to these.

I wrote some code that will check the file type on stdin and switch to the PNG or JPEG readers if it detects these files on stdin, and if not recognizing those will fall back into the existing method. So it should work reasonably, expanding the types accepted with --stdin by adding these two.

The results then look like this:

% ./avifenc --stdin -o test1.avif < /tmp/test.png
stdin file type checker determined: 3
Successfully loaded: (stdin)
AVIF to be written: (Lossy)
 * Resolution     : 228x700
 * Bit Depth      : 8
 * Format         : YUV444
 * Alpha          : Not premultiplied
 * Range          : Full
 * Color Primaries: 1
 * Transfer Char. : 13
 * Matrix Coeffs. : 6
 * ICC Profile    : Absent (0 bytes)
 * XMP Metadata   : Absent (0 bytes)
 * EXIF Metadata  : Absent (0 bytes)
 * Transformations: None
Encoding with AV1 codec 'aom' speed [6], color QP [24 (Medium) <-> 26 (Medium)], alpha QP [0 (Lossless) <-> 0 (Lossless)], tileRowsLog2 [0], tileColsLog2 [0], 1 worker thread(s), please wait...
 * Encoding frame 1 [1/1 ts]: (stdin)
Encoded successfully.
 * Color AV1 total size: 6706 bytes
 * Alpha AV1 total size: 42 bytes
Wrote AVIF: test1.avif

The method for accomplishing this is fairly trivial...

  1. supply and swizzle a buffer to stdin before doing an I/O
  2. determine which magic numbers you want to use to check (JPG has one two parter, my code handles two part magic numbers)
  3. use the swizzled buffer to scan for magic numbers and find a hit if possible
  4. if any hits, switch to the appropriate input handling routine

So some code has been added to avifenc.c to do this... one line of code each is changed in avifpng.c and avifjpg.c, to use stdin if the filename passed in was null (instead of opening a file).

I think there is also an error in how avifenc.c was handling determination of end of file, using fgetc/ungetc ... the character returned by fgetc will be EOF if it is at end of file, and popping, pushing this character back into stdin will not necessarily yield a positive return value for feof() afterwards (it wasn't working right after swizzling the input buffer).

I'm not super great with git but will try to submit a pull request and see if I can put the changes in.

joedrago commented 3 years ago

documentation should be made unambiguously clear that JPG, PNG, etc. will not be processed at stdin with this flag, so that the user won't waste time trying to get it to work. It mentions y4m will be handled but doesn't mention that all other properly formed input will not.

--stdin : Read y4m frames from stdin instead of files; no input filenames allowed, must set before offering output filename

This says "Read y4m frames" right in the syntax... I'm not sure how much more unambiguous it could be. The entire existence of this path is to help people that are trying to pump video content into avifenc from things like ffmpeg or avisynth's pipe mods. It seemed like overdesign to allow a single JPEG or PNG to be fed in this way as you already have the single image on disk and can simply supply it on the commandline.

That said, I'm not opposed to such support, although I don't have time to put it in myself. I could see supporting a single JPEG or PNG from stdin (avifGuessFileFormat() already does the header magic detection you suggest), but it'd just be a single input file. Mind you, this would only buy you the avoidance of a single temporary file on disk; your pipeline would still be encoding these PNGs/JPEGs before handing them over, etc. I'm struggling with the use case here, but I'll take your word for it that'd it'd be beneficial.

dbquarrel commented 3 years ago

This says "Read y4m frames" right in the syntax... I'm not sure how much more unambiguous it could be.

The problem is that you take 3 data types as input and 2 of those require files while one works on pipelines. This is non-intuitive and it is contrary to both UNIX tool philosophy and how image processing tools like cjpeg, cwebp, imagemagick, etc. all operate.

So the two problems of breaking UNIX toolset philosophy and non-intuitive handling of input data, generated a suggestion that this should specify what it can't do as well as what it can do when you say --stdin. Bear in mind what makes sense to you personally may not be intuitive for someone who is starting to use the program especially when it has this kind of flaw in how it handles data.

For the rest, about your code in avifGuessFileFormat(), that code doesn't work properly because it opens a file again to do what it does, and data being piped to the program on stdin cannot be detected in this manner unless you rearchitect how you're handling your data. You cannot fopen() stdin and you can't seek, even the fgets() / fputs() calls you make are potentially problematic but at best they are good for one character on stdin.

What you need to do for a correct implementation is:

  1. read header info
  2. make a determination from the information about how to process it
  3. provide header and the rest of the data to the rest of the program to process in the correct manner

Your program is designed in a way that would require the addition of flags like --stdin-png and --stdin-jpg because it routes the data to processing before knowing what the data is.

Otherwise what it needs is what I demonstrated for you which is to analyze the data first and make decisions about what processing should follow and then do the correct processing. It doesn't force the user to supply redundant information via flags nor does it make assumptions about what data is on stdin as it is currently doing. The only way to do that without changing all of the underlying infrastructure is to supply your own buffer to stdin so you can check what is there without calling fread() on it, which will consume data and the rest of the program will not receive everything it needs to process.

Your tool works nicely for people who want to do image processing and use it as a back end to create avif files. But it's flawed because it doesn't respect UNIX tool design philosophy and it should. Part of that philosophy means the developer shouldn't be imagining limited use cases for the tool and designing for those limited use cases. You should make the tool function properly as a peer and respect use of pipes in UNIX and then let the users of your tool decide how they want to apply the features. That's the whole point, to be flexible, simple, intuitive and play nicely with other programs and not design for a single use case.

For me, this tool has one use which is to generate avif image files from other image files in a processing pipeline. So how I would like to use it is not how you would like me to use it, but if it is designed right then you have something that will be used in ways you didn't imagine (which is again, the idea behind UNIX tool design in general).

If other tools refused to read properly from stdin then making a pipeline would involve large numbers of temporary files in order to feed everyone what they want. The entire concept of the pipeline is to allow people to chain tools in ways that the designers did not predict. It's a necessary feature for an image processing tool.

In the end these are just suggestions for you to consider.