Tighten image MIME-Type determination

lexaknyazev commented 3 years ago

What is the authoritative source of truth for image media types?

There could be up to three possible sources for the same image:

Matching the actual data stream.
Looking at the "transport layer":
- HTTP Content-Type header;
- Metadata provided by file system.
- <mediatype> field embedded in Data URI;
image.mimeType property in glTF JSON.

As per glTF 2.0 spec, these combinations are possible:

Regular URI
- Matching + HTTP / FS (optional) + image.mimeType (optional)
Data URI
- Matching + <mediatype> + image.mimeType (optional)
Buffer View
- Matching + image.mimeType (required)

We should define steps similar to MIME Sniffing Standard, Section 7.

javagl commented 3 years ago

That can indeed be tricky (and I remember stumbling over that when writing the respective IO classes).

I think that the following cases are already sufficiently specified:

Buffer View: The image.mimeType is required in this case. I don't see how one could argue that another way of determining the type should be used in addition to that. And if it was done, it would raise the question of what should happen if the mimeType is image/jpeg but the magic header of the actual data indicates PNG. One could consider to explicitly say how loaders should behave in this case. But this has to be worded carefully - otherwise, it may become necessary for loaders to check whether the mimeType and the data type are the same. Otherwise, I'd say that loaders can expect the mimeType to be present and correct. (I'm sure that there are many image loader libraries where the caller just throws in some raw byte[] data, and the loader library itself does the matching, not necessarily returning information about which file type it found...)
Data URI: Similar to the above: I think that people must be able to rely on the correctness of the <mediatype>. One could say that in this case, the image.mimeType MUST be ignored, to leave less room for inconsistencies.
Regular URI: That's the most critical one. In general, I don't like the idea that determining the image media type depends on the transport layer. The asset should be "self-contained", so to speak, and it should be possible to interpret it correctly without relying on external sources of information. This means that the only option to determine the type is by Matching. Of course, in practice, I think that every loader will at least look at the file extension, and hopefully handle the (unusual) case that a image UI is given without a file extension. If some glTF loader is ~"tightly coupled" with the transport layer, and has the request/response at hand, it might also examine that. But on a certain layer of abstraction (and when not relying on the transport layer), every loader must eventually be able to determine the type by Matching.

When you mention that there should be "steps", like the ones that you linked to: I think that it could make sense to add some further information (e.g. that the mediatype takes precedence over a possible mimeType). But I'd hesitate to add anythning that forces clients/implementors to examine the transport layer...

lexaknyazev commented 3 years ago

what should happen if the mimeType is image/jpeg but the magic header of the actual data indicates PNG

That's exactly the open question for the buffer view case. Some applications are very sensitive about relying on large third-party dependencies. When seeing image/jpeg they may immediately send the payload (which is accidentally PNG) to a dedicated JPEG-only decoder.

I think that people must be able to rely on the correctness of the <mediatype>.

Semantically, this is transport layer information since Data URI could be seen as its own "protocol".

I think that every loader will at least look at the file extension

The spec explicitly says to not rely on file extensions. Some sandboxed or virtualized execution environments may signal the OS-detected media type while hiding the actual filesystem (including on-disk file names) from the application. The WHATWG spec defines a broad term "supplied MIME type".

... has the request/response at hand ...

Even web-based loaders can access response HTTP headers when fetching network resources programmatically.

But I'd hesitate to add anything that forces clients/implementors to examine the transport layer...

The goal is to define asset's validity when some of these fields do not match. In some cases (e.g. a regular local filesystem) the transport layer consists only of the file extension that should be ignored anyway.

javagl commented 3 years ago

Semantically, this is transport layer information since Data URI could be seen as its own "protocol".

Semantically yes, but I think the point of whether the asset itself contains all information or whether "external" information has to be gathered is the critical one here.

The spec explicitly says to not rely on file extensions

There's a strong case to make that loaders should be able to detect the type of a file without an extension. But I think that it is reasonable to look at the extension, even though, of course, you're right in that somebody could store a PNG file as fooledYou.jpg and break that loader. (Beyond that: there may be details e.g. regarding the virtualized FS that you mentioned that I'm not fully aware of)

BTW: The spec links to https://mimesniff.spec.whatwg.org/#matching-an-image-type-pattern at that point, which covers the usual, "browser-readable" image file formats. How would things like KTX fall into that? (This may already be covered by the extension - some early discussion was at https://github.com/KhronosGroup/glTF/issues/835 , but sorry, I've lost track of some of the more technical parts of the recent developments here...)

When seeing image/jpeg they may immediately send the payload (which is accidentally PNG) to a dedicated JPEG-only decoder. ... The goal is to define asset's validity when some of these fields do not match.

I think that assuming correctness of the field should be fine. Or conversely, if the mimeType and the data do not match, it's a clear inconsistency, and could be considered to just be invalid. But if I understood this correctly, then you argue about detecting this case, and maybe differentiating this further. In this case, I wonder about the practical difference between 1. image data that does not match the mime type and 2. image data containing complete garbage.

More generally: I'd agree that it could make sense to make the spec and requirements more strict, for example, in saying

loaders MUST NOT rely on the file extension, they MUST check the image type with MIME sniffing (TBD: KTX), they MUST report an inconsistent mimeType and image data as an error

or similar. But I cannot imagine how loaders should be forced to examine actual file system, OS media type or HTTP header information. If someone has an idea how this could be part of the spec, in a way that can reasonably be supported by and mapped to all environments (ranging from a C application in a virtualized Linux, to a JavaScript viewer running in Chrome on Android), I'd be curious about that.

lexaknyazev commented 3 years ago

How would things like KTX fall into that?

Detecting KTX2 is as simple as reading the first 12 bytes.

In this case, I wonder about the practical difference between 1. image data that does not match the mime type and 2. image data containing complete garbage.

In some environments (e.g. Web/JS), the underlying platform (e.g. a browser) handles PNG and JPEG decoding. That platform performs sniffing on its own and gives no control to the app, so mismatched mimeType will likely be ignored. In other environments, it's the app that chooses what decoder to use. Ideally, the same glTF asset should behave the same in both cases.

But I cannot imagine how loaders should be forced to examine actual file system, OS media type or HTTP header information.

The "supplied type" in the right umbrella term here. It may be omitted in some cases. We may also say that it's always ignored. In such a case, should an application accept image data that is marked as e.g. text/plain in HTTP headers?

lexaknyazev commented 3 years ago

Consider the following valid asset:

{
    "asset": {
        "version": "2.0"
    },
    "images": [
        {
            "uri": "2c5641b0-bdc1-472d-b1ce-607ff93c6ed6"
        }
    ]
}

By definition, image.uri without a scheme is relative, so it can be fetched from the current asset location.

For a regular file-system access, there would be probably no hints about the image file format.
For HTTP, the header may be application/octet-stream, an actual image/* type, or something else. It depends on server's configuration.

According to the linked WHATWG spec, web-browsers will use specific steps to determine the file type so the asset will likely load correctly (if it's JPEG or PNG). How should non-web loaders handle this?

lexaknyazev commented 3 years ago

A modern JavaScript implementation would do something like:


fetch("2c5641b0-bdc1-472d-b1ce-607ff93c6ed6")
.then(response => response.blob())
.then(function(myBlob) {
  // if the reported type is image/png, image/jpeg, or image/webp, let the browser handle that
  if (isNativelySupported(myBlob.type)) {
    createImageBitmap(myBlob);
  } else {
    // try matching the first few bytes...
  }
});

lexaknyazev commented 3 years ago

// if the reported type is image/png, image/jpeg, or image/webp, let the browser handle that

This btw does not catch an edge case when a misconfigured server reports image/png for a KTX2 file.

lexaknyazev commented 3 years ago

I think here's the first rule we can agree on

image.mimeType (when present) MUST match the media type of the image payload.

What should implementations do in a case of mismatch: reject the image completely or try to load anyway? This could also be left undefined.

javagl commented 3 years ago

Detecting KTX2 is as simple as reading the first 12 bytes.

I had no doubt that this is possible. The point is that it is not covered by the linked site. The spec currently only says that image/jpeg and image/png are supported, and these are covered by the link. But again, apologies when I'm not entirely up do date: Maybe the relevant specs already define a MIME type for KTX, and the matching that as to be done against its first 12 bytes, and it would be possible to link to the latter here.

The "supplied type" in the right umbrella term here.

That's fine. My main concern was to not accidentally require some externally "supplied type". As such, it certainly makes sense to define these "steps" as a means to assign "priorities" to all the possible sources of information, explicitly covering the case that they may not be available, leading to the ultimate fallback that you aimed at with the example asset: The final Truth® is in the payload.

Then, a crucial question is then indeed how to handle inconsistencies. On the one hand, giving applications leeway for handling these cases resiliently is desirable. On the other hand, it can lead to feature-creep. But my gut feeling is that it should be OK to leave the behavior undefined - although I'd prefer to be more explicit, and say: Applications MAY rely on <certain information like the mimeType> to be correct (meaning that they MAY reject the image when they encounter an inconsistency with the actual payload), but they also MAY load the actual payload nevertheless. Requiring the application to do some sort of consistency check can be very difficult, as you showed in your isNativelySupported example.

lexaknyazev commented 3 years ago

Maybe the relevant specs already define a MIME type for KTX, and the matching that as to be done against its first 12 bytes, and it would be possible to link to the latter here.

We have registered image/ktx2 media type with IANA.

My main concern was to not accidentally require some externally "supplied type".

Sure, it cannot be required as in some cases (like a file without an extension from a disk), it just doesn't exist.

Another context that is missing from this thread is that an asset itself provides some context for the "expected" type. Consider the following:

{
  "textures": [
    {
      "source": 0,
      "extensions": {
        "EXT_texture_webp": {
          "source": 1
        }
      }
    }
  ],
  "images": [
    {
      "uri": "image.png"
    },
    {
      "uri": "image.webp"
    }
  ]
}

Here, the images[0] is referred by the core texture.source property so it could only be JPEG or PNG. At the same time, images[1] is referred by a EXT_texture.webp.source that must refer to a WebP resource.

lexaknyazev commented 3 years ago

it should be OK to leave the behavior undefined

Being a file format, glTF cannot enforce any application behavior anyway. We can only normatively define asset validity and provide non-normative suggestions for implementations.

javagl commented 3 years ago

Sure, and the quesiton about "validity" (as in "can be checked by the validator") is another reason for why one should be careful with ~"requiring external sources of information". I don't know how in detail how resources are resolved when a gltf file and its required png file are dragged-and-dropped into the validator. Quickly looking over the code shows, for example MIME type from the JSON is not overridden by the "actual" type from a data URI, which may be one specific instance of a "priority" that could be encoded in the spec. Brutally changing a Data URI type from png to jpeg or declaring the mimeType to be image/jpeg (despite having data:image/png in the data URI) both cause an error, "Recognized image format 'image/png' does not match declared image format 'image/jpeg'", and I assume that, depending on the decisions here, the latter (or even both?) might become a "Warning" in the future.

I think that the steps that could be mentioned in the spec could largely be derived from the points that you mentioned in the first comments, differentiating the "combinations" with careful wording (" .... iff <information> is available..." etc.). (edit: and adding careful disclaimers for extensions, like the webp example).

The question of which combinations should officially be declared as "invalid" remains open. ~~I think that considering the assets as "valid (with warning)" should be fine, though. It would be great if others would chime in here.~~ (edit <- That was a bad idea - see my next comment)

As mentioned earlier, I remember some struggle here, eventually settling for a "guessing" approach that tries

The Data URI data type
If this is not present: Checks the file extension (yeah, that shouldn't be done...)
If this is not present: Uses the built-in image infrastructure that can check the payload and determine the type

But all this is still independent of loading the actual data. There are some differences depending on the platform (Desktop vs. Android), but eventually, it's just throwing the payload into the built-in image loaders and says "Good luck with loading that", ignoring any additional information anyhow....

lexaknyazev commented 3 years ago

Uses the built-in image infrastructure...

That's where KTX2 (or WebP) may fail. An engine that eagerly loads all images should detect each media type from the actual data before throwing it into a platform-default image loader. A safer approach would be to load only those images that are referred by specific texture objects. Seeing that a particular image is used by EXT_texture_webp.source greatly limits the choice of potential image decoders. In any case, I don't see a single reason why image.mimeType (when present) should not be enforced to match the actual payload type.

lexaknyazev commented 3 years ago

The next immediate question is what is invalid in this asset:

{
  "images": [
    {
      "mimeType": "image/png",
      "uri": "webp.bin"
    }
  ],
  "textures": [
    {
      "source": 0
    }
  ]
}

The image is explicitly declared as PNG. Assuming that the actual data is encoded as WebP, is texture.source reference valid here? If the presence of image.mimeType does not affect the correctness of the texture.source reference, the former could always be ignored in favor of checking binary headers.

There's a case for skipping binary matching though. In some environments, an application may simply pass an URI to the platform and wait for decoded pixels, not touching encoded bytes at all.

javagl commented 3 years ago

Seeing that a particular image is used by EXT_texture_webp.source greatly limits the choice of potential image decoders.

That may be true for one specific extension (and I've seen that the validator already covers the webp extension). But trying to address something like this generically, on the spec level, would be really difficult. (At least, I don't see a point in statements like "If some extension has additional type information, then this may be used, and if it has requirements, they may be enforced"....)

Of course, you'll still have to come up with an answer about how the validator should behave - but maybe that can be sorted out:

Regarding the validity of the second example: I'm a bit surprised that you ask about the validity of the texture.source reference in particular. I think the inconsistency is not related to the place where this image is used. I.e. I don't understand the role of the "textures" part in terms of validity: Could the result of any validation be different depending on whether the "textures" usage is present or not? ^{(Of course, this does not refer to 'unused elements' in general - only to the question of whether the image itself is valid)}

And regarding the general behavior of the validator, I said earlier

I think that considering the assets as "valid (with warning)" should be fine, though.

But this was a bad idea: If the validator reported that as "valid (with warnings)", then there may be clients that rely on the mimeType and fail to load such a (valid) asset. So the validator should report that as an error (as it does now). So it would probably make more sense when

the validator reports that as an error
the spec says that an asset is valid if the mimeType and the payload match

This way, a "valid" asset is always reliable: Clients can rely on the mimeType, and load the data without further inspection. If the data has a different type, then they are still free to load it and ignore this inconsistency (and that could also be said in the spec), but they are not required to be able to cope with that).

(It's really hard (for me, right now) to come to a consistent conclusion, even for the seemingly "simple" case that does not take extensions or a "supplied type" into account...)

lexaknyazev commented 3 years ago

But trying to address something like this generically, on the spec level, would be really difficult.

The core spec limits image formats to JPEG and PNG. Anything else requires an extension.

I'm a bit surprised that you ask about the validity of the texture.source reference in particular.

texture.source may point only to JPEG or PNG. EXT_texture_webp.source may point only to WebP. The example above highlights the ambiguity.

javagl commented 3 years ago

OK, I the last one confused me, because there are two concerns mixed: 1. The mimeType does not match the data and 2. the actual type of the image is not valid for a standard texture.source.

I think that I saw something in the validator that was along the lines of this pseudocode

validMimeTypes = { "image/jpeg", "image/png" }
if (asset.uses("EXT_texture_webp")) validMimeTypes.add("image/webp");
foreach (image in asset) validate(validMimeTypes.contains(image.mimeType));

to handle this case.

But in some way, this is related to a far more abstract question: The standard currently specifies JPG and PNG as valid MIME types. Do we have a general rule/consensus about how extensions may modify such rules? I mean, not only referring to that particular example, but maybe also to some ACME_allow_negative_byte_offset_in_accessor extension...

(Sorry, my contributions here may not be overly constructive right now, but I hope that some of the questions make it possible to carve out the caveats and options, and maybe others can chime in with their thoughts. Making a decision that is sure to not have any negative side-effects is hard...)

lexaknyazev commented 3 years ago

Do we have a general rule/consensus about how extensions may modify such rules?

We should formalize how WebP and BasisU extensions do that: by expanding the set of allowed image types and referring to them only from the appropriate extension objects.

javagl commented 3 years ago

I'll have to think about whether this will always be applicable, e.g. for an extension that changes the ranges of allowed values (e.g. allows negative indices, or a value that originally was [0,1] to be [-1,1]` or whatnot). I'll try to focus on the original issue here for now.

So to give it a stab, the steps could therefore roughly be (based on your initial post, but to be fleshed out in terms of language and details)

Always:
- Clients can inspect the header of the binary data, and load it.
For Buffer View;
- Clients can inspect the image.mimeType and load the binary data, assuming that it has this type
For a Data URI:
- If the image.mimeType is present, clients may inspect that and load the binary data, assuming that it has this type
- Clients can inspect the <mediatype> and load the binary data, assuming that it has this type
In all cases:
- If the image.mimeType is present, or the <mediatype> is present, and any of them does not match the type of the binary data, then clients MAY refuse to load the asset

The validator should basically check whether all available information is consistent, and declare it as an Error if this is not the case. But there could be an implementation note, saying that that clients are allowed to ignore each part of the information and still MAY load the asset.

Two things that are not covered with this:

The role of "transport layer" information (specifically the "external" one, like HTTP response headers - i.e. not the <mediatype> of a data URI). One could say: "For normal URIs, clients MAY use this external information, and load the data, assuming that it has this type", but this could lead to a state where a valid asset cannot be loaded due to some misconfigured HTTP server. Although one can shift the responsibility to the server admin, I'm not sure whether this is desired.
The role of extensions. As mentioned earlier: It's hard to imagine how to include that in the core spec. Pro-actively unfolding the implications that any extension might have will turn a simple statement like "Clients can inspect the header of the binary data, and load it" into "Clients can inspect the header, and if it is PNG or JPG, they can load it, but if there is an extension that extends the allowed MIME types from the core spec for the referenced image, and the binary header matches the MIME type that is allowed via the extension, then clients can load it" and that would still be only one simple case that we already know of.

lexaknyazev commented 3 years ago

Although one can shift the responsibility to the server admin, I'm not sure whether this is desired.

Misconfigured servers may cause all kinds of loading issues (e.g. they may prevent streaming WebAssembly instantiation) but they should not affect asset's validity. It's fair to expect that servers would mark image formats they don't know as application/octet-stream. The latter should not cause any client-side issues.

that would still be only one simple case that we already know of

Two, since we also have KHR_texture_basisu that uses similar design. Besides, our language around extensions (we should relocate it to the spec, btw) explicitly mentions that image MIME types are extendable.

lexaknyazev commented 3 years ago

After more offline discussions, here is the set of rules we propose:

Validation:

If image.mimeType is present, the bitstream must match it.
If Data URI bitstream does not agree with its mediatype, the URI is invalid.
Image objects that contain only an external URI require content sniffing.

Suggested runtime operations for valid assets:

Rely on image.mimeType when it's present.
Rely on Data URI's mediatype.
Sniff contents of external references.

lexaknyazev commented 3 years ago

Addressed in #1997.

KhronosGroup / glTF

Tighten image MIME-Type determination #1966