Clarification: Texture Image Orientation

mlimper commented 7 years ago

Hi,

we are currently running into an issue with texture image orientation.

The glTF 2.0 spec says the following: First image pixel (UV coordinates origin) corresponds to the upper left corner of the image. Implementation Note: OpenGL-based implementations must flip Y axis to achieve correct texture sampling.

So, just to get this straight: This means that exporters should store texture images in the typical standard orientation where the lower left corner of the image corresponds to the lower left corner of the UV space - correct? I am asking because this is what usually requires the flip, which the spec expects from the GL implementation.

Here's a simple (Collada) example loaded in MeshLab:

The image is loaded into the UV space "as-is". Meaning, the eye of the duck is still in the upper right corner, as it was in the original image. In this case, the OpenGL-based implementation must flip.

The issue we're observing right now is that exporters start to flip images, or UV coordinates, because runtimes seem to expect them in a non-standard (? at least uncommon) orientation. See, for example, this thread: https://github.com/Kupoman/blendergltf/issues/59

This applies to BabylonJS (at least the version we are using) and Three.js, which seem to expect the images or UVs "upside down": https://threejs.org/examples/webgl_loader_gltf2.html https://threejs.org/examples/js/loaders/GLTF2Loader.js (search for "_texture.flipY = false;")

So, given that those two engines already do it consistently, it could also be that I am getting this wrong? Or are the runtimes indeed missing to flip the texture properly?

Thanks a lot in advance!

@bghgary Can I ask you again about your comments on this?

donmccurdy commented 7 years ago

This was a recent correction to the spec, and I think our implementations and sample models haven't caught up yet.

See also:

mlimper commented 7 years ago

Oh I see, cool - thanks for the quick clarification.

Regarding the spec, the term "First image pixel (UV coordinates origin)" still feels a bit confusing to me - compare the MeshLab screenshot

javagl commented 7 years ago

There should indeed be a test case for this (cf. https://github.com/KhronosGroup/glTF-Sample-Models/issues/82 ). And I also wonder whether the part about the "UV coordinates origin" helps to clarify this. I always thought that the "first image pixel" was the "upper left" one, and that this would rather correspond to texture coordinates (0,1) (and not to the "origin", (0,0)). But this may be a misunderstanding.

mlimper commented 7 years ago

I always thought that the "first image pixel" was the "upper left" one, and that this would rather correspond to texture coordinates (0,1) (and not to the "origin", (0,0)). But this may be a misunderstanding.

Yes, that's what I thought

If it helps and everyone thinks it's a good idea, I can open a PR to adjust the wording, if we find something better. Maybe:

The first image pixel corresponds to the upper left corner of the image. The origin of the UV coordinates (0,0) corresponds to the lower left corner of the image. Implementation Note: OpenGL-based implementations must flip Y axis to achieve correct texture sampling.

@lexaknyazev Any comments on this?

emackey commented 7 years ago

This was a recent correction to the spec, and I think our implementations and sample models haven't caught up yet.

My understanding is that this was a "correction" to the spec when it was realized that all existing samples and all existing implementations were still doing it the glTF 1.0 way. Rather than re-do every sample and file patches against every implementation, it was easier to flip the spec back to what it had been in glTF 1.0.

emackey commented 7 years ago

Also, when this first happened, I made a test model (unpublished) that only used one corner of the texture. I made the model in Blender, which uses lower-left as the origin, and exported to COLLADA, which also uses lower-left as the origin. In both cases the UV coordinates ranged from 0.0 to 0.3. But when I ran the model through the most recent 2.0 branch of the COLLADA2GLTF converter, I got a glTF 2.0 file were the min/max accessors indicated 0.0 to 0.3 only for U, and 0.7 to 1.0 for V. So the conversion to glTF did flip the V coordinates, and BabylonJS and ThreeJS handled the converted model correctly (displaying what the browser considers to be the lower-left corner of the image for the 0.7 to 1.0 V range).

mlimper commented 7 years ago

I made the model in Blender, which uses lower-left as the origin, and exported to COLLADA, which also uses lower-left as the origin.

Actually, if this is really supposed to be different in glTF 2.0 from how most of the tools do it, I don't get the point why exactly one would want it to be this way... Sure, glTF is supposed to be a delivery format, not one for authoring tools. But then, especially with glTF 2.0, the Web is not the only target platform, so following a specific browser convention might not be the best way to go (in case that was really the reason). Also thinking about the non-Web and non-GL runtimes that start to use glTF right now.

emackey commented 7 years ago

I suspect the only intent at this point was to bring the written spec into alignment with code and samples that have already been published. It's too late to make such a breaking change to the existing implementations that have been released, so, the spec was simply altered to match released behavior. The original intent was different, I'm sure.

The current behavior follows WebGL's strange rule of putting the origin in the upper-left. Most other systems, including OpenGL, place it in the lower-left. The original intent was to put glTF 2.0's texture origin in the lower-left, but the existing samples and implementations didn't make that a reality.

mlimper commented 7 years ago

The current behavior follows WebGL's strange rule of putting the origin in the upper-left. Most other systems, including OpenGL, place it in the lower-left. The original intent was to put glTF 2.0's texture origin in the lower-left, but the existing samples and implementations didn't make that a reality.

Oh, I see - thanks for the explanation. Would of course be cool if it could still be changed to match what most of the tools do - but I also understand that a breaking change like this will be considered impossible because it will be too late at some point.

In any case, it would be nice to have a more detailed explanation in the spec, including an example such as the one proposed by @javagl. The hint about OpenGL-based implementations needing to flip the image actually applies to most existing OpenGL-based applications that don't use glTF, where you actually flip the image data (origin typically upper left) to match OpenGL's coordinate system (origin lower left) - therefore, the hint about OpenGL applications having to flip the image data still confuses me. Let's just add some more details and a minimalistic example to the spec.

emackey commented 7 years ago

Rather than flip image data, can you flip V coordinates during load or in the vertex shader, for systems that require it?

mlimper commented 7 years ago

Sure, why not? It's possible to do the flip inside the shader.

javagl commented 7 years ago

I'm not deeply familiar with some intrinsics of the related OpenGL functions (I tried to get a grip on that, but ... it's difficult). But therefore, I have to ask ( @emackey and others) whether flipping the texture coordinates works generically. All the "flips" have twisted my brain. So I wonder whether

flipping the texture and
flipping the texture coordinates

are not two entirely different things when something like glTexSubImage2D is involved.

Imagine a loader loads the texture from https://github.com/KhronosGroup/glTF-Sample-Models/issues/82 (which you made a valid point about!). And he calls glTexSubImage to extract the lower left (with the (0,0) in it). If the texture was flipped (at load time, as recommended in the current spec wording), then the same call would extract the upper left part - wouldn't it?

(Again, sorry if the question is naive or does not make sense. It might be rephrased or boiled down to the question of whether the flipping is done on load time, or at some later point - which isn't even clear for me for the UNPACK_FLIP_Y_WEBGL flag...)

lexaknyazev commented 7 years ago

The current behavior follows WebGL's strange rule of putting the origin in the upper-left. Most other systems, including OpenGL, place it in the lower-left. The original intent was to put glTF 2.0's texture origin in the lower-left, but the existing samples and implementations didn't make that a reality.

At the moment, only desktop and embedded OpenGL use lower-left origin. All other major systems (VK/D3D/Metal) use upper-left one.

emackey commented 7 years ago

All other major systems (VK/D3D/Metal) use upper-left one.

Ah, I didn't know that, thanks! So it's not so unreasonable to stick to the upper-left corner as origin.

Does anyone know the answer to the question about glTexSubImage2D above?

mlimper commented 7 years ago

glTexSubImage2D is used to write pixels, if you want to read them you will use readPixels. Looks like both are affected by the pixel storage (where you can set the UNPACK_FLIP_Y_WEBGL flag). From the MDN reference:

Setting the pixel storage mode affects the WebGLRenderingContext.readPixels() operations, as well as unpacking of textures with the WebGLRenderingContext.texImage2D() and WebGLRenderingContext.texSubImage2D() methods.

mlimper commented 7 years ago

Just to come back to the original point I wanted to make:

@lexaknyazev

At the moment, only desktop and embedded OpenGL use lower-left origin. All other major systems (VK/D3D/Metal) use upper-left one.

What confuses me here is that the glTF 2.0 spec says that "OpenGL-based implementations must flip Y axis to achieve correct texture sampling". If the first image pixel is upper left, and if the UV origin should be upper left as well, then the OpenGL implementation does not have to flip - or am I getting this wrong? The first pixel in the texture data array will be the UV origin.

lexaknyazev commented 7 years ago

@mlimper I agree that current language is confusing and should be rephrased. For most cases, engines don't need to flip anything even with OpenGL backend. However, when glTF asset contains only textures or only vertex attributes, engine has to know image orientation.

mlimper commented 7 years ago

Issue https://github.com/KhronosGroup/glTF-Sample-Models/issues/82 is closed and the discussion seemed to converge against rephrasing the spec accordingly (and maybe adding also an example picture).

Two questions:

Should this one here be kept open until we did a PR for a spec update?
Should I prepare a PR for a spec update and then close this one?

pjcozzi commented 7 years ago

Should this one here be kept open until we did a PR for a spec update?

I think so.

Should I prepare a PR for a spec update and then close this one?

Yes, please!!!

zellski commented 7 years ago

I'm going to do that thing where you comment on an ancient, closed issue.

What I'm curious about is whether you good folks have any suggestions as to what's the least astonishing (in the negative sense) thing for a format converter whose assumptions on input are that (0, 0) is lower left (FBX in my case), and which outputs glTF, with (0, 0) in the upper left:

Transform all texcoords by (u, v) -> (u, 1.0 - v)
Flip the literal images during conversion

I could try to write up a list of pros and cons, and try to provide context for the question. But let's imagine that this is a general purpose tool, and we know nothing about the people using it or the context within which it's used. What will someone who's not yet delved deep into this issue find more or less confusing -- that the images changed, or that the coordinate system changed?

For what it's worth, I believe COLLADA2GLTF flips V without even asking the user or providing an option to prevent the transformation.

lexaknyazev commented 7 years ago

@zellski I would say that in general flipping images isn't a good idea:

If images are stored in external files, users may not expect any changes in them.
Users might want to edit images after model conversion.
Flipping images itself could be unwanted:
- There're 3 options for JPEG:
  1. Decode, flip, and re-encode an image. Will reduce quality.
  2. Losslessly "transcode". This may be impossible for some images.
  3. Set EXIF Orientation metadata. Not all clients respect that flag, especially regarding flipping modes.
- There's only one option for PNG: decode, flip, and re-encode an image. There's a chance that generic converter will reduce efficiency of original PNG encoding (which may have been optimized for file size).

emackey commented 7 years ago

Agreed with @lexaknyazev. Converting v = 1.0 - v seems least destructive, and continues to work for mirrored/clamped coordinates.

zellski commented 7 years ago

I agree it's much better to leave images alone whenever possible. That said, in the PBR world we can't ever be entirely hands-off; e.g. roughness and metalness are going to be distinct textures in the source, and need to be merged into separate channels in the output. In pbrSpecularGlossiness there is likewise at least one new construction that needs to happen. I guess as long as synthesised textures is the exception rather than the rule, it's still worthwhile to try to pass everything else through untouched. But it kind of feels like the future is one where a converter needs to be ready to get its hands dirty with texture data, and maybe then it's less important to care about image flipping.

That said, I still agree with you. I guess I partly just wanted confirmation from someone more knowledgeable that it's not a horrific crime to mess with the UV mapping. Thank you!

KhronosGroup / glTF

Clarification: Texture Image Orientation #1021