KhronosGroup / glTF

glTF – Runtime 3D Asset Delivery
Other
7.19k stars 1.14k forks source link

Unclear when to percent encode and decode URI strings #2245

Open RichardTea opened 1 year ago

RichardTea commented 1 year ago

Section 2.8 URIs is unclear. It does not appear to be possible for an implementation to determine whether a URI has been percent-encoded or written as-is.

Reserved characters (as defined by RFC 3986, Section 2.2. and RFC 3987, Section 2.2.) MUST be percent-encoded.

":" / "/" / "?" / "#" / "[" / "]" / "@" / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Additionally:

Paths with non-ASCII characters MAY be written as-is, with JSON string escaping, or with percent-encoding; all these options are valid.

RFC 3986 separately states that "%" MUST be encoded as "%25", but it is unclear from the text whether or not this requirement also applies to glTF - "%" is an ASCII character, and is not an RFC 3986/3987 reserved character, so the text indicates that it MAY be written as-is.

If space and percent MAY be written as-is or percent-encoded, then all of the following are permitted: Case uri raw UTF-8 string file on disk Note
1 big ball.png big ball.png Clear. Nothing is percent encoded
2 big%20ball.png big ball.png Indistinguishable from 3
3 big%20ball.png big%20ball.png Indistinguishable from 2
4 big%ball.png big%ball.png Distinguishable as %ba is not valid UTF-8
5 big%25ball.png big%ball.png Indistinguishable from 6
6 big%25ball.png big%25ball.png Indistinguishable from 5

It is not possible for a reader to distinguish between cases 2 and 3, or between cases 5 and 6. A compliant reader must try both URIs (perhaps re-encoding for HTTPS)

I recommend two changes:

a) The wording of the first requirement for relative paths should be changed to:

Paths containing reserved characters (as defined by RFC 3986, Section 2.2. and RFC 3987, Section 2.2.) MUST be percent-encoded.

b) The glTF standard should explicitly require paths containing the "%" character be percent-encoded. This would prohibit cases 3, 4 and 6, and eliminate the confusion.

Sadly I am aware of implementations that export using all six methods - Blender's Khronos glTF Blender I/O v1.5.17 even does this in the same file!

scurest commented 1 year ago

Space and percent have to be percent-encoded.

RFC 3986 separately states that "%" MUST be encoded as "%25", but it is unclear from the text whether or not this requirement also applies to glTF

glTF URIs are normal URIs, all the normal rules apply.

"%" is an ASCII character, and is not an RFC 3986/3987 reserved character, so the text indicates that it MAY be written as-is

I don't see how it indicates that. The text says reserved characters have to be percent encoded and says non-ASCII characters may be written multiple ways, but neither applies to percent.

Sadly I am aware of implementations that export using all six methods - Blender's Khronos glTF Blender I/O v1.5.17 even does this in the same file!

If you can reproduce a wrongly encoded URI in the current version, you should report a bug.

javagl commented 1 year ago

Some strongly related discussion (with a table that shows possible ambiguities that is very similar to the one on the first post here) can be found around https://github.com/KhronosGroup/glTF/issues/1449#issuecomment-611560284 .

RichardTea commented 1 year ago

Space and percent have to be percent-encoded.

The trouble is that the published document does not actually say that.

I note that the linked issue it's explicitly noted that "space" can be either encoded or unencoded. https://github.com/KhronosGroup/glTF/issues/1449#issuecomment-422862486

I'm fine with a rule along the lines of "if nothing else needs to be percent-encoded then space may be left as-is", but any paths containing the "%" character absolutely must be encoded.

glTF URIs are normal URIs, all the normal rules apply.

I agree that the intent probably was that glTF URIs MUST be either URIs (RFC 3986) or IRIs (RFC 3987). Unfortunately, the text is ambiguous and a large number of implementations do not encode the % character.

If you can reproduce a wrongly encoded URI in the current version, you should report a bug.

Snippet from a Khronos glTF Blender I/O v3.4.50 export:

"buffers" : [
        {
            "byteLength" : 840,
            "uri" : "example 10%.bin"
        }
    ]