Specs should clarify C strings and VK_MAX_EXTENSION_NAME_SIZE (and the sort)

KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

Other

2.7k stars 452 forks source link

Specs should clarify C strings and VK_MAX_EXTENSION_NAME_SIZE (and the sort) #2351

Closed darksylinc closed 1 month ago

darksylinc commented 2 months ago

I've seen the following code from Google:

ext_name = malloc( VK_MAX_EXTENSION_NAME_SIZE + 1 );

I was under the impression that the null terminator goes in ext_name[VK_MAX_EXTENSION_NAME_SIZE - 1] rather than ext_name[VK_MAX_EXTENSION_NAME_SIZE].

I tried looking into the spec but I couldn't find how C strings are handled. Obviously Google's approach may be prudent and doesn't hurt, even if the null terminator is in ext_name[VK_MAX_EXTENSION_NAME_SIZE - 1].

I think it would be wise to clarify whether the null terminator goes in ext_name[VK_MAX_EXTENSION_NAME_SIZE - 1] or ext_name[VK_MAX_EXTENSION_NAME_SIZE]; specially to avoid surprises in the future.

oddhack commented 2 months ago

The existing language for structure members says they are char arrays of this size and contain a null-terminated string. If that's not clear, perhaps a NOTE somewhere in the fundamentals chapter? There are a couple of different max lengths of this nature. N.b. the null terminator goes at the end of the string which typically will be sooner than the end of the array since there are no extension names anywhere near that long (and ideally, never will be).

darksylinc commented 2 months ago

I too thought "it was obvious", but I think it's worth being more explicit (and giving a few examples) in this particular case, considering that bugs surrounding null terminators are the source of lots of CVEs.

oddhack commented 2 months ago

@darksylinc #2365 should address this - the Vulkan WG has signed off but it would be good for you to review as well.

darksylinc commented 2 months ago

Hi!

Looks good to me!

The only remark I can make is that it mentions about UTF8 strings but makes no mention if it is normalized or not. It's probably unnecessary and should not be clarified until someone actually raises this point with an actual problem being faced.

Besides, most strings are in the ASCII range in practice (notably exceptions are the marketing names in device name).

darksylinc commented 2 months ago

OK I just read something that left me thinking.

Perhaps the spec should recommend, but not require, that utf8 strings should be canonized aka normalized, to be in harmony with other public specs which do the same.

oddhack commented 2 months ago

Is there any actual drawback in practice other than not being able to do a byte-for-byte comparison of normalized and nonnormalized strings that are equivalent?

darksylinc commented 2 months ago

Usually that's the reason: byte-for-byte comparisons. This is a weak argument though, because nothing stops a vendor from writing 【My GPU Name】 instead [My GPU Name]. Thus if the app is looking for the latter, string comparison will fail even when both strings are normalized.

However beyond that, UTF8 parsing can be difficult and there have been security vulnerabilities and crashes in the past due to bugs when parsing invalid or non-canonical utf8.

Certain routines are optimized for iteration if you can assume they are in their canonical form and is UB if the input string is not in that form (however that would mean the spec must require canonical forms, and include validation tests for every string field to ensure they are all normalized). However it's bad practice to use such routines on strings you don't control (like an external API such as Vulkan).

I'm only saying this to point out potential edge cases. Personally it's probably best not to worry too much about it unless someone encounters a specific problem.

oddhack commented 1 month ago

I'm only saying this to point out potential edge cases. Personally it's probably best not to worry too much about it unless someone encounters a specific problem.

Agreed. Going ahead with #2365 as it stands.