WSI 33.9 needs more rigorous / distinct terms and definitions

I apologize that this is a long, detailed issue. However, I believe it has been a source of confusion for many people besides myself for several years, and it all revolves around a central issue, being that WSI 33.9 conflates distinct terms by using them interchangeably. I may very well just be misunderstanding something, so please correct me if you believe this to be the case.

I started by posting this post on Reddit yesterday. Feel free to read it in case I leave out any information.

I'll start with my understanding, mostly affirmed by the responses to this previous issue. Please correct me if any of this is wrong:

I believe vkAcquireNextImageKHR does not immediately, meaning prior to its return, "acquire" an image from the swapchain, in the sense that it does not provide the application immediate ownership of it. Rather, when it returns, it is only to say that the application can be certain that, eventually, the image with the given index will be released by the PE from the swapchain. As a result, once vkAcquireNextImageKHR returns, it is safe to begin recording and submitting e.g. a render pass for it, so long as those rendering commands are synchronized with the actual acquisition of the image via the signal semaphore provided to vkAcquireNextImageKHR.

Secondly, I believe that it is fully safe and valid to repeatedly (an arbitrarily large number of times) alternate between a vkAcquireNextImageKHR call and a corresponding vkQueuePresentKHR call (assuming there is something valid in the middle, like a render pass), regardless of the number of swapchain images that are available. In other words, once a swapchain image's eventual (future) acquisition has been ascertained by the application via vkAcquireNextImageKHR's return, it is safe to enqueue render commands and a presentation, so long as those commands and the presentation are synchronized via the necessary semaphores, and then immediately make another call to vkAcquireNextImageKHR, regardless of how many swapchain images are available. To explain it one more way (and I do believe these are all equivalent statements), I believe it is fully possible and valid to have more frames in-flight than available swapchain images. After all, so long as the previous images have been enqueued for presentation, one can be certain that they will eventually be presented, and when they are, other images will be released (to make room for the new presentations). Therefore, in such a case, vkAcquireNextImageKHR should be guaranteed to succeed, as it should know that the image corresponding to the returned index will eventually be made available for rendering.

Lastly, I believe that it is not valid to repeatedly (an arbitrarily large number of times) call vkAcquireNextImageKHR without making corresponding calls to vkQueuePresentKHR. Otherwise, you can enqueue yourself into a situation where you're trying to acquire an image from the PE when none will ever be available, since none of them were ever enqueued for presentation.

Assuming my understanding is correct, I'll move on to my confusion.

The documentation on vkAcquireNextImageKHR states the following:

An image will eventually be acquired if the number of images that the application has currently acquired (but not yet presented) is less than or equal to the difference between the number of images in swapchain and the value of VkSurfaceCapabilitiesKHR::minImageCount. If the number of currently acquired images is greater than this, vkAcquireNextImageKHR should not be called; if it is, timeout must not be UINT64_MAX.

I interpret this as saying that each swapchain image must be "presented" before its index can ever be returned again by vkAcquireNextImageKHR, so if you acquire too many images before they can be "presented", then vkAcquireNextImageKHR can fail since you will have run out of images to acquire. The real problem, though, is that I interpret "presented" to mean, in some way, "presented by the presentation engine." I don't even believe there's a simple way to know when an image has been presented by the presentation engine (vkQueuePresentKHR does not accept a signal fence or semaphore), so how can you possibly know when a call to vkAcquireNextImageKHR is valid or not, assuming that its validity depends on how many images have been acquired but not yet presented?

I believe my confusion is caused by a miscommunication in the docs. Particularly, I believe that when the docs say, "acquired (but not yet presented)", it actually means "acquired their index via a call to vkAcquireNextImageKHR (but not yet enqueued for presentation via a call to vkQueuePresentKHR)". In fact, the sentence clearly states that the application is the one doing the "presenting," yet the application can't present an image; that is not the application's role. All it can do is request that the PE eventually present the image.

Assuming I'm not missing anything, I think a general solution would be to introduce more rigorous / distinct terms and definitions in WSI 33.9.

For instance, it's probably confusing to say vkAcquireNextImageKHR "acquires" the image, or returns the index of the "acquired" image. Acquisition typically implies ownership, yet the docs clearly explain that the application doesn't actually have ownership of the image until the semaphore is signaled. In the earlier issue I mentioned, Ian Elliott stated that vkAcquireNextImageKHR waits for the image to become "available," and its semaphore is signaled when the image is "owned by the app." Perhaps this is better terminology, though surely there is an even better alternative. Is the image really "available" if the application can't actually start drawing to it yet without waiting on a semaphore? Probably not.

Another example: WSI 33.9 clearly states at one point, "Use of a presentable image must occur only after the image is returned by vkAcquireNextImageKHR, and before it is presented by vkQueuePresentKHR". But vkQueuePresentKHR doesn't present an image. As mentioned, the application can't present an image in any way; it can only request that the PE eventually present the image. Again, perhaps there should be a term for "enqueue for presentation" which is clearly distinct from "present", because, unless I'm gravely misunderstanding something, WSI 33.9 seems to use the two terms interchangeably despite them meaning very different things. Perhaps I should be inferring that when it uses "present" from the application's perspective, it means "enqueue for presentation," and when it says "present" from the PE's perspective, it actually means "present". But such an inference is not obvious, and it's entirely impossible when the term "presented" is used in passive voice without any indication of the subject.

Again, if my understanding is wrong, please feel free to correct me and close the issue :)

I believe vkAcquireNextImageKHR does not immediately, meaning prior to its return, "acquire" an image from the swapchain, in the sense that it does not provide the application immediate ownership of it.

It does immediately "acquire" the image. And vkQueuePresent immediately "releases" it. It is important later on for accounting of how many images you can acquire.

The swapchain owns the swapchain images (in the conventional sense that it is the one that manages the memory allocation). And that does not change. You are only borrowing them. Only thing this can be confused with is queue family ownership, which is a different kind of ownership.

Secondly, I believe that it is fully safe and valid to repeatedly (an arbitrarily large number of times) alternate between a vkAcquireNextImageKHR call and a corresponding vkQueuePresentKHR call (assuming there is something valid in the middle, like a render pass), regardless of the number of swapchain images that are available.

Yes, given acquire and release are immediate as stated above, this means you only ever have zero or one image acquired at any given time. I think there doesn't even have to be anything in the middle, but drivers are bound to be buggy as it is unusual to use it that way.

Lastly, I believe that it is not valid to repeatedly (an arbitrarily large number of times) call vkAcquireNextImageKHR without making corresponding calls to vkQueuePresentKHR.

In fact, you could keep acquiring as many images as you want, as long as you do not supply infinite timeout and are prepared that the acquisition may fail\timeout.

All this fuss is just about preventing a deadlock, nothing more.

An image will eventually be acquired if the number of images that the application has currently acquired (but not yet presented) is less than or equal to the difference between the number of images in swapchain and the value of VkSurfaceCapabilitiesKHR::minImageCount. If the number of currently acquired images is greater than this, vkAcquireNextImageKHR should not be called; if it is, timeout must not be UINT64_MAX.

It is basically a reiteration of the vkAcquireNextImage VU. It means, you must not cause a deadlock. Additionally you are recommended not to try to acquire more images than are acquirable.

By "presented" it means "released". The "released" terminology probably did not exist yet when that sentence was made, so specifically it means "when vkQueuePresent was called on the image". It is in parenthesis, so it is not really important. Author just seem to felt the need to point out that a presented image is no longer acquired.

It does immediatelly "acquire" the image. And vkQueuePresent immediatelly "releases" it. It is important later on for accounting of how many images you can acquire.

I suppose that does match my understanding of how vkAcquireNextImageKHR and vkQueuePresentKHR work. Thank you for the clarification.

By "presented" it means "released". The "released" terminology probably did not exist yet when that sentence was made, so specifically it means "when vkQueuePresent was called on the image". It is in parenthesis, so it is not really important. Author just seem to felt the need to point out that a presented image is no longer acquired.

This makes a lot of sense; thank you :-). That being said, whether the parenthetical information is important or not, it is still incorrect, and so I think it should either be removed or corrected. As you mentioned, an image does not have to be "presented" before it can be reacquired; it only has to be "released." Saying that it has to be "presented" before it can be reacquired implies that the CPU has to be directly synchronized with the PE in some way in order to avoid a deadlock on infinite timeouts, which simply isn't the case.

That goes for the other sentence in WSI 33.9 that I mentioned as well, where it states, "Use of a presentable image must occur only after the image is returned by vkAcquireNextImageKHR, and before it is presented by vkQueuePresentKHR." An image is not "presented" by vkQueuePresentKHR; it is queued for presentation and "released." I believe I noticed at least one other sentence where this happened again as well.

Basically, I believe that whenever the "released" terminology was introduced in WSI 33.9, it seems like a few of the "present" terms which should have been replaced with "released" were overlooked. It seems like an easy fix, and I think it would clear up a lot of confusion around the semantics of swapchain image acquisition and presentation given that "present" and "release" are already clearly distinguished within the documentation. Is this worth doing?

As you mentioned, an image does not have to be "presented" before it can be reacquired; it only has to be "released."

No, "presented" here is a shortcut for saying "calling the vkQueuePresentKHR command". And calling the vkQueuePresentKHR "releases the acquisition" of the image. So "presented" and "realeased" means the same thing here.

I guess the problem is "present" means multiple things. It means "the application calling the vkQueuePresentKHR command". And it also means the swapchain displaying the image to the surface. It is used many times in the spec, so it would be tricky to fix, and frankly the distinction feels clear to me in all the cases I came about.

Anyway, I leave it to you and Khronos people to decide if and what needs to be tweaked.

I agree that the distinction seems clear given that you already understand what the sentence is trying to communicate. But in my opinion, if you don't understand what the sentence is trying to communicate, it can be very difficult to decipher exactly which meaning of "present" it is referring to. I almost imported an OpenXR package so that I could sync my CPU with the PE until I realized that I was just misunderstanding the docs.

Given that we already have the perfect word for replacing one of these two use cases, being "release", it seems like there's no reason for "present" to have two different meanings anymore. From my understanding, that was the sole reason of introducing the whole "release" terminology to begin with. Though it is unfortunate that it would be such a difficult thing to fix.

I suppose I'd be willing to comb through the docs in my spare time and make a PR myself (it'd take awhile), but only if a) people other than myself find it confusing, and b) someone in Khronos is legitimately willing to review the PR. Reviewing it would take nearly as long as writing it since it would require verifying all proposed substitutions of "present" with "release". I'll leave it up to the Khronos folks to decide if this is even worth it.

If you have issue only with that one paragraph, then I guess it is an easy fix. I seem to fail condition a), so I recommend you PR only the parts that trip your understanding.

BTW, I am not really a fan of the whole paragraph you quoted. If we trully dissect it, it has more problems than that. The only added information is the "should" sentence (while the spec does not even say why it gives that strong recommendation), and rest is fluff, repetitions, or subtle lies.

That's fair. I'm not entirely certain that I'm capable of rewriting the paragraph at this time. I could probably remove the repetitions, fluff, and subtle lies, but the best I could do about the "should not" and "must not" is replace it with something about "undefined behavior" or "implementation-dependent" (I've read that some implementations will just return an INVALID_BEHAVIOR in such a case, by which I assume they mean VK_ERROR_UNKNOWN, whereas others will just block / return a VK_TIMEOUT). Though given that this issue was originally intended to clarify the terminology in the whole WSI 33.9 section, it seems silly to consider rewriting this paragraph a separate issue from the one that I originally brought up.

So I'll try to educate myself a bit more and then propose a rewrite of the paragraph via a PR. I'll also make a separate PR to propose replacing the word "present" with "release" wherever I feel it might be clarifying, but I'll restrict myself to WSI 33.9, given that it was the original scope of this issue. If I have time, I'll comb through the docs and see if the same ambiguity exists in other sections (though WSI 33.9 seems to be the primary culprit), and if it does, perhaps I'll make another PR for those cases as well, but that is technically unrelated to this issue.

I have a proposed rewrite for the paragraph regarding the potential deadlocking behavior of vkAcquireNextImageKHR, but I figured it'd be useful to discuss it here before making a PR to see if there are any suggestions from the general community. For reference, here is the paragraph as it is currently written in the docs:

An image will eventually be acquired if the number of images that the application has currently acquired (but not yet presented) is less than or equal to the difference between the number of images in swapchain and the value of VkSurfaceCapabilitiesKHR::minImageCount. If the number of currently acquired images is greater than this, vkAcquireNextImageKHR should not be called; if it is, timeout must not be UINT64_MAX.

Here is the proposed rewrite:

If the number of images that the application currently has under acquisition is greater than the difference between the number of images in swapchain and the value of VkSurfaceCapabilitiesKHR::minImageCount, vkAcquireNextImageKHR should not be called; if it is, timeout must not be UINT64_MAX.

Justification:

"An image will eventually be acquired...", as stated in the docs currently, is not correct. More accurately, an image will either eventually be acquired or the function will time out if a non-infinite timeout is provided. Just because there are available (unacquired) swapchain images, that does not mean one can necessarily be acquired immediately. However, it would be redundant to say, "An image will eventually be acquired or the function will time out," because these are the only two defined behaviors of the function anyways. Therefore, I propose removing this part entirely and leaving only the part explaining when the behavior of the function is not defined.
"(but not yet presented)", as explained earlier in this issue, is ambiguous due to the fact that there are two different definitions of the term "present" as used in the docs, and it is difficult to infer which definition this is referring to. For reasons such as this, the term "released" was recently introduced to replace this particular use case for the term "present". One way to fix this, then, is to change this to "(but not yet released)". However, a "released" image is inherently no longer under the acquisition of the application, so the parenthetical can be implied anyways, and therefore it is not necessary. I propose the description "currently has under acquisition" rather than "has currently acquired" because "acquired" can be interpreted as a descriptor (i.e. currently under acquisition) or as the past-tense verb (i.e. was at one point under acquisition, and may or may not still be). The former interpretation implies the parenthetical whereas the latter does not. In order to imply and therefore remove the parenthetical, the former interpretation of "acquired" must be used, but in order to disambiguate between the two interpretations, the interpretation itself can be substituted for the word, leaving us with "currently has under acquisition." I believe this is much less likely to be misinterpreted.

I am open to all suggestions and criticisms. I am reluctant about the "must not" part. I am not entirely convinced that "must not" is the correct terminology here at all, according to RFC 2119. I know for certain that there are implementations which are guaranteed to never block on vkAcquireNextImageKHR (and therefore such a call can never result in any breaking behavior), and that is in no way non-conforming to the spec (even if unintended). Perhaps the "must not" portion should simply be written as "should not" as well (or omitted entirely). That being said, such a modification would alter the intended semantics of the spec (not just the readability, which is the focus of this issue), which may then require further changes to the validation layers as well (I'm not sure how they respond to infinite-timeout calls under such conditions). So that's probably best left for another issue.

If there are no criticisms or objections in the next couple of days, I'll make a PR.

has under acquisition

Weird hyperformal wording. Should be just "acquired", or the sentence restructured so English need not be tortured so much.

of images in swapchain

grammar: should probably be "a\the swapchain"

Lot of the spec is written with "german sentence structure" (i.e. with the main point of a sentence all the way at the end of a very long sentence). But maybe could simplify our paragraph here into a short simple direct sentence, e.g.:

An application should: not attempt to have more swapchain images {being} acquired than swapchain image count minus VkSurfaceCapabilitiesKHR::minImageCount plus 1.

Agree with justification 1. Only VK_SUCCESS or VK_SUBOPTIMAL_KHR lead to image acquisition, but likely the command is allowed to fail.

However, a "released" image is inherently no longer under the acquisition of the application, so the parenthetical can be implied anyways, and therefore it is not necessary.

That probably is what the original author intended to clarify. Perverted interpretation of "images that the application has currently acquired" could be "every time the app called vkAcquire irrespective of release". Which is kinda his own fault to need to do that, and he could have just reworded in a way that that it would be clear "acquired" here is a state of an image, and not the action performed by the app.

Anyway, no part is best part. And no words have no ambiguity. Wholeheartedly agree with removal of the whole sentence since I am not clear what added value\information is it trying to provide.

I am reluctant about the "must not" part.

I would remove it too. It is just a repetition of the VUID-vkAcquireNextImageKHR-swapchain-01802. And since the timeout is not even otherwisely the topic of the paragraph, I see little reason to bring it up.

I am not entirely convinced that "must not" is the correct terminology here at all, according to RFC 2119. [...] I know for certain that there are implementations which are guaranteed to never block on vkAcquireNextImageKHR.

Pretty sure in here it is how the word is supposed to be used. You simply must not violate the rule, otherwisely the driver is allowed to behave in whichever way it feels like.

I think you misunderstand what the restriction is about. If the swapchain has no images left, then obviously it cannot give you one (and has to block or otherwisely fail). If you use the infinite UINT64_MAX timeout, then you have a deadlock (waiting forever for an image that does not exist; well, at least until the OS kills your app and\or the driver).

grammar: should probably be "a\the swapchain"

I apologize; it is referring to a particular variable argument whose name is "swapchain", not the actual swapchain itself. The current writing of the paragraph does the same thing. This is probably for the best as there is nothing preventing an application from creating and maintaining multiple swapchains simultaneously, and the most sensible way to refer to the swapchain in question is through the variable name. In the actual PR, the font will of course be specialized to fit the spec's usage of variable name references.

Pretty sure in here it is how the word is supposed to be used. You simply must not violate the rule, otherwisely the driver is allowed to behave in whichever way it feels like.

That's fair. I've heard that some implementations can theoretically copy out and provide you with an internally-used swapchain image once you've run out of all other available images. This would suggest a case where it would be both valid and useful to call vkAcquireNextImageKHR even with an infinite timeout when you've acquired all the "available" images; however, it requires knowing about the underlying implementation, and if you get it wrong, you will either end up causing an error or a deadlock. I'm not sure if needing to know the underlying implementation in order to know whether or not the API call is valid makes "should not" or "must not" more appropriate, so I've been on the fence. But if it is just a repetition anyways, then I suppose it's a moot point.

Weird hyperformal wording. Should be just "acquired", or the sentence restructured so English need not be tortured so much. [...] Lot of the spec is written with "german sentence structure" (i.e. with the main point of a sentence all the way at the end of a very long sentence). But maybe could simplify our paragraph here into a short simple direct sentence

Noted. Perhaps that leaves us with:

vkAcquireNextImageKHR should not be called if the number of images that the application has currently acquired is greater than the difference between the number of images in swapchain and the value of VkSurfaceCapabilitiesKHR::minImageCount.

Any further revisions?

Very interesting issue. In the initial comment, you bring up several good points, some of which have been responded to, some of which have not been acknowledged I believe. Paraphrasing:

The number of presentations in flight is not limited to the number of swapchain images

True in the general sense, though some operating system vendors refuse to allow this and artificially throttle vkQueuePresentKHR() to avoid it, making the ecosystem inconsistent despite my best efforts to remove any such throttling from Vulkan presentation (~all OpenGL implementations perform this throttling, because many applications were naively written and behave poorly without it, then the driver throttling behavior reinforced this naive implementation as the correct one, making the throttling all the more necessary, and so on). Further, as a side effect of some implementations, there may be no "present completed" synchronization primitive available until the presentation engine has begun presenting a given image, and as a result, the rate at which vkAcquireNextImageKHR returns will throttle presentation, especially when presentation operations happen only once per vertical refresh (E.g., FIFO mode), limiting the number of outstanding presents to roughly the number of images in the swapchain in practice. However, applications must not rely on such behavior, as it is not guaranteed.

The "Acquire" in "vkAcquireNextImageKHR" and its descrptions are misleading because the actual acquire occurs only once the returned synchronization primitives signal.

Unclear. KhrOoze has shared one interpretation. In practice, I think both interpretations (acquire happens at call return time, acquire happens when the sync primitives signal) have been used to concoct parts of the WSI design at different times, and I'm not clear the design is entirely self-consistent here at this point.

The spec is inconsistent when referring to "presention" Vs. "enqueued present operations"

Likely true, and probably victim to the same issues as above. I agree introducing the term "release" and using it consistently in place of "a call to vkQueuePresentKHR() has been made" is probably an improvement.

"Ownership" is unclear

Yes, at some point the term "ownership" was purposely scrubbed from the WSI language even though it naturally flows into one's mind in discussion of the operations here, and we'd consistently used that term to refer to which swapchain images were under application Vs. presentation engine control up to that point, both in internal discussions and in some spec language. This was understandable, because it conflicts with a separate concept of ownership used pervasively in the already-difficult-to-wrap-your-head-around synchronization (as in synchronization of "resources") section of the spec. While well intentioned, I think this ultimately made the WSI portion of the spec harder to read, and made it harder for those of us that worked on the original version to remember what we had intended to say in regards to these tricky-to-describe transitions of swapchain images between application, Vulkan, and presentation engine management. Internally, I have some slides derived from early WSI design discussions that built up all the WSI operations in terms of swapchain image ownership (again, an entirely distinct concept from the Vulkan spec's current concept of resource "ownership"), and I should probalby dig those out and distill them down to what was at least the original intended design.

When presentation has actually begun (as opposed to when it is enqueued) is not clear.

True, as is when presentation is actually complete, and we are working on resolving this to some extent by introducing new functionality, though progress has been slow.

My broader take-away is something I've had in mind for a long time: The WSI section needs an outright rewrite to properly capture the collective understanding of it among driver vendors and the WSI working group members, as well as bring it in line with the norms and conventions used throughout the rest of the specification. The current incarnation has understandably confounded and frustrated many a reader and application developer. I've been avoiding taking this on because, as noted, we have some substantial extensions to it in flight, and it's quite painful to develop extension language against an in-progress rewrite.

In the meantime, I think we would welcome targeted clarifications like the above in the form of pull requests. IMHO, PRs are a much better mechanism to finesse wording and gather feedback from the community than conversations on an issue, and it's much easier to avoid confusion between things like "swapchain" and "pname:swapchain" when looking at the spec in its raw-markup form ;-)

Unclear. KhrOoze has shared one interpretation. In practice, I think both interpretations (acquire happens at call return time, acquire happens when the sync primitives signal)

IIRC then where it matters, it is the first interpretation (though if there are some cases of the second, they sure should be excised). There are many places that disambiguate the interpretation, e.g.:

An application can acquire use of a presentable image with vkAcquireNextImageKHR. After acquiring a presentable image and before modifying it, the application must use a synchronization primitive to ensure that the presentation engine has finished reading from the image.

Or:

When successful, vkAcquireNextImageKHR acquires a presentable image from swapchain that an application can use [...] The presentation engine may not have finished reading from the image at the time it is acquired

KhronosGroup / Vulkan-Docs

WSI 33.9 needs more rigorous / distinct terms and definitions #1419