KhronosGroup / Vulkan-Docs

The Vulkan API Specification and related tools

[Specification] Request for clarification on timeline semaphore wait-before-signal behavior within the same queue #1262

Open benvanik opened 4 years ago

benvanik commented 4 years ago

We were having some discussions and realized we didn't know exactly what this meant in the context of same-queue wait-before-signal operations:

Section 5.6, Queue Forward Progress:

When using timeline semaphores, wait-before-signal behavior is well-defined and applications can submit work via vkQueueSubmit which defines a timeline semaphore wait operation before submitting a corresponding semaphore signal operation. For each timeline semaphore wait operation defined by a call to vkQueueSubmit, the application must ensure that a corresponding semaphore signal operation is executed before forward progress can be made.

What is the submission order and how is it respected if there are out of order submissions to the same queue? For example the first submit (or batch) has a wait-semaphore and execute of command buffer A and the second submit (or batch) has an execute of command buffer B and a signal-semaphore. Is the intent that this will deadlock, or will command buffer B execute and then A as if they had been submitted in reverse order?

the application must ensure that a corresponding semaphore signal operation is executed

The thing throwing us off may be here; does this mean that only a signal from either vkSignalSemaphore or a submission signal from any other queue/device/external source will work or does it include a submitted signal to the same queue?

Clarification as to what the well-defined behavior is and how it relates to single queues would be much appreciated!

cc @kangz @scotttodd

krOoze commented 4 years ago

A and the second submit (or batch) has an execute of command buffer B and a signal-semaphore. Is the intent that this will deadlock

For each timeline semaphore wait operation defined by a call to vkQueueSubmit, the application must ensure that a corresponding semaphore signal operation is executed before forward progress can be made.

Not sure what is unclear here. Failure to execute semaphore signal will result in deadlock (resp. failure to make forward progress).

or will command buffer B execute and then A as if they had been submitted in reverse order?

Why would you think that? That would violate half the specification.

If you submit a wait, none of the subsequent work can start. If you submit a semaphore signal, all the previous work has to finish. The semaphore cannot be signaled if the work is not allowed to be even started.

PS: The spec does not talk about vkQueueBindSparse though, which might be more interesting.

benvanik commented 4 years ago

Anything that requires a deep understanding of half the spec is unclear, IMO ;) Given the complexity of something like timeline semaphores, there's value in being descriptive and precise.

What's unclear (or "under-described" if you have a particular meaning of the word "unclear" in your mind) is the usage of the word submit to mean two different things and the implied ordering under discussion of wait-before-signal:

applications can submit work via vkQueueSubmit which defines a timeline semaphore wait operation before submitting a corresponding semaphore signal operation.

A literal reading is that two submits, one which waits and one which signals, are valid. But there are additional restrictions that are not explicitly called out: it is valid iff the queues/devices/etc. differ, and it is never valid if the queues are the same, due to the implicit deadlock in this example. In contrast, the prior 5.6 paragraph on binary semaphores is very clear about its requirements with respect to ordering and on which queues the operations must take place.

A diff that would be clearer and match the wording of the binary semaphore paragraph could be:

the application must ensure that a corresponding semaphore signal operation is executed from the host with vkSignalSemaphore or a queue without a prior wait on the same semaphore before forward progress can be made.

Because there's no such requirement (that I can find), I suspect that the last time I tried to do this out-of-order signaling it happened to work on the particular driver I was testing with, and none of the validation layers complained. If the spec were clearer on this I would file an issue with the validation layers (or, since that was a ~year ago, go see if it's been added in the meantime - which is likely due to the awesomeness of the layers :).

gfxstrand commented 4 years ago

What is the submission order and how is it respected if there are out of order submissions to the same queue? For example the first submit (or batch) has a wait-semaphore and execute of command buffer A and the second submit (or batch) has an execute of command buffer B and a signal-semaphore. Is the intent that this will deadlock, or will command buffer B execute and then A as if they had been submitted in reverse order?

Yes, that deadlocks. The easy explanation of why is found in the colloquial meaning of the word "queue". Things on a queue generally get executed in roughly the order in which they are submitted. In particular, the driver should not be required to re-order stuff within a queue in order to solve application-provided dependencies. I say "generally" and "roughly" because that's obviously not exactly what happens. More precisely, things are started in order; they may complete out-of-order due to GPU parallelism. For an even more precise statement about implicit queue ordering, see the section of the Vulkan spec entitled "Implicit Synchronization Guarantees".

More generally, however, I think the issue isn't so much what the Vulkan spec guarantees in the case you wrote above as that it doesn't guarantee that case succeeds. We're not guaranteed to deadlock, and someone could, in theory, have an implementation which is able to re-order stuff within a queue (though I personally find that unlikely). However, there's nothing in the Vulkan spec which guarantees that the driver re-orders to make your case valid.

TomOlson commented 4 years ago

@jekstrand has it right. We should also note that synchronization validation is a high priority for the validation team at LunarG. It's a hard problem and won't be solved all at once - this particular case probably won't be caught in the first phase, but it'll come eventually.

mbechard commented 3 years ago

I'll mention that I was also wondering the same thing as @benvanik. With the "start in-order, complete out-of-order" behavior of a queue, I was also wondering if it could behave as:

  1. Submit A
  2. A starts, checks wait semaphore and goes to sleep
  3. Submit B
  4. B starts, completes work and signals semaphore
  5. A wakes up

I don't think it hurts to be explicit in section 6.6, since it does state "behavior is well-defined and applications can submit work via vkQueueSubmit which defines a timeline semaphore wait operation before submitting a corresponding semaphore signal operation". I understand this sentence is caveated by rules stated in other sections, but it seems worthwhile to make it clear here that the above holds except for submissions within the same queue.

hooyuser commented 2 years ago

I've also imagined things like

  1. Submit A
  2. A starts, checks wait semaphore and goes to sleep
  3. Submit B
  4. B starts, completes work and signals semaphore
  5. A wakes up

could happen. Without the discussion here, it wasn't really explicit to me that wait-before-signal succeeds between different queues, or between the host and a queue, but fails within a single queue.

artem-lunarg commented 4 months ago

@benvanik I can't provide a quote from the specification that explicitly addresses the above single-queue scenario, but I'll try to show why the deadlock occurs using proof by contradiction.

Let's assume that a signal submitted after the wait on the same queue has a chance to execute. According to section 7.4.1, this signal defines a memory dependency whose first scope includes all previous commands:

Semaphore signal operations that are defined by vkQueueSubmit or vkQueueSubmit2 additionally include all commands that occur earlier in submission order.

This means that the first scope of the signal also includes the semaphore wait operation (which precedes the signal in the suggested scenario).

The second scope according to the specification:

The second synchronization scope includes only the semaphore signal operation.

A memory dependency is also an execution dependency, and the latter guarantees that the first scope happens-before the second scope. In our case the wait operation (in the first scope) happens-before the signal operation (the second scope), and that's a contradiction: happens-before here means that the wait actually finishes before the signal, yet the wait cannot finish until the signal has executed. This proves that the original assumption is incorrect.

artem-lunarg commented 4 months ago

Another useful place is Section 3.2.1, Queue Operations, which expresses approximately the same idea as above:

Before a fence or semaphore is signaled, it is guaranteed that any previously submitted queue operations have completed execution, and that memory writes from those queue operations are available to future queue operations.

nunyabidnezz commented 2 months ago

Can vkQueueSubmit() block on the wait and not return?

From https://docs.vulkan.org/guide/latest/queues.html

How a VkQueue is mapped to the underlying hardware is implementation-defined. Some implementations will have multiple hardware queues and submitting work to multiple VkQueue​s will proceed independently and concurrently. Some implementations will do scheduling at a kernel driver level before submitting work to the hardware. There is no current way in Vulkan to expose the exact details how each VkQueue is mapped.

Does "Some implementations will do scheduling at a kernel driver level before submitting work to the hardware" mean that some implementations may stall?

In the above examples, if a single thread uses multiple queues, performing the signal and the wait on separate queues, can we assume that this works on all implementations? Or would each queue's submission need to be made from a separate thread?