Asynchronous semantic for operators and backend

This proposal introduces fully asynchronous semantics for the backend and its operators.

For the Warp and Neon backends, the execution of an operator is asynchronous (the calling function returns before the kernel completes). However, this asynchronous behavior is not fully abstracted by the XLB backend. Currently, synchronization calls are added directly to the code using Warp or Neon mechanisms where needed.

The goal is to discuss how to implement comprehensive synchronization semantics. Whether they will be visible to the XLB user depends on the chosen approach.

So far, two cases have been considered:

Case A: Abstracting synchronization directly into the backend or operator API. In this approach, the synchronization abstraction should manage:
- A default stream
- A synchronization method

Case B: Enforcing synchronous behavior in any CPU operation that accesses XLB fields. This solution would require:
- Management of a default stream
- Injecting synchronization into any operation that gives the user access to field data
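
To make the difference between the two cases concrete, here is a minimal sketch in plain Python. It does not use Warp, Neon, or any XLB API: a single worker thread stands in for the default stream, and the names `DefaultStream`, `Field`, `device_data`, `to_host`, and `fill_kernel` are all hypothetical, chosen only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class DefaultStream:
    """Hypothetical default stream: an in-order queue of asynchronous work."""

    def __init__(self):
        # One worker thread, so tasks run in submission order,
        # like kernels queued on a single GPU stream.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def launch(self, kernel, *args):
        # Returns immediately; the "kernel" runs later on the worker thread.
        self._pending.append(self._pool.submit(kernel, *args))

    def sync(self):
        # Block until every launched kernel has finished (re-raises errors).
        pending, self._pending = self._pending, []
        for future in pending:
            future.result()


class Field:
    """Hypothetical field whose storage plays the role of device memory."""

    def __init__(self, stream, size):
        self._stream = stream
        self._data = [0.0] * size

    def device_data(self):
        # No implicit synchronization: under Case A the caller is
        # responsible for calling stream.sync() before reading.
        return self._data

    def to_host(self):
        # Case B: the accessor injects the synchronization itself.
        self._stream.sync()
        return list(self._data)


def fill_kernel(data, value):
    time.sleep(0.05)  # simulate kernel latency
    for i in range(len(data)):
        data[i] = value


stream = DefaultStream()

# Case A: the user synchronizes explicitly before touching the data.
field_a = Field(stream, 4)
stream.launch(fill_kernel, field_a.device_data(), 1.0)
stream.sync()
result_a = list(field_a.device_data())

# Case B: reading through the accessor synchronizes on the user's behalf.
field_b = Field(stream, 4)
stream.launch(fill_kernel, field_b.device_data(), 2.0)
result_b = field_b.to_host()
```

In this toy model, Case A keeps synchronization visible and under the user's control, while Case B hides it behind every host-side accessor at the cost of a potential stall on each access.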

The proposal aims to address situations where CPU computation may access data still in use by the GPU. For instance, in this example, a buffer could be deleted before the kernel has completed.

In Case A, we would require an explicit XLB sync call, like xlb.sync(), at line 23. In Case B, the synchronization would be included directly in the field destructor.
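
The buffer-lifetime hazard and the two remedies can be sketched with a self-contained toy, independent of the example referenced above. It assumes CPython's reference counting, so that `del field` runs the destructor immediately. The module-level `sync()` is only a stand-in for the proposed `xlb.sync()`, and `Buffer`, `Field`, `scale_kernel`, and the `gate` event are hypothetical names. The `gate` exists solely to make the unsafe ordering deterministic for the demonstration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_stream = ThreadPoolExecutor(max_workers=1)  # stand-in for the default stream
_pending = []
errors = []  # records every simulated use-after-free


def launch(kernel, *args):
    # Asynchronous launch: returns before the kernel has run.
    _pending.append(_stream.submit(kernel, *args))


def sync():
    # Hypothetical stand-in for xlb.sync(): wait for all queued kernels.
    while _pending:
        _pending.pop(0).result()


class Buffer:
    def __init__(self, size):
        self.data = [1.0] * size
        self.freed = False


class Field:
    def __init__(self, size, sync_on_delete):
        self.buffer = Buffer(size)
        self._sync_on_delete = sync_on_delete

    def __del__(self):
        # Case B: the destructor waits for in-flight kernels before freeing.
        if self._sync_on_delete:
            sync()
        self.buffer.freed = True  # simulate releasing device memory


def scale_kernel(buffer, factor, gate=None):
    if gate is not None:
        gate.wait()   # demo only: hold the kernel until the host lets it go
    time.sleep(0.05)  # simulate kernel latency
    if buffer.freed:
        errors.append("use-after-free")
        return
    buffer.data[:] = [factor * x for x in buffer.data]


# Case A: explicit sync before the field is destroyed.
field = Field(4, sync_on_delete=False)
launch(scale_kernel, field.buffer, 2.0)
sync()
del field
errors_case_a = list(errors)

# Case B: the destructor synchronizes on its own.
errors.clear()
field = Field(4, sync_on_delete=True)
launch(scale_kernel, field.buffer, 2.0)
del field
errors_case_b = list(errors)

# Neither remedy: the buffer is freed while the kernel is still pending.
errors.clear()
gate = threading.Event()
field = Field(4, sync_on_delete=False)
launch(scale_kernel, field.buffer, 2.0, gate)
del field      # frees the buffer immediately
gate.set()     # the kernel now runs against freed memory
sync()
errors_unsafe = list(errors)
```

Only the last ordering records a use-after-free. Both the explicit call and the destructor-side synchronization prevent it, and they differ mainly in who is responsible for remembering to synchronize.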

Autodesk / XLB

Asynchronous semantic for operators and backend #87