WebGPU: implement fences

Requires: #597

There is no general way to track GPU execution in WebGPU, but we need to support one special case: copying to a staging texture/buffer and mapping it for reading. This case can be handled transparently for the client as follows:

For each signaled value, a fence keeps a list of sync points. The sync point is a simple class derived from IObject that only holds one atomic bool value.
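
A minimal sketch of what such a sync point could look like. The names `SyncPoint`, `Trigger` and `IsTriggered` are hypothetical, and `std::shared_ptr` stands in for the `IObject` reference counting that the real class would use:

```cpp
#include <atomic>
#include <memory>

// Hypothetical sync point. The real class would derive from IObject and be
// reference-counted by the engine; std::shared_ptr plays that role here.
class SyncPoint
{
public:
    // Called from the asynchronous map callback.
    void Trigger() { m_Signaled.store(true, std::memory_order_release); }

    // Polled by fences and by the device context.
    bool IsTriggered() const { return m_Signaled.load(std::memory_order_acquire); }

private:
    std::atomic<bool> m_Signaled{false};
};

using SyncPointPtr = std::shared_ptr<SyncPoint>;
```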

When a buffer or texture is mapped for reading, a sync point is created for the async map callback and added to the device context. The device context accumulates all pending sync points until the next flush.

When a fence is signaled, it is added to the list of pending fences in the context.

When a flush happens, all sync points from the device context are added to all pending fences. The context discards sync points that are already signaled.
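
The flush step could be sketched roughly as below. All type and method names are hypothetical stand-ins, not the actual DiligentCore classes. The sketch also assumes that the context keeps unsignaled sync points after the flush and clears the list of pending fences:

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct SyncPoint { std::atomic<bool> Signaled{false}; };
using SyncPointPtr = std::shared_ptr<SyncPoint>;

// Hypothetical fence, reduced to the part the flush needs.
struct Fence
{
    // Each entry: (signaled value, sync points that must signal first).
    std::vector<std::pair<uint64_t, std::vector<SyncPointPtr>>> Pending;

    void AppendSyncPoints(uint64_t Value, const std::vector<SyncPointPtr>& Points)
    {
        Pending.emplace_back(Value, Points);
    }
};

// Hypothetical device context.
class DeviceContext
{
public:
    // Called when a staging buffer/texture is mapped for reading.
    void AddSyncPoint(SyncPointPtr pPoint) { m_SyncPoints.push_back(std::move(pPoint)); }

    // Called when the client signals a fence.
    void EnqueueSignal(Fence* pFence, uint64_t Value) { m_PendingFences.emplace_back(pFence, Value); }

    void Flush()
    {
        // Every pending fence receives every sync point accumulated so far.
        for (auto& Entry : m_PendingFences)
            Entry.first->AppendSyncPoints(Entry.second, m_SyncPoints);
        m_PendingFences.clear();

        // Discard sync points that have already signaled.
        m_SyncPoints.erase(
            std::remove_if(m_SyncPoints.begin(), m_SyncPoints.end(),
                           [](const SyncPointPtr& pPoint) { return pPoint->Signaled.load(); }),
            m_SyncPoints.end());
    }

    std::size_t NumSyncPoints() const { return m_SyncPoints.size(); }

private:
    std::vector<SyncPointPtr>                 m_SyncPoints;
    std::vector<std::pair<Fence*, uint64_t>>  m_PendingFences;
};
```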

When the async callback is called, it signals the sync point.

To get the current value, the fence checks the sync points associated with each pending value; if all of them are signaled, the completed value is updated.
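
One possible shape for this check is sketched below. The class layout and names are hypothetical, and the sketch assumes that fence values are signaled in increasing order, so pending values can be retired oldest first:

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

struct SyncPoint { std::atomic<bool> Signaled{false}; };
using SyncPointPtr = std::shared_ptr<SyncPoint>;

// Hypothetical fence that tracks sync points per signaled value.
class Fence
{
public:
    // Called by the device context on flush.
    void AppendSyncPoints(uint64_t Value, std::vector<SyncPointPtr> Points)
    {
        m_Pending.emplace_back(Value, std::move(Points));
    }

    uint64_t GetCompletedValue()
    {
        // Retire pending values, oldest first, while all of their sync points are signaled.
        while (!m_Pending.empty())
        {
            const auto& Points = m_Pending.front().second;
            const bool AllSignaled =
                std::all_of(Points.begin(), Points.end(),
                            [](const SyncPointPtr& pPoint) {
                                return pPoint->Signaled.load(std::memory_order_acquire);
                            });
            if (!AllSignaled)
                break;

            m_Completed = std::max(m_Completed, m_Pending.front().first);
            m_Pending.pop_front();
        }
        return m_Completed;
    }

private:
    std::deque<std::pair<uint64_t, std::vector<SyncPointPtr>>> m_Pending;
    uint64_t m_Completed = 0;
};
```

Because the query only polls atomic flags, it never blocks, which fits the goal of removing busy waits.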

⚠️ The async callback must keep a strong reference to the sync point so that it can be safely fired after the device and all objects are released.
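
The lifetime requirement can be illustrated with the sketch below. `CallbackQueue` and `StagingBuffer` are hypothetical stand-ins: the queue simulates the WebGPU runtime that fires map callbacks later, and the lambda capture stands in for whatever mechanism the real code uses to hand a strong reference to the callback (for example via user data):

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

struct SyncPoint { std::atomic<bool> Signaled{false}; };

// Hypothetical stand-in for the runtime that fires map callbacks later.
struct CallbackQueue
{
    std::vector<std::function<void()>> Pending;

    void Tick()
    {
        auto Callbacks = std::move(Pending);
        Pending.clear();
        for (auto& Callback : Callbacks)
            Callback();
        // Callbacks are destroyed here, releasing their captured references.
    }
};

// Hypothetical staging buffer.
class StagingBuffer
{
public:
    std::shared_ptr<SyncPoint> MapAsync(CallbackQueue& Queue)
    {
        auto pSyncPoint = std::make_shared<SyncPoint>();
        // Capture the shared_ptr by value and do NOT capture 'this':
        // the callback owns the sync point and may outlive the buffer.
        Queue.Pending.push_back([pSyncPoint] {
            pSyncPoint->Signaled.store(true, std::memory_order_release);
        });
        return pSyncPoint;
    }
};
```

In this sketch the sync point stays alive until the callback has run, even if the buffer that requested the mapping is destroyed first.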

When the callback is called, staging buffer/texture should copy data to a temporary buffer to then allow client call

Other changes that need to be made here:

Remove busy waits in Buffer/Texture:

    while (wgpuBufferGetMapState(m_wgpuBuffer.Get()) != WGPUBufferMapState_Unmapped)
        m_pDevice->DeviceTick();

Rework DeviceContext::WaitForIdle by issuing DeviceTick and waiting for all sync points to signal.
- WaitForIdle will be disabled on the Web
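
A rough sketch of the reworked wait, assuming the context can enumerate its outstanding sync points. The free function and its `DeviceTick` parameter are hypothetical; the callable stands in for `m_pDevice->DeviceTick()`:

```cpp
#include <algorithm>
#include <atomic>
#include <functional>
#include <memory>
#include <vector>

struct SyncPoint { std::atomic<bool> Signaled{false}; };
using SyncPointPtr = std::shared_ptr<SyncPoint>;

// Hypothetical helper: ticks the device until every sync point has signaled.
inline void WaitForIdle(const std::vector<SyncPointPtr>& SyncPoints,
                        const std::function<void()>&     DeviceTick)
{
    auto AllSignaled = [&SyncPoints] {
        return std::all_of(SyncPoints.begin(), SyncPoints.end(),
                           [](const SyncPointPtr& pPoint) {
                               return pPoint->Signaled.load(std::memory_order_acquire);
                           });
    };

    // Each tick gives pending map callbacks a chance to fire.
    while (!AllSignaled())
        DeviceTick();
}
```

This loop still blocks the calling thread, which is presumably why it cannot be offered on the Web, where callbacks only run once control returns to the browser.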

DiligentGraphics / DiligentCore

WebGPU: implement fences #596