This note investigates a few properties of the current GPGPU programming frameworks, and of the landscape in general.
The original plan was to see if there is any framework:
The outcome is that there is no such framework, only many options, each with some limitations.
In particular, there are many good and interoperable Python abstractions:
There are also a couple of standards for tensor exchange between frameworks (Numba's __cuda_array_interface__ and DLPack).
For more details see the dedicated note.
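
As a rough illustration of the DLPack exchange mentioned above, here is a minimal CPU-only sketch. It assumes NumPy 1.23 or later, where np.from_dlpack is available; a NumPy array stands in for a framework tensor, and the names producer and consumer are only illustrative. GPU frameworks offer analogous entry points (for example torch.from_dlpack and cupy.from_dlpack), which are not exercised here.

```python
import numpy as np

# Producer: any object exposing __dlpack__ and __dlpack_device__.
# A NumPy array is used here as a stand-in for a framework tensor (CPU only).
producer = np.arange(6, dtype=np.float32).reshape(2, 3)

# Consumer: imports the buffer through the DLPack protocol, without copying.
consumer = np.from_dlpack(producer)

# Both objects view the same memory, so a write made through the producer
# is visible through the consumer.
producer[0, 0] = 42.0
shares = np.shares_memory(producer, consumer)
```

The point of the sketch is that the exchange is zero-copy: the consumer receives a view of the producer's buffer rather than a duplicate, which is what makes the protocol attractive for passing tensors between frameworks.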