gt4py v1 removes the Storage class and allows any object that exposes __array_interface__ to be bound. Unfortunately, the default cupy allocation used in our model produces arrays with a bad stride (the backend expects unit stride), which degrades performance in the backend.
Potential solution:
Use the allocator provided by gt4py, which is optimized for the backend.
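To confirm the problem before and after switching allocators, the strides can be read straight from the array interface. The following is a minimal sketch, not code from our model: the helper name `unit_stride_axes` and the `FakeArray` stand-in are hypothetical, and it assumes a numeric typestr such as "<f8" where the trailing digits are the item size in bytes.

```python
def unit_stride_axes(array_like):
    """Return the axes whose stride equals exactly one element."""
    iface = array_like.__array_interface__
    shape = tuple(iface["shape"])
    # typestr looks like "<f8": byte order, type kind, item size in bytes.
    itemsize = int(iface["typestr"][2:])
    strides = iface.get("strides")
    if strides is None:
        # In the array interface, strides=None means C-contiguous,
        # so the last axis is the unit-stride one.
        return [len(shape) - 1] if shape else []
    return [axis for axis, stride in enumerate(strides) if stride == itemsize]


class FakeArray:
    """Hypothetical stand-in for any object exposing __array_interface__."""

    def __init__(self, shape, strides, typestr="<f8"):
        self.__array_interface__ = {
            "shape": shape,
            "typestr": typestr,
            "strides": strides,  # in bytes, or None for C-contiguous
            "version": 3,
        }


if __name__ == "__main__":
    # float64 field of shape (4, 5, 6) with a padded layout: no unit-stride axis.
    padded = FakeArray((4, 5, 6), (480, 96, 16))
    print(unit_stride_axes(padded))
```

An empty result means no axis is contiguous in memory, which is the situation described above. Which axis should carry the unit stride depends on the backend, so the check only reports the axes and leaves the comparison to the caller.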