ValeevGroup / tiledarray

A massively-parallel, block-sparse tensor framework written in C++
GNU General Public License v3.0

stream assignment to device tasks should be sticky #420

Closed evaleev closed 9 months ago

evaleev commented 10 months ago

Currently, device tile ops assign the device stream by the result tile's range ordinal ... For tasks that fuse multiple ops (e.g., scale + permute) this is not appropriate: the constituent ops may launch kernels into different streams, which can violate the required sequencing of the ops. The solution is to use the ordinal-based stream assignment only if a stream has not already been assigned.
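A minimal sketch of what "sticky" assignment could look like, assuming a per-thread record of the stream chosen for the running task. Every name here (`StreamId`, `kNumStreams`, `tls_stream`, `stream_for`, `TaskScope`, `fused_ops_share_stream`) is hypothetical and only illustrates the idea; none of it is TiledArray's actual API, and a plain integer stands in for a real device stream handle.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical stand-in for a device stream handle (not TiledArray's API).
using StreamId = std::size_t;
constexpr std::size_t kNumStreams = 4;  // assumed size of the stream pool

// Stream assigned to the task running on this thread, if any.
inline thread_local std::optional<StreamId> tls_stream;

// Sticky assignment: fall back to the ordinal-based choice only when
// no stream has been assigned yet; otherwise reuse the existing one.
inline StreamId stream_for(std::size_t range_ordinal) {
  if (!tls_stream) tls_stream = range_ordinal % kNumStreams;
  assert(*tls_stream < kNumStreams);
  return *tls_stream;
}

// RAII guard marking the lifetime of one task: the assignment is cleared
// on entry and on exit, so the next task chooses its stream afresh.
struct TaskScope {
  TaskScope() { tls_stream.reset(); }
  ~TaskScope() { tls_stream.reset(); }
};

// A fused task (e.g., scale + permute) whose constituent ops produce
// tiles with different range ordinals. Returns true if both ops were
// given the same stream.
inline bool fused_ops_share_stream(std::size_t scale_ordinal,
                                   std::size_t permute_ordinal) {
  TaskScope scope;
  const StreamId s1 = stream_for(scale_ordinal);    // first op picks the stream
  const StreamId s2 = stream_for(permute_ordinal);  // second op reuses it
  return s1 == s2;
}
```

With purely ordinal-based assignment, the two ops above would land on streams `scale_ordinal % kNumStreams` and `permute_ordinal % kNumStreams`, which generally differ. With the sticky rule, the first op in the task fixes the stream and every later op in the same task inherits it, so kernels are enqueued in order on a single stream.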