ZelboK opened this issue 3 months ago.
You can use UniversalCopy as a way to always dispatch to SIMT loads/stores rather than cp.async copies. A deeper refactor of this automatic dispatch is planned, but it won't land anytime soon.
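A toy, host-only C++ model of the difference between automatic dispatch and UniversalCopy might look like the following. It is a sketch under stated assumptions: DefaultCopy and UniversalCopy are named after the thread, but Space and select_path are hypothetical, and this is not how CuTe actually decides which instruction to emit.

```cpp
#include <cassert>
#include <string>

// Hypothetical memory spaces for a source/destination pair.
enum class Space { Global, Shared, Register };

// Hypothetical policy tags named after the thread; not CuTe's real types.
struct DefaultCopy {};    // "let the library pick the instruction"
struct UniversalCopy {};  // "always use a plain load/store"

// Automatic dispatch: a global -> shared copy is where an SM80 cp.async
// (LDGSTS) could be chosen without the caller asking for it.
std::string select_path(DefaultCopy, Space src, Space dst) {
  if (src == Space::Global && dst == Space::Shared) {
    return "cp.async";
  }
  return "simt";
}

// UniversalCopy ignores the memory spaces and always takes the SIMT route.
std::string select_path(UniversalCopy, Space, Space) {
  return "simt";
}
```

In this model, swapping the policy tag at the call site is enough to opt out of the asynchronous path, which mirrors the workaround suggested above.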
If you would like to work on a PR, that would be lovely! CC @ccecka
I suspect much of this can be accomplished by simply renaming UniversalCopy to be the new DefaultCopy, so that the default never goes through the automatic LDGSTS route again.
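The suggested rename can be modeled in a few lines of standard C++. This is a hypothetical sketch: the may_use_cp_async flag, AsyncCopyOptIn, and needs_async_wait are illustrative names, not CuTe members. Making DefaultCopy an alias of the always-SIMT policy means code that never names a policy never has to wait on async copies, while an explicit policy can still opt in.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical policy: always lowers to plain SIMT loads/stores.
struct UniversalCopy {
  static constexpr bool may_use_cp_async = false;
};

// Hypothetical opt-in policy that is allowed to use cp.async (LDGSTS).
struct AsyncCopyOptIn {
  static constexpr bool may_use_cp_async = true;
};

// The suggested change: the default becomes the always-SIMT policy.
using DefaultCopy = UniversalCopy;

static_assert(std::is_same_v<DefaultCopy, UniversalCopy>,
              "DefaultCopy should be an alias of UniversalCopy");

// True only when the caller must fence/wait before reading the destination.
template <class Policy>
constexpr bool needs_async_wait() {
  return Policy::may_use_cp_async;
}
```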
Sure, I'll push up a PR today. I'll see whether simply replacing DefaultCopy with UniversalCopy is enough.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
It's a fairly small change. I was looking at https://github.com/NVIDIA/cutlass/issues/1231 and wondered whether it would make sense to refactor copy so that it accepts the type of copy the user wants to rely on, like so:
copy(gmem_tiled_copy, SM80_CP_ASYNC_CACHEALWAYS<float>{}, tAgA, tAsA)
That way the user is aware of what's going on internally. Right now, if you use DefaultCopy, for example, it dispatches to the cp.async instruction, which requires extra synchronization in your code (waiting for the asynchronous copies to complete before the data is read); that is a little unintuitive. You might be able to infer the copy type from gmem_tiled_copy, but passing it as an explicit parameter is probably much easier. Are there any flaws or problems with doing this? Perhaps the default should always be:
copy(gmem_tiled_copy, DefaultCopy{}, tAgA, tAsA)
I could work on a PR implementing this if it's beneficial, but before committing my time to it I'd like to know whether this is even desired, or whether there are problems with this approach.
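The proposed API shape (an explicit copy-policy parameter, with a policy-free overload that falls back to a safe default) can be sketched as a host-only toy in standard C++. Everything here is hypothetical: namespace toy, TiledCopy (standing in for gmem_tiled_copy), AsyncCopy (standing in for an atom like SM80_CP_ASYNC_CACHEALWAYS), and wait_all (standing in for the fence/wait the async path needs) are illustrative, and std::vector buffers replace CuTe tensors.

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace toy {

// Hypothetical policy tags modeled on the proposal; not CuTe's API.
struct DefaultCopy {};  // completes before copy() returns (SIMT-like)
struct AsyncCopy {};    // stands in for an atom like SM80_CP_ASYNC_CACHEALWAYS

struct TiledCopy {};  // placeholder for gmem_tiled_copy

// Copies that have been issued but not yet completed ("in flight").
inline std::vector<std::function<void()>> pending;

// Explicit synchronous policy: dst is valid as soon as copy() returns.
template <class T>
void copy(TiledCopy, DefaultCopy, const std::vector<T>& src, std::vector<T>& dst) {
  dst = src;
}

// Explicit asynchronous policy: only records the copy. The caller must keep
// src and dst alive and call wait_all() before reading dst.
template <class T>
void copy(TiledCopy, AsyncCopy, const std::vector<T>& src, std::vector<T>& dst) {
  pending.push_back([&src, &dst] { dst = src; });
}

// Models the extra synchronization (fence + wait) that async copies need.
inline void wait_all() {
  for (auto& job : pending) job();
  pending.clear();
}

// No policy given: fall back to DefaultCopy, so the implicit path never
// needs a wait.
template <class T>
void copy(TiledCopy tc, const std::vector<T>& src, std::vector<T>& dst) {
  toy::copy(tc, DefaultCopy{}, src, dst);
}

}  // namespace toy
```

With this shape, the extra synchronization only appears at call sites that name the asynchronous policy, which is the visibility the proposal is after.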