Open cloudhan opened 2 months ago
I agree this is a bug; it appears those `.with` functions should only be on the ZFILL variants.
@thakkarV Does NV have any plan to address this issue? It is quite subtle and easy to get wrong, and fixing it will be a breaking change for users who rely on the exotic behavior. The later it is addressed, the wider the impact on users.
@cloudhan will fix in 3.6 which will land in a month or so. is that ok?
@yzhaiustc can we please add this to the docket for 3.6 fixes?
sure. thanks :-)
Describe the bug
nvcc cp_async.cu -Iinclude -std=c++20 --expt-relaxed-constexpr -gencode=arch=compute_80,code=sm_80 && ./a.out
produces
Expected behavior
When `_ZFILL` is not requested, the predicated-out values should not be touched. This might corrupt a `__shared__` buffer that is declared later and is large enough.

This is because
https://github.com/NVIDIA/cutlass/blob/f93a69134ec8259fd235f220209d6f8734a5cb06/include/cute/atom/copy_traits_sm80.hpp#L77-L82
silently re-dispatches to the `_ZFILL` trait. This will generally cause very subtle bugs when the user expects an async version of `Copy_Atom<UniversalCopy<uint128_t>, T>` as a simple substitute, but not the `ignore-src` behavior!

Since the `_ZFILL` variants exist, this implicit behavior should be removed.

The only workaround is to replace `copy_if` as follows

Environment details (please complete the following information): f93a69134ec8259fd235f220209d6f8734a5cb06
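
For readers skimming the thread, the difference between the two predicate behaviors discussed above can be sketched with a small host-side model in plain C++. This is only an illustration under stated assumptions: it is not CuTe code and not the workaround from the original report, and the names `copy_if_zfill`, `copy_if_skip`, and `demo` are hypothetical. Plain loops stand in for the asynchronous copies.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Host-side model only: plain loops stand in for cp.async.
// Nothing here is CuTe API; all names are hypothetical.

// Models the zero-fill (ignore-src) behavior: a false predicate still
// writes the destination, but with zeros instead of the source data.
template <std::size_t N>
void copy_if_zfill(const std::array<bool, N>& pred,
                   const std::array<int, N>& src,
                   std::array<int, N>& dst) {
  for (std::size_t i = 0; i < N; ++i) {
    dst[i] = pred[i] ? src[i] : 0;
  }
}

// Models the behavior the report expects when zero-fill is not requested:
// a false predicate leaves the destination element untouched.
template <std::size_t N>
void copy_if_skip(const std::array<bool, N>& pred,
                  const std::array<int, N>& src,
                  std::array<int, N>& dst) {
  for (std::size_t i = 0; i < N; ++i) {
    if (pred[i]) {
      dst[i] = src[i];
    }
  }
}

// Small demonstration: -1 marks destination data that existed before the copy.
inline void demo() {
  const std::array<bool, 4> pred{true, false, true, false};
  const std::array<int, 4> src{1, 2, 3, 4};
  std::array<int, 4> zfilled{-1, -1, -1, -1};
  std::array<int, 4> skipped{-1, -1, -1, -1};

  copy_if_zfill(pred, src, zfilled);
  copy_if_skip(pred, src, skipped);

  // Zero-fill overwrites the predicated-out slots.
  assert((zfilled == std::array<int, 4>{1, 0, 3, 0}));
  // Skipping preserves whatever was already in the destination.
  assert((skipped == std::array<int, 4>{1, -1, 3, -1}));
}
```

In this model, swapping `copy_if_skip` for `copy_if_zfill` changes the result only in the predicated-out slots, which is why the substitution is easy to miss when the destination happens to hold zeros already.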