YichengDWu closed this issue 2 months ago.
We have found similar issues internally with "domain alignment" of copies. This is being worked on right now.
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
Still needs to be addressed.
This will be fixed in 3.5.1. The ETA is hard to predict, but hopefully within the next couple of weeks.
@YichengDWu please verify and close.
I can't verify it now since I no longer have an NVIDIA card. Thank you for fixing this!
Describe the bug
Automatic vectorization of copies doesn't account for shape divisibility. When copying a tensor with the layout (_2,_3):(_1,_2), the greatest common vector length is 6 elements. However, the copy is vectorized at 128 bits, i.e. four 32-bit elements per access. Since 6 is not divisible by 4, this doesn't work for a tensor of size 6; a vector width that divides 6 (such as 2 elements) would be required.
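To make the divisibility argument concrete, here is a small sketch that computes the contiguous run of a (shape, stride) layout and then picks the widest vector, up to 128 bits, that divides that run. This is plain C++ under stated assumptions: `contiguous_elements` and `safe_vector_elems` are hypothetical helpers, not CuTe functions; modes are merged in column-major order; and pointer alignment is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of leading elements that are contiguous in
// memory for a layout (shape, stride). Modes are merged left to right while
// each stride equals the product of the shapes merged so far.
std::size_t contiguous_elements(const std::vector<std::size_t>& shape,
                                const std::vector<std::size_t>& stride) {
    assert(shape.size() == stride.size());
    std::size_t run = 1;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        if (stride[i] != run) break;  // mode i does not continue the run
        run *= shape[i];
    }
    return run;
}

// Hypothetical helper: widest vector (in elements) that fits in max_bits
// and evenly divides the contiguous run. Starts at max_bits / elem_bits
// and halves until the width divides the run.
std::size_t safe_vector_elems(std::size_t run, std::size_t elem_bits,
                              std::size_t max_bits = 128) {
    std::size_t v = max_bits / elem_bits;  // e.g. 128 / 32 = 4
    while (v > 1 && run % v != 0) {
        v /= 2;  // 4 does not divide 6, so try 2
    }
    return v;
}
```

For (_2,_3):(_1,_2) with 32-bit elements, the run is 6, so the 4-element (128-bit) width is rejected in favor of 2 elements (64 bits).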
Steps/Code to reproduce bug
I got the following error:
Expected behavior
copy(src, dst)
should just work. Internally, it should be able to compute a correct predicate and use that to do the copy.

Environment details (please complete the following information):
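One way to read the expected behavior is a copy that uses the wide vector wherever a whole chunk is in bounds and falls back to predicated element-wise accesses for the tail. The sketch below is a host-side C++ analogue under that assumption; `predicated_vector_copy` is a hypothetical name, and this is not how CuTe's `copy` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-side analogue of a predicated, vectorized copy.
// V is the preferred vector width in elements (4 x 32-bit = 128 bits).
template <std::size_t V, typename T>
void predicated_vector_copy(const T* src, T* dst, std::size_t n) {
    static_assert(V > 0, "vector width must be positive");
    assert(n == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; i += V) {
        bool full_vector = (i + V <= n);  // predicate for the whole chunk
        if (full_vector) {
            // Whole chunk is in bounds: move V elements in one block.
            std::memcpy(dst + i, src + i, V * sizeof(T));
        } else {
            // Tail chunk: per-element predicate keeps accesses in bounds.
            for (std::size_t k = 0; i + k < n; ++k) {
                dst[i + k] = src[i + k];
            }
        }
    }
}
```

With n = 6 and V = 4, elements 0 to 3 move as one 128-bit chunk and elements 4 and 5 go through the predicated tail, so nothing past index 5 is touched.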