Open leofang opened 2 months ago
Build https://github.com/cupy/cupy/pull/8412 from source
To unblock myself I've force-pushed back to the snapshot from yesterday, but starting from this commit https://github.com/cupy/cupy/commit/08e6a3c63fe734c259a83f76381ed973d99cd7bd it should be reproducible. By tracing the code change following the compiler error, I believe the culprit is https://github.com/NVIDIA/cccl/pull/1872, in particular https://github.com/NVIDIA/cccl/pull/1872#pullrequestreview-2122194270 (cc: @bernhardmgruber for vis).
@leofang thank you for reporting this and I am sorry for the breakage!
The specializations of Thrust's API that cupy performs looks quite extraordinary to me (I actually had to educate myself on the syntax) and I would like to suggest a better way.
In your current implementation, you specialize a member of thrust::less
directly:
bool thrust::less<float>::operator() (
const float& lhs, const float& rhs) const {
return _real_less<float>(lhs, rhs);
}
which relies on thrust::less
having an actual operator()
. We changed this in #1872, since we now pull in the operator from ::cuda::std::less
. If you specialize the entire functor instead, your code should work before and after the changes of #1872:
template <>
struct thrust::less<float> {
bool operator() (float lhs, float rhs) const {
return _real_less<float>(lhs, rhs);
}
};
This is arguably safer and will also be forward compatible when we eventually alias thust::less
to cuda::std::less
entirely (we cannot do this yet, because we need to give some notice with the deprecation warnings).
As a general remark: I think redefining the meaning of thrust::less<float>
is quite dangerous, since some algorithms depend on the exact meaning of this function object to provide more efficient implementations. E.g., thrust::sort
detects whether the comparator is thrust::less<T>
to branch into radix sort, and then takes the mathematical properties of "less than" as given, independently of how thrust::less<T>
is defined.
Hi Bernhard, thanks for your reply.
which relies on
thrust::less
having an actualoperator()
. We changed this in https://github.com/NVIDIA/cccl/pull/1872 ... (we cannot do this yet, because we need to give some notice with the deprecation warnings).
To me this change of behavior requires a deprecation cycle, though I do have a hard time to see where compile-time warnings can be injected for this particular case.
As a general remark: I think redefining the meaning of
thrust::less<float>
is quite dangerous, ... E.g.,thrust::sort
detects whether the comparator isthrust::less<T>
to branch into radix sort, and then takes the mathematical properties of "less than" as given, independently of howthrust::less<T>
is defined.
Is this expectation documented anywhere? Would it be a concern if the comparator fulfills this "less than" property?
To give some context, CuPy had to define a custom comparator in order to follow NumPy behavior, where NaNs are sorted to the end of an array (see code comments here).
An extra note: we cannot keep breaking critical users like CuPy just casually (https://github.com/NVIDIA/cccl/issues/1494). I've been following @jrhemstad's advertisement and encouraging the CuPy team to stay at CCCL head, but my push would be questionable if they keep hitting maintenance issues.
To me this change of behavior requires a deprecation cycle
It does not. The CuPy code is incorrect in how it provides a custom comparison operator. The proper way to provide a custom comparison operator is to pass one as an argument, not reach into internal bits of the library and modify them at will.
This is covered under our compatibility guidelines: https://github.com/NVIDIA/cccl?tab=readme-ov-file#compatibility-guidelines
Do not add any declarations to the thrust::, cub::, nv::, or cuda:: namespaces unless an exception is noted for a specific symbol, e.g., specializing a type trait.
I'll grant that this could be improved to say "Do not add any declarations or template specializations...", but the point remains the same.
All that said, we don't like breaking people (even when it wasn't our fault), so we will be helping to contribute a fix to CuPy.
I look forward to when you have the time to help us set up https://github.com/NVIDIA/cccl/issues/1494!
PR for CuPi: https://github.com/cupy/cupy/pull/8446
I had a somewhat lengthy discussion with Jake offline. Below is a summary of what I asked regarding specializing thrust::less
vs thrust::less::operator
noted above (and the offline thread), for posterity:
At one point we were even encouraged to specialize a CUB internal template (in the context of 64-bit support), so you must forgive CuPy developers to have the impression that everything template is customizable at will.
Perhaps let me ask this way: Why is specializing thrust::less
ok (as suggested above and done in https://github.com/cupy/cupy/pull/8446) but specializing thrust::less::operator
is not? As per the (being updated) rule number 1, CCCL discourages users from specializing all things under thrust::
anyway. But as a user, all we see in the public API of thrust::less
is that there’s one and only one member function we need to define, so it’s unclear where the line is between the two approaches from user perspective.
everything template is customizable at will
The standard practice in C++ is that you should not specialize a template that does not belong to you unless you've been explicitly granted permission to do so. Everyone is at a different place with their experience and expertise in C++, so it's not unreasonable that someone may not be familiar with this yet.
Being given permission to specialize one template in one library does not grant permission to specialize another template in a different library 😅 .
Why is specializing thrust::less ok (as suggested above and done in https://github.com/cupy/cupy/pull/8446) but specializing thrust::less::operator is not?
I explained in https://github.com/cupy/cupy/pull/8446#discussion_r1698853932 that I don't think either specialization is correct. I think @bernhardmgruber was trying to be courteous and do the minimal possible set of changes to unblock cupy, but ultimately, the specialization is still ill-formed.
The good news is that it's very easy to fix by just using a custom comparator type instead of using thrust::less
directly.
Is this a duplicate?
Type of Bug
Compile-time Error
Component
Thrust
Describe the bug
CCCL diff (between yesterday and today): https://github.com/cupy/cccl/compare/2246a8772a78346d91e580e121cedb820da2648f...d4f928e45a6ab452ff6a9de02769d3c9532ee423
Yesterday it built fine. Today we see this (from https://ci.preferred.jp/cupy.linux.cuda-build/166014/):
How to Reproduce
Build https://github.com/cupy/cupy/pull/8412 from source
Expected behavior
No compile-time errors.
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response