NVIDIA / cccl

CUDA Core Compute Libraries

Specialize relevant `cuda::(std::)` types for `__half/bfloat16/fp8` #525

Open jrhemstad opened 11 months ago

jrhemstad commented 11 months ago

The CUDA extended floating-point types `__half`, `__nv_bfloat16`, and fp8 (among others) are important to many CUDA C++ developers.

As a CUDA C++ developer, I'd like it if relevant CCCL utilities like `<type_traits>`, `atomic<T>`, and `complex<T>` all worked with these types.

### Tasks
- [ ] https://github.com/NVIDIA/cccl/issues/1139
- [ ] Specializations for `<limits>`
- [ ] Specializations for `<type_traits>`
- [ ] Specializations of `complex<T>` for fp8?
- [ ] Specializations for `atomic<T>`
- [ ] Overloads for `<cmath>` functions for the cudart extended floating-point types
srinivasyadav18 commented 10 months ago

Hi @jrhemstad, this looks like an interesting issue to me; I would like to contribute to it.

jrhemstad commented 10 months ago

Hey @srinivasyadav18, thanks for your interest in helping make CCCL better!

@griwes was just starting to look into this issue. He'll have a better idea of what will be required and which parts you could help with. For example, specializing `complex` and `<type_traits>` are separate tasks.

srinivasyadav18 commented 10 months ago

@jrhemstad Thanks! I will coordinate with @griwes to see where I can help. I have done some initial work enabling `<type_traits>` for `__half` and `__nv_bfloat16`; I will share a link to the branch soon.

srinivasyadav18 commented 10 months ago

Hi @griwes, I made these initial changes, which enable `__half` and `__nv_bfloat16`. I am not exactly sure whether this is the right way to include `<cuda_fp16.h>` and `<cuda_bf16.h>` in `__type_traits/is_floating_point.h`. Please let me know if I am missing anything here. Thanks! :)
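
For context, a minimal sketch of the kind of trait extension meant here, written as a standalone user-side trait rather than a change to CCCL internals (the trait name `is_extended_floating_point` is hypothetical, and the branch's actual changes are not reproduced):

```cpp
#include <cuda_fp16.h>
#include <cuda_bf16.h>
#include <cuda/std/type_traits>

// Sketch only: a standalone trait that treats the CUDA extended types as
// floating point, without modifying cuda::std::is_floating_point itself.
template <class T>
struct is_extended_floating_point
    : cuda::std::integral_constant<
          bool,
          cuda::std::is_floating_point<T>::value ||
          cuda::std::is_same<T, __half>::value ||
          cuda::std::is_same<T, __nv_bfloat16>::value> {};

static_assert(is_extended_floating_point<__half>::value, "");
static_assert(is_extended_floating_point<__nv_bfloat16>::value, "");
static_assert(!is_extended_floating_point<int>::value, "");
```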

jrhemstad commented 10 months ago

> I am not exactly sure whether this is the right way to include `<cuda_fp16.h>` and `<cuda_bf16.h>` in `__type_traits/is_floating_point.h`

I'm guessing we're going to have to be more careful about how we include those headers because we support versions of the CTK that may not have those headers yet. So it'll require some careful ifdefs. Here's an example from CUB: https://github.com/NVIDIA/cub/blob/0fc3c3701632a4be906765b73be20a9ad0da603d/cub/util_type.cuh#L43C1-L48
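
For illustration, a minimal sketch of that kind of guard; the version thresholds and opt-out macro names below are assumptions, not CCCL's actual ones, and the linked CUB code is the authoritative pattern:

```cpp
// Sketch only: include the extended-type headers only when the toolkit is new
// enough to provide them, and allow users to opt out explicitly.
// CCCL_DISABLE_FP16_SUPPORT / CCCL_DISABLE_BF16_SUPPORT are hypothetical names.
#if !defined(CCCL_DISABLE_FP16_SUPPORT) && defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 9)
#  include <cuda_fp16.h> // provides __half
#endif

#if !defined(CCCL_DISABLE_BF16_SUPPORT) && defined(__CUDACC_VER_MAJOR__) && (__CUDACC_VER_MAJOR__ >= 11)
#  include <cuda_bf16.h> // provides __nv_bfloat16
#endif
```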

@miscco @gevtushenko may be able to help figure out the right way to guard including those headers.

jrhemstad commented 9 months ago

@gevtushenko says that CUB already has some of the relevant values for `<limits>` that can be used.

ngc92 commented 4 months ago

`std::numeric_limits<half>` would be quite handy in several places (e.g., preventing overflows, unit tests over templated kernels that adjust their precision requirements based on epsilon). Even rolling your own is not really possible, because of the lack of constexpr constructors for `half` and `bfloat16`.

Is there a reason this needs to happen in `cuda::std`? AFAIK, you are allowed to specialize this directly in `std::`. From cppreference:

> Implementations may provide specializations of `std::numeric_limits` for implementation-specific types: e.g. GCC provides `std::numeric_limits<__int128>`. Non-standard libraries may add specializations for library-provided types, e.g. OpenEXR provides `std::numeric_limits<half>` for a 16-bit floating-point type.
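
For illustration, a minimal sketch of such a user-side specialization, assuming the standard binary16 parameters for `__half`; this is not CCCL's implementation, and the constexpr caveat above still applies:

```cpp
#include <cuda_fp16.h>
#include <limits>

// Sketch only: a partial std::numeric_limits specialization for __half using
// binary16 parameters. Because __half lacks constexpr constructors, the
// value-returning members cannot be constexpr, which is exactly the
// limitation raised above.
namespace std {
template <>
struct numeric_limits<__half>
{
    static constexpr bool is_specialized = true;
    static constexpr bool is_signed      = true;
    static constexpr int  digits         = 11;   // mantissa bits, incl. implicit bit
    static constexpr int  min_exponent   = -13;
    static constexpr int  max_exponent   = 16;

    // Runtime-constructed values (not constexpr):
    static __host__ __device__ __half min()     { return __half(6.103515625e-05f); } // 2^-14
    static __host__ __device__ __half max()     { return __half(65504.0f); }
    static __host__ __device__ __half lowest()  { return __half(-65504.0f); }
    static __host__ __device__ __half epsilon() { return __half(9.765625e-04f); }    // 2^-10
};
} // namespace std
```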

shangz-ai commented 2 weeks ago

Hello team, can we add overloads of `cuda::std::frexp` for the extended floating-point types? This is needed for PyTorch's `frexp_cuda`; see https://github.com/pytorch/pytorch/pull/133313
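
For illustration, one possible shape for such an overload, sketched here by routing `__half` through the `float` path; this is not CCCL's actual API:

```cpp
#include <cuda_fp16.h>
#include <math.h>

// Sketch only: decompose a __half into a normalized fraction and an integer
// exponent by promoting to float, which covers the full __half range.
__host__ __device__ inline __half frexp(__half x, int* exp)
{
    float fraction = ::frexpf(__half2float(x), exp);
    return __float2half(fraction);
}
```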