HazyResearch / ThunderKittens

Tile primitives for speedy kernels
MIT License

c++20 does not work? #45

Open ziyuhuang123 opened 4 months ago

ziyuhuang123 commented 4 months ago

Hi! I am building example/attn/4090 on a 4090 with nvcc 12.3 and gcc/g++ 10, but I hit the error below:

(py_hzy_new) 4090-01% make                     
nvcc -ccbin=/home/zyhuang/miniconda3/envs/py_hzy_new/bin/g++ -DNDEBUG -Xcompiler=-fPIE --expt-extended-lambda --expt-relaxed-constexpr -Xcompiler=-Wno-psabi -Xcompiler=-fno-strict-aliasing --use_fast_math -forward-unknown-to-host-compiler -O3 -Xnvlink=--verbose -Xptxas=--verbose -Xptxas=--warn-on-spills -std=c++20 -MD -MT -MF -x cu -lrt -lpthread -ldl -DKITTENS_4090 -arch=sm_89 -lcuda -lcudadevrt -lcudart_static -lcublas 4090_ker.cu -o attn_fwd
../../../src/common/base_types.cuh(93): error: namespace "std" has no member "bit_cast"
      static __attribute__((device)) inline constexpr bf16 zero() { return std::bit_cast<__nv_bfloat16>(uint16_t(0x0000)); }

I guess C++20 is still not supported somehow?

ahepp commented 4 months ago

I was able to replicate the issue and resolved it by switching to gcc/g++ 11.

koceja commented 2 weeks ago

GCC 10 doesn't support std::bit_cast (P0476R2), but GCC 11 does.

You can check the implementation status of different std features for these compilers here: GCC 10 implementation status, GCC 11 implementation status.

GCC 10 doesn't work with this repo. Use a newer GCC version, since each release adds support for more C++20 (and later) features.
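
If you want to confirm whether a given host compiler's standard library provides std::bit_cast before building, the standard feature-test macro __cpp_lib_bit_cast is one way to check. The snippet below is only a minimal sketch of that check with a memcpy fallback. The names HAS_STD_BIT_CAST, bit_cast_compat and example_roundtrip are hypothetical and are not part of ThunderKittens. The fallback is also not constexpr, so it would not be a drop-in replacement for the constexpr use in base_types.cuh; upgrading to gcc/g++ 11 remains the fix reported in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// <version> defines the library feature-test macros (such as
// __cpp_lib_bit_cast) when the standard library supports them.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

// HAS_STD_BIT_CAST is a hypothetical macro used only in this sketch.
#if defined(__cpp_lib_bit_cast) && __cpp_lib_bit_cast >= 201806L
#  include <bit>
#  define HAS_STD_BIT_CAST 1
#else
#  define HAS_STD_BIT_CAST 0
#endif

// Hypothetical helper (not part of ThunderKittens): reinterpret the bytes
// of `src` as a value of type To, using std::bit_cast when it is available
// and std::memcpy otherwise. The function is not constexpr.
template <class To, class From>
inline To bit_cast_compat(const From& src) noexcept {
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination must have the same size");
    static_assert(std::is_trivially_copyable_v<To> &&
                  std::is_trivially_copyable_v<From>,
                  "both types must be trivially copyable");
#if HAS_STD_BIT_CAST
    return std::bit_cast<To>(src);
#else
    To dst;  // the fallback additionally requires To to be default-constructible
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
#endif
}

// Example use: convert a float to its bit pattern and back again.
inline void example_roundtrip() {
    assert(bit_cast_compat<float>(bit_cast_compat<std::uint32_t>(1.0f)) == 1.0f);
}
```

On a toolchain where HAS_STD_BIT_CAST evaluates to 0 under -std=c++20 (as the error above suggests for gcc/g++ 10), code that calls std::bit_cast directly will fail to compile in the way shown in the original report.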