bitsandbytes-foundation / bitsandbytes

Accessible large language models via k-bit quantization for PyTorch.
https://huggingface.co/docs/bitsandbytes/main/en/index
MIT License

Installation Failing - Is there any recommended environment to install bitsandbytes for specific hardware? #769

Closed tomekrut closed 8 months ago

tomekrut commented 1 year ago

I tried the following

I have an A100 80GB.

Here is the output of make CUDA_VERSION=117:

ENVIRONMENT
============================
CUDA_VERSION: 117

NVCC path: /usr/local/cuda-11.7/bin/nvcc
GPP path: /usr/bin/g++ VERSION: g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CUDA_HOME: /usr/local/cuda-11.7
CONDA_PREFIX: /mnt/sdb/ml/utils/anaconda/envs/p310p113
PATH: /mnt/sdb/ml/utils/anaconda/envs/p310p113/bin:/snap/bin:/usr/local/cuda-11.7/bin:/mnt/sdb/ml/utils/anaconda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin$
LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
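The values in the ENVIRONMENT block can be cross-checked before building. The following is a minimal sketch, assuming the toolkit directory follows the cuda-<major>.<minor> naming seen in CUDA_HOME above. The helpers `cuda_version_tag` and `env_is_consistent` are hypothetical names used for illustration and are not part of bitsandbytes.

```python
import re


def cuda_version_tag(cuda_home: str) -> str:
    """Derive a make-style CUDA_VERSION tag (e.g. '117') from a CUDA_HOME path.

    Assumption: the last path component looks like 'cuda-11.7'.
    """
    match = re.search(r"cuda-(\d+)\.(\d+)/?$", cuda_home)
    if match is None:
        raise ValueError(f"cannot infer CUDA version from {cuda_home!r}")
    major, minor = match.groups()
    return f"{major}{minor}"


def env_is_consistent(cuda_version: str, cuda_home: str, nvcc_path: str) -> bool:
    """True if CUDA_VERSION matches CUDA_HOME and nvcc lives under CUDA_HOME."""
    return (
        cuda_version_tag(cuda_home) == cuda_version
        and nvcc_path.startswith(cuda_home.rstrip("/") + "/")
    )


# Values copied from the ENVIRONMENT block in this report.
print(env_is_consistent("117", "/usr/local/cuda-11.7", "/usr/local/cuda-11.7/bin/nvcc"))
```

For the values reported here the three settings agree with each other, so a mismatch between CUDA_VERSION and the toolkit on disk does not look like the cause.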

/usr/local/cuda-11.7/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc /...//bitsandbytes/csrc/ops.cu /...//bitsandbytes/csrc/kernels.cu -I /usr/local/cuda-11.7/include -I /...//bitsandbytes/csrc -I /mnt/sdb/ml/utils/anaconda/envs/p310p113/include -I /...//bitsandbytes/include -L /usr/local/cuda-11.7/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /mnt/sdb/ml/utils/anaconda/envs/p310p113/lib --output-directory /...//bitsandbytes/build
ptxas info : 15 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_75'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : 15 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_80'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : 15 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_86'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
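Because the build passes -Xptxas=-v, nearly all of this output is informational, which makes real problems hard to spot. The sketch below shows one way to triage such a log: it extracts the -gencode targets from the nvcc command (the A100 has compute capability 8.0, so sm_80 should be among them) and keeps only the ptxas messages that are not "info". The function names are hypothetical, and the sample strings are shortened excerpts of the output in this report.

```python
import re


def gencode_targets(nvcc_cmd: str) -> set:
    """Return the sm_XX targets named in the -gencode flags of an nvcc command."""
    return set(re.findall(r"-gencode\s+arch=compute_\d+,code=(sm_\d+)", nvcc_cmd))


def split_ptxas_messages(log: str) -> list:
    """Split a flattened ptxas log into one string per message.

    A zero-width lookahead splits just before each 'ptxas <level> :' prefix,
    so every message keeps its own prefix.
    """
    parts = re.split(r"(?=ptxas (?:info|warning|error|fatal)\s*:)", log)
    return [p.strip() for p in parts if p.strip()]


def non_info_messages(log: str) -> list:
    """Keep only warnings and errors, dropping the verbose 'ptxas info' lines."""
    return [m for m in split_ptxas_messages(log) if not m.startswith("ptxas info")]


# Shortened excerpts from the build output in this report.
cmd = (
    "/usr/local/cuda-11.7/bin/nvcc -gencode arch=compute_75,code=sm_75 "
    "-gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86"
)
log = (
    "ptxas info : 15 bytes gmem "
    "ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi "
    "is out of range. .minnctapersm will be ignored "
    "ptxas info : 89 bytes gmem"
)

print("A100 (sm_80) targeted:", "sm_80" in gencode_targets(cmd))
for message in non_info_messages(log):
    print(message)
```

The command shown here already targets sm_80, so the A100 is covered by the requested architectures. On the excerpt above, the filter leaves only the .minnctapersm warning.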

ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored ptxas info : 89 bytes gmem ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_75' ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 4 registers, 352 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function 
'_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 71 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_75' ptxas info : Function properties for 
_Z35kOptimizerStatic8bit2StateBlockwiseI13nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13__nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 
registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : 
Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 58 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 
registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry 
function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' 
for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function 
properties for _Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 58 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingI6halfLi2048ELi4EEvPT_Pfii' for 'sm_75' ptxas info : Function properties for _Z19kPercentileClippingI6halfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_75' ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function 
'_Z38kPreconditionOptimizerStatic8bit2StateI6halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 60 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 67 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 67 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_75' ptxas info : 
Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI13nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit2StateI13nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit2StateI6halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI13nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI13nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 57 registers, 424 bytes cmem[0] ptxas info : Compiling entry function 
'_Z33kPreconditionOptimizer32bit2StateI6halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 57 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6halfLi4EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6halfLi4EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI13nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI13nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 428 bytes cmem[0] ptxas info : 
Compiling entry function '_Z21kOptimizer32bit1StateI6halfLi5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6halfLi5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6halfLi2EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 54 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6halfLi1EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 416 bytes cmem[0] ptxas info : Compiling entry function 
'_Z33kPreconditionOptimizer32bit1StateI6halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI13nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI13nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 53 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75' ptxas info : Function properties for 
_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75'

tomekrut commented 1 year ago

ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 45 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6halfEvPT_PffS1_i' for 'sm_75' ptxas info : Function properties for _Z18kEstimateQuantilesI6halfEvPT_PffS1_i 16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 82 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_75' ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 84 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_75' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 38 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_75' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6halfS1_S1_S3_iiii' for 'sm_75' ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6halfS1_S1_S3_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_75' ptxas info : 
Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 43 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function 
'_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for 
_Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii' for 'sm_75' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 57 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI13nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI13nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI6halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI6__halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii 32 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function 
'_Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi2EEvPT_S1_S0_l' for 'sm_75' ptxas info : Function properties for _Z5kfuncIfLi2EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi1EEvPT_S1_S0_l' for 'sm_75' ptxas info : Function properties for _Z5kfuncIfLi1EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIhLi0EEvPT_S1_S0_l' for 'sm_75' ptxas info : Function properties for _Z5kfuncIhLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi0EEvPT_S1_S0_l' for 'sm_75' ptxas info : Function properties for _Z5kfuncIfLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_75' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_75' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_75' ptxas info : Function properties for _Z11kDequantizePfPhS_i 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_75' ptxas info : Function properties for _Z9kQuantizePfS_Phi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_75' ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 392 bytes cmem[0] ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12printnonzeroI6halfEvPT_iPKc 104 bytes 
stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z12printnonzeroIfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeNF4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15dhDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeFP4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z18dDequantizeFP4Treehf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15d2DequantizeFP4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeFP4hf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z9atomicMinPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z9atomicMaxPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. 
.minnctapersm will be ignored ptxas info : 89 bytes gmem ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_80' ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 4 registers, 352 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80' ptxas info : Function properties for 
_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 78 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 8 bytes stack frame, 4 bytes spill 
stores, 4 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13__nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii' for 
'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function 
properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii

tomekrut commented 1 year ago
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads

ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' 
for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 
400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes 
cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas 
info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : 
Compiling entry function '_Z19kPercentileClippingI6halfLi2048ELi4EEvPT_Pfii' for 'sm_80' ptxas info : Function properties for _Z19kPercentileClippingI6halfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_80' ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateI6halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 60 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_80' ptxas info : Function properties for 
_Z26kOptimizerStatic8bit1StateI6halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 62 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 66 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 66 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : 
Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI13nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit2StateI13nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit2StateI6halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI13nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI13nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI6halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6halfLi4EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6halfLi4EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI13nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI13nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6halfLi5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6halfLi5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6halfLi2EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6halfLi1EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6halfLi1EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 46 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI13nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI13nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 46 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 
'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6halfEvPT_PffS1_i' for 'sm_80' ptxas info : Function properties for _Z18kEstimateQuantilesI6halfEvPT_PffS1_i 16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 81 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_80' ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 82 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_80' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_80' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6halfS1_S1_S3_iiii' 
for 'sm_80' ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6halfS1_S1_S3_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 38 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 388 bytes cmem[0] ptxas info : Compiling entry function 
'_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 13 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii' for 'sm_80' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI13nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI13nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI6__halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI6halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Used 37 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function 
'_Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi2EEvPT_S1_S0_l' for 'sm_80' ptxas info : Function properties for _Z5kfuncIfLi2EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi1EEvPT_S1_S0_l' for 'sm_80' ptxas info : Function properties for _Z5kfuncIfLi1EEvPT_S1_S0_l 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIhLi0EEvPT_S1_S0_l' for 'sm_80' ptxas info : Function properties for _Z5kfuncIhLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi0EEvPT_S1_S0_l' for 'sm_80' ptxas info : Function properties for _Z5kfuncIfLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_80' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_80' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_80' ptxas info : Function properties for _Z11kDequantizePfPhS_i 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_80' ptxas info : Function properties for _Z9kQuantizePfS_Phi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_80' ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 392 bytes cmem[0] ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12printnonzeroI6halfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z12printnonzeroIfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeNF4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15dhDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeFP4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z18dDequantizeFP4Treehf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15d2DequantizeFP4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeFP4hf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z9atomicMinPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z9atomicMaxPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. 
.minnctapersm will be ignored ptxas info : 89 bytes gmem ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_86' ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 4 registers, 352 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86'
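Most of the `ptxas info` lines above are the per-kernel resource reports that `-Xptxas=-v` (present in the `nvcc` command line) asks for, not errors. A minimal sketch, assuming the build output has been captured as text, that keeps only the entries reporting register spills or `ptxas warning` messages. The names `summarize`, `PROPS`, and `WARNING` are illustrative and not part of bitsandbytes or the CUDA toolkit, and the regular expressions are based only on the line shapes visible in this log.

```python
import re

# One "Function properties" record. \s+ tolerates both the usual multi-line
# ptxas layout and the whitespace-collapsed form pasted in this thread.
PROPS = re.compile(
    r"Function properties for\s+(?P<name>\S+)\s+"
    r"(?P<stack>\d+) bytes stack frame,\s+"
    r"(?P<stores>\d+) bytes spill stores,\s+"
    r"(?P<loads>\d+) bytes spill loads"
)

# A warning runs until the next "ptxas info :" / "ptxas warning :" marker
# or the end of the text.
WARNING = re.compile(
    r"ptxas warning\s*:\s*(?P<msg>.*?)(?=\s*ptxas (?:info|warning)\s*:|$)",
    re.S,
)


def summarize(log):
    """Return (spills, warnings) extracted from ptxas -v output.

    spills   -- list of (mangled_name, spill_store_bytes, spill_load_bytes)
                for functions whose spill counts are non-zero
    warnings -- list of warning messages with whitespace normalized
    """
    spills = [
        (m["name"], int(m["stores"]), int(m["loads"]))
        for m in PROPS.finditer(log)
        if int(m["stores"]) or int(m["loads"])
    ]
    warnings = [" ".join(m["msg"].split()) for m in WARNING.finditer(log)]
    return spills, warnings


# Small excerpt copied from the log above, used as a demonstration input.
sample = (
    "ptxas info : Function properties for _Z12printnonzeroIfEvPT_iPKc "
    "104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads "
    "ptxas info : Function properties for _Z9atomicMaxPff "
    "0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads "
    "ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi "
    "is out of range. .minnctapersm will be ignored "
    "ptxas info : 89 bytes gmem"
)

spills, warnings = summarize(sample)
for name, stores, loads in spills:
    print(f"spill: {name} stores={stores}B loads={loads}B")
for msg in warnings:
    print(f"warning: {msg}")
```

Applied to the full build output, this shortens the log to the handful of entries worth reading. Whether any of them relates to the installation failure is a separate question that this sketch does not answer.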

tomekrut commented 1 year ago

ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 78 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI13nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13__nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function 
'_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseI13__nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI13__nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseI13__nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13__nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 
registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry 
function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 38 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' 
for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function 
properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_86' ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_86' ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function 
'_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 60 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 66 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 66 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_86' ptxas info : 
Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function 
'_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : 
Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 46 registers, 416 bytes cmem[0] ptxas info : Compiling entry function 
'_Z33kPreconditionOptimizer32bit1StateI6halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI13nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI13nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 46 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for 
_Z33kPreconditionOptimizer32bit1StateI6halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6halfEvPT_PffS1_i' for 'sm_86' ptxas info : Function properties for _Z18kEstimateQuantilesI6halfEvPT_PffS1_i 16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 81 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_86' ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 82 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_86' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 38 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_86' ptxas info : Function 
properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6halfS1_S1_S3_iiii' for 'sm_86' ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6halfS1_S1_S3_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function 
'_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 13 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii' for 'sm_86' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI13nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI13__nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 39 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI6halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI6halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry 
function '_Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi2EEvPT_S1_S0_l' for 'sm_86' ptxas info : 
Function properties for _Z5kfuncIfLi2EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi1EEvPT_S1_S0_l' for 'sm_86' ptxas info : Function properties for _Z5kfuncIfLi1EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIhLi0EEvPT_S1_S0_l' for 'sm_86' ptxas info : Function properties for _Z5kfuncIhLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi0EEvPT_S1_S0_l' for 'sm_86' ptxas info : Function properties for _Z5kfuncIfLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_86' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_86' ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_86' ptxas info : Function properties for _Z11kDequantizePfPhS_i 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_86' ptxas info : Function properties for _Z9kQuantizePfS_Phi 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_86' ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 392 bytes cmem[0] ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12printnonzeroI6__halfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z12printnonzeroIfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeNF4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15dhDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeFP4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z18dDequantizeFP4Treehf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15d2DequantizeFP4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeFP4hf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z9atomicMinPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z9atomicMaxPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads
/usr/local/cuda-11.7/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -Xcompiler '-fPIC' -dlink /...//bitsandbytes/build/ops.o /...//bitsandbytes/build/kernels.o -o /...//bitsandbytes/build/link.o
/usr/bin/g++ -std=c++14 -DBUILD_CUDA -shared -fPIC -I /usr/local/cuda-11.7/include -I /...//bitsandbytes/csrc -I /mnt/sdb/ml/utils/anaconda/envs/p310p113/include -I /...//bitsandbytes/include /...//bitsandbytes/build/ops.o /...//bitsandbytes/build/kernels.o /...//bitsandbytes/build/link.o /...//bitsandbytes/csrc/common.cpp /...//bitsandbytes/csrc/cpu_ops.cpp /...//bitsandbytes/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cuda117.so -L /usr/local/cuda-11.7/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /mnt/sdb/ml/utils/anaconda/envs/p310p113/lib
/usr/local/cuda-11.7/lib64/libcudart.so: file not recognized: File truncated
collect2: error: ld returned 1 exit status
Makefile:58: recipe for target 'all' failed
make: *** [all] Error 1
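
For anyone reading along: the last linker message, "file not recognized: File truncated", suggests that /usr/local/cuda-11.7/lib64/libcudart.so itself is damaged or incomplete, rather than anything in the bitsandbytes sources. Below is a minimal Python sketch for checking that. It is not part of bitsandbytes; the function name check_shared_object is made up for illustration, and it only inspects the ELF header of 64-bit little-endian files, so an "ok" result does not prove the library is intact.

```python
import os
import struct

ELF_MAGIC = b"\x7fELF"
ELF64_HEADER_SIZE = 64


def check_shared_object(path):
    """Return a rough diagnosis for a shared library file.

    Hypothetical helper, not a bitsandbytes or CUDA API.
    Possible results: "missing", "empty", "not-elf", "truncated",
    "unchecked" (not a 64-bit little-endian ELF), or "ok".
    """
    real = os.path.realpath(path)  # follow symlinks such as libcudart.so -> libcudart.so.11.0
    if not os.path.isfile(real):
        return "missing"
    size = os.path.getsize(real)
    if size == 0:
        return "empty"
    with open(real, "rb") as fh:
        header = fh.read(ELF64_HEADER_SIZE)
    if header[:4] != ELF_MAGIC:
        return "not-elf"
    if len(header) < ELF64_HEADER_SIZE:
        return "truncated"  # too short to hold a 64-bit ELF header
    if header[4] != 2 or header[5] != 1:
        return "unchecked"  # only ELFCLASS64 + little-endian is parsed here
    # e_shoff at offset 0x28, e_shentsize at 0x3A, e_shnum at 0x3C
    (e_shoff,) = struct.unpack_from("<Q", header, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", header, 0x3A)
    # The section header table must fit inside the file.
    if e_shoff + e_shentsize * e_shnum > size:
        return "truncated"
    return "ok"


if __name__ == "__main__":
    # Path taken from the linker error in the log above.
    print(check_shared_object("/usr/local/cuda-11.7/lib64/libcudart.so"))
```

If this reports "empty" or "truncated", reinstalling the CUDA 11.7 toolkit (or at least restoring the runtime library) is probably the first thing to try before rebuilding.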

tomekrut commented 1 year ago

Then I also called python -m bitsandbytes

False

===================================BUG REPORT===================================
//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

warn(msg)

//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: /mnt/sdb/ml/utils/anaconda/envs/p310p113 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda-11.7/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda-11.7/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: /usr/local/cuda-11.7/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=117, Highest Compute Capability: 8.0.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda117.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:

  1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
  2. CUDA driver not installed
  3. CUDA not installed
  4. You have multiple conflicting CUDA libraries
  5. Required library not pre-compiled for this bitsandbytes release!

CUDA SETUP: If you compiled from source, try again with make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=117 make cuda11x
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/mnt/sdb/ml/utils/anaconda/envs/p310p113/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/mnt/sdb/ml/utils/anaconda/envs/p310p113/lib/python3.10/runpy.py", line 146, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/mnt/sdb/ml/utils/anaconda/envs/p310p113/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "//...//bitsandbytes/bitsandbytes/__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "//...//bitsandbytes/bitsandbytes/research/__init__.py", line 1, in <module>
    from . import nn
  File "//...//bitsandbytes/bitsandbytes/research/nn/__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "//...//bitsandbytes/bitsandbytes/research/nn/modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "//...//bitsandbytes/bitsandbytes/optim/__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "//...//bitsandbytes/bitsandbytes/cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: CUDA Setup failed despite GPU being available. Please run the following command to get more information:

    python -m bitsandbytes

    Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
    to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
    and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
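
The error text above asks you to locate the CUDA libraries and possibly add them to LD_LIBRARY_PATH. A rough Python sketch for listing the libcudart candidates is below. It is an illustration only and not the detection logic bitsandbytes uses; the helper names dirs_from_env and find_cudart are made up, and /usr/local/cuda/lib64 is included only because the log reports it as a backup path.

```python
import os
from pathlib import Path


def dirs_from_env(value):
    """Split an LD_LIBRARY_PATH-style string into non-empty directory entries."""
    return [d for d in value.split(os.pathsep) if d]


def find_cudart(search_dirs):
    """List libcudart.so* files as (path, resolved path, size in bytes).

    Hypothetical helper. A size of 0 means an empty file or a dangling
    symlink, both of which would break linking.
    """
    found = []
    for directory in search_dirs:
        base = Path(directory)
        if not base.is_dir():
            continue
        for candidate in sorted(base.glob("libcudart.so*")):
            size = candidate.stat().st_size if candidate.exists() else 0
            found.append((str(candidate), os.path.realpath(candidate), size))
    return found


if __name__ == "__main__":
    dirs = dirs_from_env(os.environ.get("LD_LIBRARY_PATH", ""))
    dirs.append("/usr/local/cuda/lib64")  # backup path reported in the log above
    for path, real, size in find_cudart(dirs):
        print(f"{path} -> {real} ({size} bytes)")
```

Several entries that resolve to the same real file are normal (they are symlinks); entries with a suspiciously small size, or entries from different CUDA versions, are the ones worth a closer look.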
tomekrut commented 1 year ago

@sh0tcall3r - you gave thumbs down... what's wrong with this issue?

swumagic commented 10 months ago

bitsandbytes did not support Windows before, but my method makes it work on Windows. (yuhuang)

1 Open the folder J:\StableDiffusion\sdwebui, click the folder's address bar, type CMD and press Enter. Alternatively, press WIN+R, type CMD, press Enter, and then run cd /d J:\StableDiffusion\sdwebui

2 J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes

3 J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes-windows

4 J:\StableDiffusion\sdwebui\py310\python.exe -m pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl

Replace J:\StableDiffusion\sdwebui\py310 with your own SD venv directory (the folder that contains python.exe).

tomekrut commented 10 months ago

@swumagic I don't use Windows.

swumagic commented 10 months ago

Or, if you are on a Unix-like system (Ubuntu, macOS, etc.) with CUDA version 11.x:

bitsandbytes supports Ubuntu. (yuhuang)

1 Open the folder J:\StableDiffusion\sdwebui, click the folder's address bar, type CMD and press Enter. Alternatively, press WIN+R, type CMD, press Enter, and then run cd /d J:\StableDiffusion\sdwebui

2 J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes

3 J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes-windows

4 J:\StableDiffusion\sdwebui\py310\python.exe -m pip install https://github.com/TimDettmers/bitsandbytes/releases/download/0.41.0/bitsandbytes-0.41.0-py3-none-any.whl

Replace J:\StableDiffusion\sdwebui\py310 with your own SD venv directory (the folder that contains python.exe).

github-actions[bot] commented 9 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.