ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 45 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6halfEvPT_PffS1_i' for 'sm_75' ptxas info : Function properties for _Z18kEstimateQuantilesI6halfEvPT_PffS1_i 16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 82 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_75' ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 84 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_75' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 38 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_75' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6halfS1_S1_S3_iiii' for 'sm_75' ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6halfS1_S1_S3_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_75' ptxas info : 
Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 43 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function 
'_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75' ptxas info : Function properties for 
_Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_75' ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii' for 'sm_75' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 57 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI13nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI13nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI6halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI6__halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii 32 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii' for 'sm_75' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function 
'_Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii' for 'sm_75' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 168 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi2EEvPT_S1_S0_l' for 'sm_75' ptxas info : Function properties for _Z5kfuncIfLi2EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi1EEvPT_S1_S0_l' for 'sm_75' ptxas info : Function properties for _Z5kfuncIfLi1EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIhLi0EEvPT_S1_S0_l' for 'sm_75' ptxas info : Function properties for _Z5kfuncIhLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi0EEvPT_S1_S0_l' for 'sm_75' ptxas info : Function properties for _Z5kfuncIfLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_75' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_75' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_75' ptxas info : Function properties for _Z11kDequantizePfPhS_i 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_75' ptxas info : Function properties for _Z9kQuantizePfS_Phi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_75' ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 392 bytes cmem[0] ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12printnonzeroI6halfEvPT_iPKc 104 bytes 
stack frame, 76 bytes spill stores, 76 bytes spill loads
ptxas info : Function properties for _Z12printnonzeroIfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads
ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z12dQuantizeNF4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z14dDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z15dhDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z12dQuantizeFP4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z18dDequantizeFP4Treehf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z15d2DequantizeFP4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z14dDequantizeFP4hf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMinPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMaxPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas info : 89 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_80'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 72 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 72 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI13__nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI13__nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80'
ptxas info : Function properties for
_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 78 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI13nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 8 bytes stack frame, 4 bytes spill 
stores, 4 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_80' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13__nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii' for 
'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_80' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function 
properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13__nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' 
for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 
bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads 
ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 
400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes 
cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas 
info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : 
Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_80' ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_80' ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi 0 bytes stack frame, 0 bytes 
spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 60 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_80' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_80' ptxas info : Function properties for 
_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 62 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 66 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 66 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : 
Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_80' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack 
frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 
0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi' for 'sm_80' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 46 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 46 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 
'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_80' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6__halfEvPT_PffS1_i' for 'sm_80' ptxas info : Function properties for _Z18kEstimateQuantilesI6__halfEvPT_PffS1_i 16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 81 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_80' ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 82 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_80' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_80' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii' 
for 'sm_80' ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 38 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 388 bytes cmem[0] ptxas info : Compiling entry function 
'_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_80' ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 13 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii' for 'sm_80' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI13nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI13nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI6__halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI6halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Used 37 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi256EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi160EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi128EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii' for 'sm_80' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6halfLi96EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function 
'_Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi16ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii' for 'sm_80' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi2EEvPT_S1_S0_l' for 'sm_80' ptxas info : Function properties for _Z5kfuncIfLi2EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi1EEvPT_S1_S0_l' for 'sm_80' ptxas info : Function properties for _Z5kfuncIfLi1EEvPT_S1_S0_l 0 bytes 
stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIhLi0EEvPT_S1_S0_l' for 'sm_80' ptxas info : Function properties for _Z5kfuncIhLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi0EEvPT_S1_S0_l' for 'sm_80' ptxas info : Function properties for _Z5kfuncIfLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_80' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_80' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_80' ptxas info : Function properties for _Z11kDequantizePfPhS_i 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_80' ptxas info : Function properties for _Z9kQuantizePfS_Phi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_80' ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii 0 bytes stack frame, 
0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 392 bytes cmem[0] ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12printnonzeroI6__halfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z12printnonzeroIfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeNF4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15dhDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeFP4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z18dDequantizeFP4Treehf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15d2DequantizeFP4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeFP4hf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z9atomicMinPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range.
.minnctapersm will be ignored ptxas info : 89 bytes gmem ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_86' ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 4 registers, 352 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 78 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI13nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 8 bytes stack frame, 4 bytes spill stores, 4 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_86' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13__nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function 
'_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_86' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 392 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas 
info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 
registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry 
function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 38 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' 
for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function 
properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 27 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 31 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 34 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_86' ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_86' ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function 
'_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 60 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_86' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 66 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 66 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_86' ptxas info : 
Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_86' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function 
'_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : 
Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi' for 'sm_86' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 46 registers, 416 bytes cmem[0] ptxas info : Compiling entry function 
'_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 46 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for 
_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_86' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 44 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6__halfEvPT_PffS1_i' for 'sm_86' ptxas info : Function properties for _Z18kEstimateQuantilesI6__halfEvPT_PffS1_i 16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 81 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_86' ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 82 registers, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_86' ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 38 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii' for 'sm_86' ptxas info : Function 
properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PcS3_PiS4_S1_S4_fiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii' for 'sm_86' ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function 
'_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 388 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill 
stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86' ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii 192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 440 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_86' ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 13 registers, 396 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii' for 'sm_86' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveIfLi128ELi32EEviiiPT_PhPfPKfS1_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 40 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI13__nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI13__nv_bfloat16Li128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes 
spill loads ptxas info : Used 39 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z26kgemm_4bit_inference_naiveI6__halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z26kgemm_4bit_inference_naiveI6__halfLi128ELi16EEviiiPT_PhPfPKfS2_iiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi256EEviiiPT_PhPfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi256EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi160EEviiiPT_PhPfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi160EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi128EEviiiPT_PhPfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi128EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z20kgemm_4bit_inferenceI6__halfLi96EEviiiPT_PhPfS2_iiii' for 'sm_86' ptxas info : Function properties for _Z20kgemm_4bit_inferenceI6__halfLi96EEviiiPT_PhPfS2_iiii 32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi96EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry 
function '_Z11gemm_deviceI6__halfLi16ELi64EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi32EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi128EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi160EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi192EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi16ELi256EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6__halfLi16ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6__halfLi32ELi96EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6__halfLi32ELi96EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 
bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi64EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi32EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi128EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi160EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi192EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii' for 'sm_86' ptxas info : Function properties for _Z11gemm_deviceI6halfLi32ELi256EEviiiPT_S2_S2_iii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 167 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi2EEvPT_S1_S0_l' for 'sm_86' ptxas info : 
Function properties for _Z5kfuncIfLi2EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi1EEvPT_S1_S0_l' for 'sm_86' ptxas info : Function properties for _Z5kfuncIfLi1EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 30 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIhLi0EEvPT_S1_S0_l' for 'sm_86' ptxas info : Function properties for _Z5kfuncIhLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z5kfuncIfLi0EEvPT_S1_S0_l' for 'sm_86' ptxas info : Function properties for _Z5kfuncIfLi0EEvPT_S1_S0_l 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 384 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_86' ptxas info : Function properties for _Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z15kgetColRowStatsI6halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_86' ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 404 bytes cmem[0] ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_86' ptxas info : Function properties for _Z11kDequantizePfPhS_i 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_86' ptxas info : Function properties for _Z9kQuantizePfS_Phi 
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0] ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_86' ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 14 registers, 392 bytes cmem[0] ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12printnonzeroI6__halfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z12printnonzeroIfEvPT_iPKc 104 bytes stack frame, 76 bytes spill stores, 76 bytes spill loads ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeNF4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15dhDequantizeNF4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z12dQuantizeFP4f 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z18dDequantizeFP4Treehf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z15d2DequantizeFP4h 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z14dDequantizeFP4hf 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z9atomicMinPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Function properties for _Z9atomicMaxPff 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill 
loads
/usr/local/cuda-11.7/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -Xcompiler '-fPIC' -dlink /...//bitsandbytes/build/ops.o /...//bitsandbytes/build/kernels.o -o /...//bitsandbytes/build/link.o
/usr/bin/g++ -std=c++14 -DBUILD_CUDA -shared -fPIC -I /usr/local/cuda-11.7/include -I /...//bitsandbytes/csrc -I /mnt/sdb/ml/utils/anaconda/envs/p310p113/include -I /...//bitsandbytes/include /...//bitsandbytes/build/ops.o /...//bitsandbytes/build/kernels.o /...//bitsandbytes/build/link.o /...//bitsandbytes/csrc/common.cpp /...//bitsandbytes/csrc/cpu_ops.cpp /...//bitsandbytes/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cuda117.so -L /usr/local/cuda-11.7/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /mnt/sdb/ml/utils/anaconda/envs/p310p113/lib
/usr/local/cuda-11.7/lib64/libcudart.so: file not recognized: File truncated
collect2: error: ld returned 1 exit status
Makefile:58: recipe for target 'all' failed
make: *** [all] Error 1
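The linker message "file not recognized: File truncated" usually means the file on disk is empty or cut short, or that the name is a symlink to such a file. A minimal sketch for checking this follows. The helper name `inspect_shared_object` is made up for illustration, and the check only looks at the file size and the 4-byte ELF magic, so it can rule a file out but cannot prove it is a complete library.

```python
from pathlib import Path

# Every ELF shared object starts with these four bytes.
ELF_MAGIC = b"\x7fELF"


def inspect_shared_object(path):
    """Report whether `path` exists, where it resolves to, its size,
    and whether it starts with the ELF magic bytes."""
    p = Path(path)
    report = {
        "path": str(p),
        "exists": p.exists(),  # False for a missing file or a dangling symlink
        "resolved": None,
        "size": 0,
        "is_elf": False,
    }
    if not report["exists"]:
        return report
    real = p.resolve()  # follow symlinks such as libcudart.so -> libcudart.so.11.0
    report["resolved"] = str(real)
    report["size"] = real.stat().st_size
    with open(real, "rb") as fh:
        report["is_elf"] = fh.read(4) == ELF_MAGIC
    return report


if __name__ == "__main__":
    # Path taken from the linker error above.
    print(inspect_shared_object("/usr/local/cuda-11.7/lib64/libcudart.so"))
```

If `size` is 0 or `is_elf` is False for the path in the error, the CUDA runtime library itself is likely damaged, and reinstalling the toolkit (or restoring the symlink) would be the thing to try before rebuilding bitsandbytes.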
Then I also called python -m bitsandbytes
False
===================================BUG REPORT===================================
//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: /mnt/sdb/ml/utils/anaconda/envs/p310p113 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda-11.7/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda-11.7/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.
To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variable
For example, if you want to use the CUDA version 122
BNB_CUDA_VERSION=122 python ...
OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122
In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: /usr/local/cuda-11.7/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
//...//bitsandbytes/bitsandbytes/cuda_setup/main.py:166: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might missmatch with the CUDA version that is needed for bitsandbytes.
To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variable
For example, if you want to use the CUDA version 122
BNB_CUDA_VERSION=122 python ...
OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122
In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.2
warn(msg)
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=117, Highest Compute Capability: 8.0.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda117.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
================================================ERROR=====================================
CUDA SETUP: CUDA detection failed! Possible reasons:
make CUDA_VERSION=DETECTED_CUDA_VERSION for example, make CUDA_VERSION=113.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via conda list | grep cuda.
CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=117 make cuda11x
python setup.py install
CUDA SETUP: Setup Failed!
Traceback (most recent call last):
File "/mnt/sdb/ml/utils/anaconda/envs/p310p113/lib/python3.10/runpy.py", line 187, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/mnt/sdb/ml/utils/anaconda/envs/p310p113/lib/python3.10/runpy.py", line 146, in _get_module_details
return _get_module_details(pkg_main_name, error)
File "/mnt/sdb/ml/utils/anaconda/envs/p310p113/lib/python3.10/runpy.py", line 110, in _get_module_details
__import__(pkg_name)
File "//...//bitsandbytes/bitsandbytes/__init__.py", line 6, in <module>
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
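To act on the advice above about locating the CUDA libraries, the sketch below lists every `libcudart.so*` file found in the directories of a colon-separated search path such as `LD_LIBRARY_PATH`. The function name `find_cudart` is made up for illustration. This is not how bitsandbytes performs its own search; it is only a quick way to see which candidates exist, where symlinks point, and whether any of them is empty.

```python
import os
from pathlib import Path


def find_cudart(search_path):
    """Return one record per libcudart.so* file found in the directories
    listed in `search_path` (directories separated by os.pathsep)."""
    hits = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing ':'
        directory = Path(entry)
        if not directory.is_dir():
            continue
        for candidate in sorted(directory.glob("libcudart.so*")):
            hits.append({
                "path": str(candidate),
                "resolved": str(candidate.resolve()),
                # A dangling symlink does not "exist"; report size 0 for it.
                "size": candidate.stat().st_size if candidate.exists() else 0,
            })
    return hits


if __name__ == "__main__":
    for hit in find_cudart(os.environ.get("LD_LIBRARY_PATH", "")):
        print(hit)
```

An entry with `size` equal to 0 would match the "File truncated" linker error reported in the build log.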
@sh0tcall3r - you gave thumbs down... what's wrong with this issue?
bitsandbytes did not support Windows before, but this method gets it working on Windows. (yuhuang)
1 Open the folder J:\StableDiffusion\sdwebui, click the folder's address bar, type CMD and press Enter. Alternatively, press WIN+R, type CMD, press Enter, then run cd /d J:\StableDiffusion\sdwebui
2 J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes
3 J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes-windows
4 J:\StableDiffusion\sdwebui\py310\python.exe -m pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl
Replace J:\StableDiffusion\sdwebui\py310 with your own SD venv directory (the folder that contains python.exe).
@swumagic I don't use Windows.
Or, if you are on a Linux system (Ubuntu, etc.) with CUDA 11.x:
bitsandbytes also supports Ubuntu. (yuhuang)
1 Open the folder J:\StableDiffusion\sdwebui, click the folder's address bar, type CMD and press Enter. Alternatively, press WIN+R, type CMD, press Enter, then run cd /d J:\StableDiffusion\sdwebui
2 J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes
3 J:\StableDiffusion\sdwebui\py310\python.exe -m pip uninstall bitsandbytes-windows
4 J:\StableDiffusion\sdwebui\py310\python.exe -m pip install https://github.com/TimDettmers/bitsandbytes/releases/download/0.41.0/bitsandbytes-0.41.0-py3-none-any.whl
Replace J:\StableDiffusion\sdwebui\py310 with your own SD venv directory (the folder that contains python.exe).
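Before and after the uninstall/install steps above, it can help to check which of the two distributions the interpreter actually sees. The sketch below uses only the standard library. The helper name `installed_versions` is made up for illustration, and it has to be run with the same python.exe (or venv Python) that the pip commands used.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names taken from the pip commands above.
    print(installed_versions(["bitsandbytes", "bitsandbytes-windows"]))
```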
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
I tried the following
I have an A100 80GB.
Here is the stack trace
make CUDA_VERSION=117
ENVIRONMENT
============================
CUDA_VERSION: 117
NVCC path: /usr/local/cuda-11.7/bin/nvcc
GPP path: /usr/bin/g++ VERSION: g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CUDA_HOME: /usr/local/cuda-11.7
CONDA_PREFIX: /mnt/sdb/ml/utils/anaconda/envs/p310p113
PATH: /mnt/sdb/ml/utils/anaconda/envs/p310p113/bin:/snap/bin:/usr/local/cuda-11.7/bin:/mnt/sdb/ml/utils/anaconda/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin$
LD_LIBRARY_PATH: /usr/local/cuda-11.7/lib64
/usr/local/cuda-11.7/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc /...//bitsandbytes/csrc/ops.cu /...//bitsandbytes/csrc/kernels.cu -I /usr/local/cuda-11.7/include -I /...//bitsandbytes/csrc -I /mnt/sdb/ml/utils/anaconda/envs/p310p113/include -I /...//bitsandbytes/include -L /usr/local/cuda-11.7/lib64 -lcudart -lcublas -lcublasLt -lcusparse -L /mnt/sdb/ml/utils/anaconda/envs/p310p113/lib --output-directory /...//bitsandbytes/build ptxas info : 15 bytes gmem ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_75' ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 4 registers, 352 bytes cmem[0] ptxas info : 15 bytes gmem ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_80' ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 4 registers, 352 bytes cmem[0] ptxas info : 15 bytes gmem ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_86' ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 4 registers, 352 bytes cmem[0]
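The `-Xptxas=-v` flag in the nvcc command above is what produces the long run of "ptxas info" lines. They are informational and unrelated to the link failure. For anyone who wants to skim them, here is a rough sketch that extracts the per-function stack, spill, and register numbers from such a log. The regular expression is an assumption based on the output pasted in this issue, not on any documented ptxas format, and `parse_ptxas` and `spilling` are made-up helper names.

```python
import re

# Matches one "Function properties" record; the register count is optional
# because device (non-entry) functions have no "Used N registers" line.
PTXAS_RECORD = re.compile(
    r"Function properties for (?P<name>\S+)\s+"
    r"(?P<stack>\d+) bytes stack frame, "
    r"(?P<stores>\d+) bytes spill stores, "
    r"(?P<loads>\d+) bytes spill loads"
    r"(?:\s+ptxas info\s*:\s*Used (?P<regs>\d+) registers)?"
)


def parse_ptxas(log_text):
    """Return one dict per function found in verbose ptxas output."""
    records = []
    for m in PTXAS_RECORD.finditer(log_text):
        records.append({
            "name": m.group("name"),
            "stack": int(m.group("stack")),
            "spill_stores": int(m.group("stores")),
            "spill_loads": int(m.group("loads")),
            "registers": int(m.group("regs")) if m.group("regs") else None,
        })
    return records


def spilling(records):
    """Keep only the functions that spill registers to local memory."""
    return [r for r in records if r["spill_stores"] or r["spill_loads"]]
```

Feeding it the saved build output, for example `spilling(parse_ptxas(open("build.log").read()))` with a hypothetical `build.log`, would list only the functions that report nonzero spill bytes.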
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored ptxas info : 89 bytes gmem ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_75' ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 4 registers, 352 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 77 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI13nv_bfloat16Li5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi5ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function 
'_Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi5ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 72 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 71 registers, 432 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI13__nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_75' ptxas info : Function properties for 
_Z35kOptimizerStatic8bit2StateBlockwiseI13nv_bfloat16Li0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_75' ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 80 registers, 456 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI13nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI13__nv_bfloat16Li512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 
registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi2EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi0EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi64ELi8ELi1EEvPfPhS0_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi2EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi0EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 392 bytes cmem[0] ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii' for 'sm_75' ptxas info : Function properties for _Z20kDequantizeBlockwiseI6halfLi512ELi64ELi8ELi1EEvPfPhS1_PT_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 392 bytes cmem[0] ptxas info : 
Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 
bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for 
_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 58 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI13nv_bfloat16Li4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 
registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi2EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry 
function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 29 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function 
'_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi1EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 36 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' 
for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0ELi0EEvPfPT_S0_PhS0_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function
properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi2EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 24 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for
_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 26 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 28 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi1EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 32 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 39 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for
_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 42 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 58 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75' ptxas info : Function properties for
_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0ELi0EEvPfPT_S1_PhS1_ii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 400 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_75' ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_75' ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 37 registers, 376 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 484 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function
'_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 116 registers, 464 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 60 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKffffffifPfS5_S5_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes
spill loads ptxas info : Used 63 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi' for 'sm_75' ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKffffffifPfS6_S6_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 452 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi5EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi5EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 67 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 67 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi' for 'sm_75' ptxas info :
Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPffffiS3_S3_S3_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi' for 'sm_75' ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPffffiS4_S4_S4_ffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit2StateI13__nv_bfloat16Li0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 64 registers, 436 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI13__nv_bfloat16Li0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 57 registers, 424 bytes cmem[0] ptxas info : Compiling entry function
'_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 57 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 55 registers, 424 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI13__nv_bfloat16Li5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi5EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 428 bytes cmem[0] ptxas info :
Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi5EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 56 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 54 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi' for 'sm_75' ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_ffffffiffbi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 50 registers, 428 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 416 bytes cmem[0] ptxas info : Compiling entry function
'_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI13__nv_bfloat16Li5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 47 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi5ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi5ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 53 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 48 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75' ptxas info : Function properties for
_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi' for 'sm_75' ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_ffffiffi 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads ptxas info : Used 49 registers, 416 bytes cmem[0] ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_ffffiffi' for 'sm_75'