NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

[QST] Use visual studio 2022 to build cutlass(on windows 11) #934

Closed XiejiLi closed 1 year ago

XiejiLi commented 1 year ago

Hi,

Brief description

I recently tried to build CUTLASS on my Windows PC. I followed the instructions in the quick start guide, but I still can't build cutlass_profiler.

Machine information

Edition: Windows 11 Home
Version: 22H2
Installed on: 12/5/2022
OS build: 22621.1555
Experience: Windows Feature Experience Pack 1000.22640.1000.0

(base) PS D:\Li\github\cutlass\build> cmake --version
cmake version 3.25.2

CMake suite maintained and supported by Kitware (kitware.com/cmake).

(base) PS D:\Li\github\cutlass\build> gcc --version
gcc.exe (MinGW.org GCC Build-2) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

CUDA Toolkit version: v12.1

I did the following steps:

  1. Run CMake to configure the CUTLASS build:
    (base) PS D:\Li\github\cutlass\build> cmake ..
    -- CMake Version: 3.25.2
    -- Selecting Windows SDK version 10.0.22000.0 to target Windows 10.0.22621.
    -- CUDART: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/lib/x64/cudart.lib
    -- CUDA Driver: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/lib/x64/cuda.lib
    -- NVRTC: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.1/lib/x64/nvrtc.lib
    -- Default Install Location: install
    -- CUDA Compilation Architectures: 70;72;75;80;86;87;89;90;90a
    -- Enable caching of reference results in conv unit tests
    -- Enable rigorous conv problem sizes in conv unit tests
    -- Using NVCC flags: --expt-relaxed-constexpr;-Xcompiler=/W3;-Xcompiler=/WX;-Xcompiler=/wd4819;-Xcompiler=/fp:strict;-DCUTLASS_TEST_LEVEL=0;-DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1;-DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1;-DCUTLASS_DEBUG_TRACE_LEVEL=0
    -- CUTLASS Revision: 6f8596ce
    -- Configuring cublas ...
    -- cuBLAS Disabled.
    -- Configuring cuBLAS ... done.
    -- Completed generation of library instances. See D:/Li/github/cutlass/build/tools/library/library_instance_generation.log for more information.
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.f4c0f90fd008.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.1dfacc86ea92.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.20233764449b.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.7744af1df594.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.baf3046ac2a8.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.73091fe2ddf8.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.a619e23e4b10.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.21cb20d01b86.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.f7b2084167aa.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.8a2c805a8365.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.3a3b51af06c5.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.bec554f7261b.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.5dcc557f5b08.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.62360d1d906c.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.a797e19313b2.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.001928f52966.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.90fd2e2d34a6.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.0ded2e275a22.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.625f0e9c9bb5.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.38c9814dbe86.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.278a8d6ebdcd.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.67a74d918e36.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.d5c513a03c20.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.68398d26e4b6.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.139a0572c8fa.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.40060e63456a.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.351b0485bddd.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.809ee3776d4f.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.b8835fb6cf85.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.641e77f46b0d.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.85b631e44344.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.06d968c00aa6.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.031278d5589f.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.ff5985ea4d5d.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.fc77fa055cb6.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.affb4ceb8617.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.ca2a4140cc43.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.9dc2532a10dc.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.704fbc66aaab.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.8ef24420d1e1.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.1574338352ce.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.6f42554b1298.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.d58438556791.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.4a4a20ed5542.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.5e4d29cbc179.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.42d246dc7133.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.af3d068fba3e.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.63f3013ddbec.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.8e7962b0e629.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.a529e73f8a57.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.48087e1f71bb.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.56307760d945.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.7b923148bc4a.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.c5d36023da30.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.54031a307ff5.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.6acbdf93d0e0.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.96e0a7703a8e.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.bd1d9ad79f87.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.7b131cbaf35d.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.f4a9f3a7d260.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.6a9839c73377.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.4aa09787f222.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.68fc88ae280e.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.0b47713ce6da.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.2ecd25aee8a7.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.73797d23e867.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.1ed277bb6b6c.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.b8b7f17d7468.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.7d6c230e2bd4.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.8d9f09abec5e.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.ef14c00cfde7.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.bac167fddf9d.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.3d5d1aaa6f22.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.065be80d311c.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.8e3bb10a78b1.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.cf7931d77103.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.ee97cbce43a4.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.1d39af7e0411.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.5d9802d4805b.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.6f129ecf1e06.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.bdb70988fac1.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.4aa29306343c.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.11e8807b4750.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.bfa6388ec414.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.738beeb04277.cu
    -- Generating D:/Li/github/cutlass/build/tools/library/cutlass_library_objs.unity.80f81f17fa3b.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt.unity.6d3bcd3b9720.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt.unity.d69f43d7ea78.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt.unity.f5981aa013fb.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt.unity.3531a25824a3.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt.unity.2f14c0bb03c5.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt.unity.d79797107feb.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt.unity.559f6106384e.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt.unity.f82ac67ac892.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt.unity.42678b6d7910.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt_3x.unity.582046f8ba4c.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_simt_3x.unity.4ab17cff250b.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm70.unity.6b143a2e1df9.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm70.unity.1fc183d42e99.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm75.unity.5efd287dfc75.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm75.unity.a47cfaba889e.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm75.unity.cc02800bb7eb.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm75.unity.a7f311c282bd.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm75.unity.c3054c7c8039.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm75.unity.6496c940e968.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm75.unity.39799870a161.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f16_sm80.unity.cecb8069b6d3.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f32_sm80.unity.bdd9dca39b4c.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f32_sm80.unity.23ee7662f9e5.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f32_sm80.unity.a1f9ce5afd35.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm90.unity.a3910ffeca5a.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_sm90.unity.849cadead9c5.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_alignx_sm90.unity.c0249d5fb6c5.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_epilogue_fusion_sm90.unity.dfe5b062dbf2.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_epilogue_fusion_sm90.unity.7c0f4f797130.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_cluster_multicast_sm90.unity.e443110c544b.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_gmma_rs_warpspecialized_sm90.unity.ffbeb6dbd46a.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f32_tf32_sm80.unity.abda0e3ffa35.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f32_tf32_sm80.unity.45837f3ac377.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f32_tf32_sm80.unity.96393805cc32.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f64.unity.3ace2e65400d.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f64.unity.a57d2f5b534b.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f64.unity.0c55903fcf2b.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_f64.unity.7d4e48f01c89.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_s32_sm80.unity.c96ffaab923b.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_s32_sm80.unity.a606d8abf097.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_s32_sm80.unity.ab4eb6160524.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_s32_sm80.unity.7dbc418d7ad7.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_wmma.unity.a201f3aca1df.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_wmma.unity.77a5c702efff.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_wmma.unity.890c66a7770e.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_wmma.unity.588b310779b0.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_wmma.unity.d364d9c6fb85.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_wmma.unity.717dca98d014.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_wmma.unity.1e2c1949c454.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_wmma.unity.dea0c9ac1704.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_wmma.unity.6c3edfd0f631.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_tensorop_planar_complex.unity.b339db93842f.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_grouped.unity.b6a1ffb56f9c.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_grouped_scheduler.unity.12df3577deeb.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_grouped_rank_2k_scheduler.unity.7e1cdddea146.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_sparse_tensorop_sm80.unity.afe0f3a4fdb9.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_sparse_tensorop_sm80.unity.77d5221e272a.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_sparse_tensorop_sm80.unity.ec81f1e3f490.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_sparse_tensorop_sm80.unity.5ef7de4d448a.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemv_device.unity.6c87d70f4480.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.467514c5998d.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.00e6e83b5d07.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.b619dac1e79b.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.4a4ff5cd2d66.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.91dfb85ec7bc.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.f7e5b3ef2f0c.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.c5be50067833.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.6a3056c1b477.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.a32a7767196f.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.07ea47b032f1.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.0efe28d4e006.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.2e7407db85a0.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.39e9ae3d80ed.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.4f6bc1cada87.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.ceb550dbcdd0.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.42ec4295e41c.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.02c4b3fe66bb.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.8d882128fb76.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.e5709fcdfa7e.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.80c51f056c87.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.1b5862952e07.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_blas3.unity.6d5e5c7e1dd5.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_grouped_blas3.unity.bf3bdd25f267.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_grouped_blas3.unity.9e37805c0015.cu
    -- Generating D:/Li/github/cutlass/build/test/unit/gemm/device/cutlass_test_unit_gemm_device_grouped_blas3.unity.de069760a863.cu
    -- Enable device reference verification in conv unit tests
    -- Configuring done
    -- Generating done
    -- Build files have been written to: D:/Li/github/cutlass/build

    This seems to have completed successfully, so I then used Visual Studio 2022 to open cutlass\build\ALL_BUILD.vcxproj.
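
As a side note for anyone reading the logs: the "CUDA Compilation Architectures" line in the CMake output is what produces the long run of --generate-code options on each nvcc command line in step 2. Below is a minimal sketch of that mapping. It is only an illustration and not part of the CUTLASS build scripts; the helper name generate_code_flags is hypothetical, and the flag pattern is copied from the nvcc command lines in this post.

```python
# Architecture list copied from the "CUDA Compilation Architectures" line
# of the CMake log in this post.
ARCHS = "70;72;75;80;86;87;89;90;90a"


def generate_code_flags(archs):
    """Expand a semicolon-separated arch list into nvcc --generate-code flags.

    Hypothetical helper for illustration only. It mirrors the pattern visible
    in the Visual Studio build log: for every architecture there is one flag
    embedding machine code (code=[sm_XX]) and one embedding PTX
    (code=[compute_XX]).
    """
    flags = []
    for arch in archs.split(";"):
        flags.append(f"--generate-code=arch=compute_{arch},code=[sm_{arch}]")
        flags.append(f"--generate-code=arch=compute_{arch},code=[compute_{arch}]")
    return flags


if __name__ == "__main__":
    flags = generate_code_flags(ARCHS)
    print(len(flags), "code-generation flags per source file")
    for flag in flags:
        print(flag)
```

With the nine architectures listed in the log, every .cu file in cutlass_profiler is compiled with eighteen code-generation targets, which matches the command lines shown in step 2.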

  2. I can build examples\00_basic_gemm and run it with a passing result, but I still can't build cutlass_profiler. The build output is as follows:
    Rebuild started...
    1>------ Rebuild All started: Project: cutlass_profiler, Configuration: Debug x64 ------
    1>Building Custom Rule D:/Li/github/cutlass/tools/profiler/CMakeLists.txt
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\symm_operation_profiler.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\trmm_operation_profiler.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\conv2d_operation_profiler.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\rank_k_operation_profiler.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\operation_profiler.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\gemm_operation_profiler.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\cutlass_profiler.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\device_context.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\options.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\cublas_helpers.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\device_allocation.cu...
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\rank_2k_operation_profiler.cu...
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\device_allocation.obj "D:\Li\github\cutlass\tools\profiler\src\device_allocation.cu"
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\cublas_helpers.obj "D:\Li\github\cutlass\tools\profiler\src\cublas_helpers.cu"
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\device_context.obj "D:\Li\github\cutlass\tools\profiler\src\device_context.cu"
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\options.obj "D:\Li\github\cutlass\tools\profiler\src\options.cu"
    1>
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\trmm_operation_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\trmm_operation_profiler.cu"
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\cutlass_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\cutlass_profiler.cu"
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\symm_operation_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\symm_operation_profiler.cu"
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\operation_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\operation_profiler.cu"
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\rank_k_operation_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\rank_k_operation_profiler.cu"
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\gemm_operation_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\gemm_operation_profiler.cu"
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\conv2d_operation_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\conv2d_operation_profiler.cu"
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\rank_2k_operation_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\rank_2k_operation_profiler.cu"
    1>cublas_helpers.cu
    1>tmpxft_000069dc_00000000-7_cublas_helpers.compute_90a.cudafe1.cpp
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\conv3d_operation_profiler.cu...
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\conv3d_operation_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\conv3d_operation_profiler.cu"
    1>device_context.cu
    1>options.cu
    1>tmpxft_0000eaa4_00000000-7_device_context.compute_90a.cudafe1.cpp
    1>Compiling CUDA source file ..\..\..\tools\profiler\src\sparse_gemm_operation_profiler.cu...
    1>
    1>D:\Li\github\cutlass\build\tools\profiler>"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.35.32215\bin\HostX64\x64" -x cu   -ID:\Li\github\cutlass\include -ID:\Li\github\cutlass\tools\profiler\src -ID:\Li\github\cutlass\build\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -I\include -I\examples -ID:\Li\github\cutlass\tools\library\include -ID:\Li\github\cutlass\tools\util\include -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include"     --keep-dir x64\Debug  -maxrregcount=0   --machine 64 --compile -cudart static --generate-code=arch=compute_70,code=[sm_70] --generate-code=arch=compute_70,code=[compute_70] --generate-code=arch=compute_72,code=[sm_72] --generate-code=arch=compute_72,code=[compute_72] --generate-code=arch=compute_75,code=[sm_75] --generate-code=arch=compute_75,code=[compute_75] --generate-code=arch=compute_80,code=[sm_80] --generate-code=arch=compute_80,code=[compute_80] --generate-code=arch=compute_86,code=[sm_86] --generate-code=arch=compute_86,code=[compute_86] --generate-code=arch=compute_87,code=[sm_87] --generate-code=arch=compute_87,code=[compute_87] --generate-code=arch=compute_89,code=[sm_89] --generate-code=arch=compute_89,code=[compute_89] --generate-code=arch=compute_90,code=[sm_90] --generate-code=arch=compute_90,code=[compute_90] --generate-code=arch=compute_90a,code=[sm_90a] --generate-code=arch=compute_90a,code=[compute_90a] --expt-relaxed-constexpr -std=c++17 -Xcompiler="/EHsc -Zi -Ob0 /WX /wd4819 /fp:strict" -g  -D_WINDOWS -DCUTLASS_ENABLE_TENSOR_CORE_MMA=1 -DCUTLASS_TEST_LEVEL=0 -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=1 -DCUTLASS_CONV_UNIT_TEST_RIGOROUS_SIZE_ENABLED=1 -DCUTLASS_DEBUG_TRACE_LEVEL=0 -D"CMAKE_INTDIR=\"Debug\"" -D_MBCS -DWIN32 -D_WINDOWS -D"CMAKE_INTDIR=\"Debug\"" -Xcompiler "/EHsc /W3 /nologo /Od /FS /Zi /RTC1 /MDd /GR" -Xcompiler 
"/Fdcutlass_profiler.dir\Debug\vc143.pdb" -o cutlass_profiler.dir\Debug\sparse_gemm_operation_profiler.obj "D:\Li\github\cutlass\tools\profiler\src\sparse_gemm_operation_profiler.cu"
    1>cutlass_profiler.cu
    1>tmpxft_00002ef4_00000000-7_options.compute_90a.cudafe1.cpp
    1>symm_operation_profiler.cu
    1>trmm_operation_profiler.cu
    1>rank_k_operation_profiler.cu
    1>rank_2k_operation_profiler.cu
    1>tmpxft_0000b518_00000000-7_cutlass_profiler.compute_90a.cudafe1.cpp
    1>gemm_operation_profiler.cu
    1>tmpxft_000022dc_00000000-7_symm_operation_profiler.compute_90a.cudafe1.cpp
    1>tmpxft_0000f3a0_00000000-7_trmm_operation_profiler.compute_90a.cudafe1.cpp
    1>conv2d_operation_profiler.cu
    1>tmpxft_0000c8ec_00000000-7_rank_k_operation_profiler.compute_90a.cudafe1.cpp
    1>tmpxft_0000a1e8_00000000-7_rank_2k_operation_profiler.compute_90a.cudafe1.cpp
    1>tmpxft_00002e84_00000000-7_gemm_operation_profiler.compute_90a.cudafe1.cpp
    1>tmpxft_00002c88_00000000-7_conv2d_operation_profiler.compute_90a.cudafe1.cpp
    1>operation_profiler.cu
    1>tmpxft_00007bbc_00000000-7_operation_profiler.compute_90a.cudafe1.cpp
    1>conv3d_operation_profiler.cu
    1>tmpxft_0000dc18_00000000-7_conv3d_operation_profiler.compute_90a.cudafe1.cpp
    1>sparse_gemm_operation_profiler.cu
    1>tmpxft_0000d9ec_00000000-7_sparse_gemm_operation_profiler.compute_90a.cudafe1.cpp
    1>device_allocation.cu
    1>tmpxft_0000e67c_00000000-7_device_allocation.compute_90a.cudafe1.cpp
    1>main.cpp
    1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\cuda.h(20247,1): warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
    1>performance_report.cpp
    1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\cuda.h(20247,1): warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
    1>enumerated_types.cpp
    1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\cuda.h(20247,1): warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
    1>gpu_timer.cpp
    1>cudnn_helpers.cpp
    1>problem_space.cpp
    1>C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\cuda.h(20247,1): warning C4819: The file contains a character that cannot be represented in the current code page (936). Save the file in Unicode format to prevent data loss
    1>Generating Code...
    1>LINK : fatal error LNK1104: cannot open file '..\library\Debug\cutlass.debug.lib'
    1>Done building project "cutlass_profiler.vcxproj" -- FAILED.
    ========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========
    ========== Rebuild started at 12:22 PM and took 03:41.166 minutes ==========

I know CUTLASS plans to add Windows (MSVC) and Clang compiler support soon, but is there a way to compile and build cutlass_profiler for now? Thanks for your contribution and hard work.

hwu36 commented 1 year ago

CUTLASS 2.x should support MSVC well. If you only need to run kernels on architectures other than Hopper, you should be good.
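
As a rough sketch of the "architectures other than Hopper" route: the configure output above shows the default architecture list ending in `90;90a`, and `CUTLASS_NVCC_ARCHS` is the CMake variable the quick start guide uses to narrow that list. The value `80` below is an assumption standing in for whatever GPU you actually have, and `Release` is just one choice of configuration. This has not been confirmed to fix the LNK1104 error in the log.

```shell
# Run from the build directory (D:\Li\github\cutlass\build in the log above).
# Assumption: 80 (Ampere) is a placeholder; use your own GPU's compute capability.
cmake .. -DCUTLASS_NVCC_ARCHS=80

# Build the profiler through CMake so that it also builds whatever
# targets cutlass_profiler depends on before linking.
cmake --build . --target cutlass_profiler --config Release
```

Restricting the list to a single architecture should also shorten the build, since each `.cu` file in the log is currently compiled for nine architectures.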