facebookincubator / AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Apache License 2.0
4.55k stars, 367 forks

Compile ControlNet error #949

Closed dushwe closed 12 months ago

dushwe commented 1 year ago

In the same Docker environment, compiling CLIP/UNet/VAE succeeds, but compiling ControlNet fails.

Environment:
GPU: V100
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
GNU Make 4.1
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

[Screenshot, 2023-10-11, 2:53:21 PM]

Installed packages:


aitemplate 0.3.dev0 alabaster 0.7.13 amqp 5.1.1 apeye 1.4.1 apeye-core 1.1.4 astroid 2.11.7 attrs 23.1.0 autodocsumm 0.2.11 Babel 2.13.0 backports.zoneinfo 0.2.1 beautifulsoup4 4.12.2 billiard 3.6.4.0 black 23.9.1 CacheControl 0.13.1 celery 5.1.2 certifi 2023.7.22 cffi 1.16.0 charset-normalizer 3.3.0 click 8.1.7 click-didyoumean 0.3.0 click-plugins 1.1.1 click-repl 0.3.0 cryptography 41.0.4 cssutils 2.7.1 cuda-python 11.7.0 Cython 3.0.3 Deprecated 1.2.14 dict2css 0.3.0 diffusers 0.21.4 dill 0.3.7 docutils 0.18.1 domdf-python-tools 3.6.1 einops 0.7.0 exceptiongroup 1.1.3 filelock 3.12.4 fsspec 2023.9.2 gitdb 4.0.10 GitPython 3.1.37 hflow 1.3.0 html5lib 1.1 huggingface-hub 0.17.3 idna 3.4 imagesize 1.4.1 importlib-metadata 4.13.0 iniconfig 2.0.0 isort 5.12.0 Jinja2 3.1.2 kombu 5.3.2 lazy-object-proxy 1.9.0 libcst 1.1.0 MarkupSafe 2.1.3 mccabe 0.7.0 moreorless 0.4.0 mpmath 1.3.0 msgpack 1.0.7 mypy-extensions 1.0.0 natsort 8.4.0 numpy 1.24.4 opencv-python 4.2.0.32 packaging 23.2 parameterized 0.9.0 pathspec 0.11.2 Pillow 10.0.1 pip 23.2.1 platformdirs 3.11.0 pluggy 1.3.0 prompt-toolkit 3.0.39 pycparser 2.21 PyGithub 2.1.1 Pygments 2.16.1 PyJWT 2.8.0 pylint 2.13.9 PyNaCl 1.5.0 pytest 7.4.2 python-dateutil 2.8.2 pytz 2023.3.post1 PyYAML 6.0.1 rabbitmq 0.2.0 redis 3.5.3 regex 2023.10.3 requests 2.31.0 retrying 1.3.3 ruamel.yaml 0.17.35 ruamel.yaml.clib 0.2.8 safetensors 0.4.0 setuptools 56.0.0 six 1.16.0 smmap 5.0.1 snowballstemmer 2.2.0 soupsieve 2.5 Sphinx 7.1.2 sphinx-autodoc-typehints 1.24.0 sphinx-gallery 0.14.0 sphinx-jinja2-compat 0.2.0 sphinx-prompt 1.7.0 sphinx-rtd-theme 1.3.0 sphinx-tabs 3.4.1 sphinx-toolbox 3.5.0 sphinxcontrib-applehelp 1.0.4 sphinxcontrib-devhelp 1.0.2 sphinxcontrib-htmlhelp 2.0.1 sphinxcontrib-inlinesyntaxhighlight 0.2 sphinxcontrib-jquery 4.1 sphinxcontrib-jsmath 1.0.1 sphinxcontrib-qthelp 1.0.3 sphinxcontrib-serializinghtml 1.1.5 stdlibs 2022.10.9 sympy 1.12 tabulate 0.9.0 timm 0.9.7 tokenizers 0.14.1 toml 0.10.2 tomli 2.0.1 tomlkit 0.12.1 
torch 1.13.0+cu117 torchaudio 0.13.0+cu117 torchvision 0.14.0+cu117 tqdm 4.66.1 trailrunner 1.4.0 transformers 4.34.0 typing_extensions 4.8.0 typing-inspect 0.9.0 tzdata 2023.3 ufmt 2.2.0 urllib3 2.0.6 usort 1.0.7 vine 5.0.0 wcwidth 0.2.8 webencodings 0.5.1 wheel 0.38.4 wrapt 1.15.0 xmltodict 0.13.0 yacs 0.1.8 zipp 3.17.0

Partial error log:

82 errors detected in the compilation of "conv2d_bias_few_channels/conv2d_bias_few_channels_cutlass_h884fprop_fixed_channels_64x64_32x3_nhwc_align_4_8.cu".
make: *** [conv2d_bias_few_channels/conv2d_bias_few_channels_cutlass_h884fprop_fixed_channels_64x64_32x3_nhwc_align_4_8.obj] Error 255
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op_policy.h(58): error: incomplete type is not allowed
  detected during:
    instantiation of class "cutlass::gemm::warp::MmaTensorOpPolicy<Operator_, OpDelta_> [with Operator_=cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, OpDelta_=cutlass::MatrixShape<1, 1>]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h(194): here
    instantiation of class "cutlass::gemm::warp::MmaTensorOp<Shape_, ElementA_, LayoutA_, ElementB_, LayoutB_, ElementC_, LayoutC_, Policy_, PartitionsK_, AccumulatorsInRowMajor, Enable> [with Shape_=cutlass::gemm::GemmShape<64, 64, 32>, ElementA_=cutlass::half_t, LayoutA_=cutlass::layout::RowMajorTensorOpMultiplicandCrosswise<16, 32>, ElementB_=cutlass::half_t, LayoutB_=cutlass::layout::ColumnMajorTensorOpMultiplicandCrosswise<16, 32>, ElementC_=cutlass::half_t, LayoutC_=cutlass::layout::RowMajor, Policy_=cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, PartitionsK_=1, AccumulatorsInRowMajor=false, Enable=__nv_bool]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h(494): here
    instantiation of class "cutlass::epilogue::threadblock::DefaultEpilogueTensorOp<Shape_, WarpMmaTensorOp_, PartitionsK, OutputOp_, ElementsPerAccess, ScatterD, PermuteDLayout> [with Shape_=cutlass::gemm::GemmShape<256, 128, 32>, WarpMmaTensorOp_=cutlass::gemm::warp::MmaTensorOp<cutlass::gemm::GemmShape<64, 64, 32>, cutlass::half_t, cutlass::layout::RowMajorTensorOpMultiplicandCrosswise<16, 32>, cutlass::half_t, cutlass::layout::ColumnMajorTensorOpMultiplicandCrosswise<16, 32>, cutlass::half_t, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, 1, false, __nv_bool>, PartitionsK=1, OutputOp_=cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, cutlass::half_t>, ElementsPerAccess=8, ScatterD=false, PermuteDLayout=cutlass::layout::NoPermute]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h(319): here
    instantiation of class "cutlass::conv::kernel::DefaultConv2dFprop<ElementA, LayoutA, ElementB, LayoutB, ElementC, LayoutC, ElementAccumulator, cutlass::arch::OpClassTensorOp, ArchTag, ThreadblockShape, WarpShape, InstructionShape, EpilogueOutputOp, ThreadblockSwizzle, Stages, MathOperatorTag, cutlass::conv::IteratorAlgorithm::kFixedChannels, StrideSupport, AlignmentA, AlignmentB> [with ElementA=cutlass::half_t, LayoutA=cutlass::layout::TensorNHWC, ElementB=cutlass::half_t, LayoutB=cutlass::layout::TensorNHWC, ElementC=cutlass::half_t, LayoutC=cutlass::layout::TensorNHWC, ElementAccumulator=cutlass::half_t, ArchTag=cutlass::arch::Sm70, ThreadblockShape=cutlass::gemm::GemmShape<256, 128, 32>, WarpShape=cutlass::gemm::GemmShape<64, 64, 32>, InstructionShape=cutlass::gemm::GemmShape<8, 8, 4>, EpilogueOutputOp=cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, cutlass::half_t>, ThreadblockSwizzle=cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<4>, Stages=3, MathOperatorTag=cutlass::arch::OpMultiplyAdd, StrideSupport=cutlass::conv::StrideSupport::kStrided, AlignmentA=4, AlignmentB=4]"
conv2d_bias_few_channels/conv2d_bias_few_channels_cutlass_h884fprop_fixed_channels_256x128_32x3_nhwc_align_4_8.cu(64): here

/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/gemm/warp/mma_tensor_op.h(197): error: incomplete type is not allowed
  detected during:
    instantiation of class "cutlass::gemm::warp::MmaTensorOp<Shape_, ElementA_, LayoutA_, ElementB_, LayoutB_, ElementC_, LayoutC_, Policy_, PartitionsK_, AccumulatorsInRowMajor, Enable> [with Shape_=cutlass::gemm::GemmShape<64, 64, 32>, ElementA_=cutlass::half_t, LayoutA_=cutlass::layout::RowMajorTensorOpMultiplicandCrosswise<16, 32>, ElementB_=cutlass::half_t, LayoutB_=cutlass::layout::ColumnMajorTensorOpMultiplicandCrosswise<16, 32>, ElementC_=cutlass::half_t, LayoutC_=cutlass::layout::RowMajor, Policy_=cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, PartitionsK_=1, AccumulatorsInRowMajor=false, Enable=__nv_bool]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_tensor_op.h(494): here
    instantiation of class "cutlass::epilogue::threadblock::DefaultEpilogueTensorOp<Shape_, WarpMmaTensorOp_, PartitionsK, OutputOp_, ElementsPerAccess, ScatterD, PermuteDLayout> [with Shape_=cutlass::gemm::GemmShape<256, 128, 32>, WarpMmaTensorOp_=cutlass::gemm::warp::MmaTensorOp<cutlass::gemm::GemmShape<64, 64, 32>, cutlass::half_t, cutlass::layout::RowMajorTensorOpMultiplicandCrosswise<16, 32>, cutlass::half_t, cutlass::layout::ColumnMajorTensorOpMultiplicandCrosswise<16, 32>, cutlass::half_t, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaTensorOpPolicy<cutlass::arch::Mma<cutlass::gemm::GemmShape<8, 8, 4>, 32, cutlass::half_t, cutlass::layout::RowMajor, cutlass::half_t, cutlass::layout::ColumnMajor, cutlass::half_t, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd>, cutlass::MatrixShape<1, 1>>, 1, false, __nv_bool>, PartitionsK=1, OutputOp_=cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, cutlass::half_t>, ElementsPerAccess=8, ScatterD=false, PermuteDLayout=cutlass::layout::NoPermute]"
/usr/local/lib/python3.8/site-packages/aitemplate/3rdparty/cutlass/include/cutlass/conv/kernel/default_conv2d_fprop.h(319): here
    instantiation of class "cutlass::conv::kernel::DefaultConv2dFprop<ElementA, LayoutA, ElementB, LayoutB, ElementC, LayoutC, ElementAccumulator, cutlass::arch::OpClassTensorOp, ArchTag, ThreadblockShape, WarpShape, InstructionShape, EpilogueOutputOp, ThreadblockSwizzle, Stages, MathOperatorTag, cutlass::conv::IteratorAlgorithm::kFixedChannels, StrideSupport, AlignmentA, AlignmentB> [with ElementA=cutlass::half_t, LayoutA=cutlass::layout::TensorNHWC, ElementB=cutlass::half_t, LayoutB=cutlass::layout::TensorNHWC, ElementC=cutlass::half_t, LayoutC=cutlass::layout::TensorNHWC, ElementAccumulator=cutlass::half_t, ArchTag=cutlass::arch::Sm70, ThreadblockShape=cutlass::gemm::GemmShape<256, 128, 32>, WarpShape=cutlass::gemm::GemmShape<64, 64, 32>, InstructionShape=cutlass::gemm::GemmShape<8, 8, 4>, EpilogueOutputOp=cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t, cutlass::epilogue::thread::ScaleType::Default, cutlass::FloatRoundStyle::round_to_nearest, cutlass::half_t>, ThreadblockSwizzle=cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<4>, Stages=3, MathOperatorTag=cutlass::arch::OpMultiplyAdd, StrideSupport=cutlass::conv::StrideSupport::kStrided, AlignmentA=4, AlignmentB=4]"
conv2d_bias_few_channels/conv2d_bias_few_channels_cutlass_h884fprop_fixed_channels_256x128_32x3_nhwc_align_4_8.cu(64): here

Traceback (most recent call last):
  File "./scripts/compile_controlnet.py", line 86, in <module>
    compile_diffusers()
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "./scripts/compile_controlnet.py", line 73, in compile_diffusers
    compile_controlnet(
  File "/www/server/AITemplate/examples/05_stable_diffusion/src/compile_lib/compile_controlnet.py", line 117, in compile_controlnet
    compile_model(
  File "/usr/local/lib/python3.8/site-packages/aitemplate/utils/misc.py", line 93, in inner_function
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/aitemplate/compiler/compiler.py", line 276, in compile_model
    compiler.transform.profile(
  File "/usr/local/lib/python3.8/site-packages/aitemplate/backend/builder.py", line 835, in make_profilers
    _run_make_cmds(
  File "/usr/local/lib/python3.8/site-packages/aitemplate/backend/builder.py", line 183, in _run_make_cmds
    raise RuntimeError("Build has failed.")
RuntimeError: Build has failed.

log file: controlnet_log.txt

How can I solve this problem?
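For anyone triaging a similar build failure: with dozens of template errors in one log, it can help to collapse them into distinct `file(line): error: message` entries before reading the instantiation chains. The following is a minimal sketch, not part of AITemplate. The function name `summarize_errors` and the regular expression are assumptions made for illustration, and it expects the log with its original line breaks (one diagnostic per line), as in the attached log file.

```python
import re
from collections import Counter

# Matches nvcc-style diagnostics such as
#   /path/to/header.h(58): error: incomplete type is not allowed
# The pattern is an assumption based on the lines quoted in this issue.
ERROR_RE = re.compile(r"(?P<path>[^\s()]+)\((?P<line>\d+)\): error: (?P<msg>[^\n]+)")


def summarize_errors(log_text):
    """Count distinct (file name, line, message) error sites in a build log."""
    counts = Counter()
    for match in ERROR_RE.finditer(log_text):
        file_name = match.group("path").rsplit("/", 1)[-1]  # drop directories
        key = (file_name, int(match.group("line")), match.group("msg").strip())
        counts[key] += 1
    # Most frequent error sites first.
    return counts.most_common()


if __name__ == "__main__":
    import sys

    # Usage: python summarize_errors.py controlnet_log.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            for (name, line, msg), count in summarize_errors(handle.read()):
                print(f"{count:4d}  {name}({line}): {msg}")
```

Applied to the excerpt above, this would group the failures under `mma_tensor_op_policy.h(58)` and `mma_tensor_op.h(197)`, both reporting "incomplete type is not allowed".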

dushwe commented 1 year ago

On an A100 the problem does not appear. How can I solve it on the V100?

chenyang78 commented 1 year ago

Hi @dushwe, thank you for reporting the issue. We don't support Stable Diffusion on V100.
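A quick way to see which case a machine falls into before starting a long compile is to read the GPU's compute capability. This is a minimal sketch under stated assumptions: the helper names and the `THREAD_REPORTS` table are hypothetical and only record what this thread reports (the V100 is compute capability 7.0, matching `ArchTag=cutlass::arch::Sm70` in the log, and the A100 is 8.0). It is not an official AITemplate support matrix. The only library call used is PyTorch's `torch.cuda.get_device_capability`.

```python
# Outcomes taken from this thread only; other architectures are not covered.
THREAD_REPORTS = {
    "sm_70": "V100: ControlNet profiler build failed in this thread",
    "sm_80": "A100: build reported to succeed in this thread",
}


def sm_tag(major, minor):
    """Format a compute capability such as (7, 0) as 'sm_70'."""
    return f"sm_{major}{minor}"


def describe(major, minor):
    """Return the sm tag and what this thread says about it, if anything."""
    tag = sm_tag(major, minor)
    return tag, THREAD_REPORTS.get(tag, "not covered by this thread")


def local_capability(device_index=0):
    """Return (major, minor) for a visible CUDA device, or None."""
    try:
        import torch  # optional; only needed to query the local GPU
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


if __name__ == "__main__":
    capability = local_capability()
    if capability is None:
        print("No CUDA device visible (or PyTorch is not installed).")
    else:
        print(*describe(*capability), sep=": ")
```

On the reporter's V100 this would print the `sm_70` entry, which matches the maintainer's answer that Stable Diffusion is not supported there.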