facebookincubator / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Apache License 2.0

Error when running compile_alt.py in stable diffusion example: list index out of range in conv2d #742

Open yit-b opened 1 year ago

yit-b commented 1 year ago

Repro steps:

# Run a container in daemon mode, then:
docker exec -it <container name> bash
cd AITemplate/examples/05_stable_diffusion
pip install accelerate
# Run the download pipeline, then:
python3 scripts/compile_alt.py --local-dir tmp/diffusers-pipeline/stabilityai/stable-diffusion-v2/

After a while, it fails with:

Traceback (most recent call last):
  File "scripts/compile_alt.py", line 138, in <module>
    compile_diffusers()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "scripts/compile_alt.py", line 126, in compile_diffusers
    compile_vae(
  File "/AITemplate/examples/05_stable_diffusion/src/compile_lib/compile_vae_alt.py", line 185, in compile_vae
    compile_model(
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/utils/misc.py", line 93, in inner_function
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/compiler.py", line 275, in compile_model
    compiler.transform.profile(
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/transform/profile.py", line 103, in profile
    f.profile(
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/ops/conv/conv2d.py", line 601, in profile
    self._profile_dynamic_dim(workdir)
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/ops/conv/conv2d.py", line 721, in _profile_dynamic_dim
    result_dict = {res.op_config: res for res in result[0][1]}
IndexError: list index out of range
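For what it's worth, the failing line `result_dict = {res.op_config: res for res in result[0][1]}` assumes the profiler produced at least one entry; if dynamic-dim profiling came back empty (e.g. no conv2d profiler candidate could be built or run for this shape), `result[0]` raises exactly this IndexError. A minimal sketch of that failure mode, assuming the profiler output is shaped the way the traceback suggests (the variable names here are mine, not AITemplate's):

```python
# Hypothetical sketch of the failure mode, not AITemplate's actual code:
# assume the dynamic-dim profiler returns a list of (dim_value, results)
# pairs, and that every candidate kernel failed, so the list is empty.
profiler_output = []

try:
    # Same access pattern as the failing line in _profile_dynamic_dim:
    # result[0][1] assumes at least one profiling entry exists.
    first_entry_results = profiler_output[0][1]
except IndexError as err:
    print(err)  # prints: list index out of range
```

So the IndexError is likely a symptom of the conv2d profiler producing no usable results, rather than the root cause itself.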
yit-b commented 1 year ago

Here's a nice chunk of debug logs leading up to the crash:

2023-06-01 03:45:22,433 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=113, tensor_cnt=0, len(func_name_to_tensor_cnt)=113, len(user_provided_dim)=3347
2023-06-01 03:45:22,433 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=113, tensor_cnt=0, len(func_name_to_tensor_cnt)=113, len(user_provided_dim)=3347
2023-06-01 03:45:22,539 DEBUG <aitemplate.utils.graph_utils> Dumped 10-transform_memory_ops graph to ./tmp/AutoencoderKL/10-transform_memory_ops_graph.txt
2023-06-01 03:45:22,570 DEBUG <aitemplate.utils.graph_utils> Dumped 10-transform_memory_ops graph to ./tmp/AutoencoderKL/10-transform_memory_ops_graph.json
2023-06-01 03:45:22,571 DEBUG <aitemplate.utils.graph_utils> Dumped 10-transform_memory_ops pseudo code to ./tmp/AutoencoderKL/10-transform_memory_ops_pseudo_code.txt
2023-06-01 03:45:22,572 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=113, tensor_cnt=0, len(func_name_to_tensor_cnt)=113, len(user_provided_dim)=3347
2023-06-01 03:45:22,572 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=113, tensor_cnt=0, len(func_name_to_tensor_cnt)=113, len(user_provided_dim)=3347
2023-06-01 03:45:22,840 DEBUG <aitemplate.utils.graph_utils> Dumped 10-transform_memory_ops visualization to ./tmp/AutoencoderKL/10-transform_memory_ops_graph_vis.html
2023-06-01 03:45:22,841 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=113, tensor_cnt=0, len(func_name_to_tensor_cnt)=113, len(user_provided_dim)=3347
2023-06-01 03:45:22,841 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=113, tensor_cnt=0, len(func_name_to_tensor_cnt)=113, len(user_provided_dim)=3347
2023-06-01 03:45:22,952 DEBUG <aitemplate.utils.graph_utils> Dumped 11-fuse_ops graph to ./tmp/AutoencoderKL/11-fuse_ops_graph.txt
2023-06-01 03:45:22,974 DEBUG <aitemplate.utils.graph_utils> Dumped 11-fuse_ops graph to ./tmp/AutoencoderKL/11-fuse_ops_graph.json
2023-06-01 03:45:22,975 DEBUG <aitemplate.utils.graph_utils> Dumped 11-fuse_ops pseudo code to ./tmp/AutoencoderKL/11-fuse_ops_pseudo_code.txt
2023-06-01 03:45:22,977 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=113, tensor_cnt=0, len(func_name_to_tensor_cnt)=113, len(user_provided_dim)=3347
2023-06-01 03:45:22,977 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=113, tensor_cnt=0, len(func_name_to_tensor_cnt)=113, len(user_provided_dim)=3347
2023-06-01 03:45:23,158 DEBUG <aitemplate.utils.graph_utils> Dumped 11-fuse_ops visualization to ./tmp/AutoencoderKL/11-fuse_ops_graph_vis.html
2023-06-01 03:45:23,159 DEBUG <aitemplate.compiler.transform.fuse_ops> original op set: {{ 'depth': 15,
  'has_profiler': False,
  'inputs': ['gemm_rcr_bias_add_20_0'],
  'is_intvar': False,
  'name': 'reshape_21',
  'nop': False,
  'op': 'reshape',
  'original_name': 'reshape_21',
  'outputs': ['reshape_21_0'],
  'shape': [ { 'depth': 0,
  'name': 'batch_size',
  'nop': False,
  'symbolic_value': batch_size,
  'values': [1, 4]},
             {'depth': 0, 'name': None, 'nop': False, 'symbolic_value': -1, 'values': [-1]},
             {'depth': 0, 'name': None, 'nop': False, 'symbolic_value': 512, 'values': [512]}],
  'unknown_idx': 1}}, to_be_removed_set: {{ 'depth': 15,
  'has_profiler': False,
  'inputs': ['gemm_rcr_bias_add_20_0'],
  'is_intvar': False,
  'name': 'reshape_21',
  'nop': False,
  'op': 'reshape',
  'original_name': 'reshape_21',
  'outputs': ['reshape_21_0'],
  'shape': [ { 'depth': 0,
  'name': 'batch_size',
  'nop': False,
  'symbolic_value': batch_size,
  'values': [1, 4]},
             {'depth': 0, 'name': None, 'nop': False, 'symbolic_value': -1, 'values': [-1]},
             {'depth': 0, 'name': None, 'nop': False, 'symbolic_value': 512, 'values': [512]}],
  'unknown_idx': 1}}, final_set: set()
2023-06-01 03:45:23,161 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=113, tensor_cnt=0, len(func_name_to_tensor_cnt)=113, len(user_provided_dim)=3347
2023-06-01 03:45:23,161 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,273 DEBUG <aitemplate.utils.graph_utils> Dumped 12-fuse_elementwise graph to ./tmp/AutoencoderKL/12-fuse_elementwise_graph.txt
2023-06-01 03:45:23,294 DEBUG <aitemplate.utils.graph_utils> Dumped 12-fuse_elementwise graph to ./tmp/AutoencoderKL/12-fuse_elementwise_graph.json
2023-06-01 03:45:23,295 DEBUG <aitemplate.utils.graph_utils> Dumped 12-fuse_elementwise pseudo code to ./tmp/AutoencoderKL/12-fuse_elementwise_pseudo_code.txt
2023-06-01 03:45:23,297 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,297 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,475 DEBUG <aitemplate.utils.graph_utils> Dumped 12-fuse_elementwise visualization to ./tmp/AutoencoderKL/12-fuse_elementwise_graph_vis.html
2023-06-01 03:45:23,477 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,477 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,479 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,479 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,593 DEBUG <aitemplate.utils.graph_utils> Dumped 13-fuse_parallel_gemms graph to ./tmp/AutoencoderKL/13-fuse_parallel_gemms_graph.txt
2023-06-01 03:45:23,614 DEBUG <aitemplate.utils.graph_utils> Dumped 13-fuse_parallel_gemms graph to ./tmp/AutoencoderKL/13-fuse_parallel_gemms_graph.json
2023-06-01 03:45:23,615 DEBUG <aitemplate.utils.graph_utils> Dumped 13-fuse_parallel_gemms pseudo code to ./tmp/AutoencoderKL/13-fuse_parallel_gemms_pseudo_code.txt
2023-06-01 03:45:23,617 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,617 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,796 DEBUG <aitemplate.utils.graph_utils> Dumped 13-fuse_parallel_gemms visualization to ./tmp/AutoencoderKL/13-fuse_parallel_gemms_graph_vis.html
2023-06-01 03:45:23,910 DEBUG <aitemplate.utils.graph_utils> Dumped 14-fuse_group_ops graph to ./tmp/AutoencoderKL/14-fuse_group_ops_graph.txt
2023-06-01 03:45:23,932 DEBUG <aitemplate.utils.graph_utils> Dumped 14-fuse_group_ops graph to ./tmp/AutoencoderKL/14-fuse_group_ops_graph.json
2023-06-01 03:45:23,933 DEBUG <aitemplate.utils.graph_utils> Dumped 14-fuse_group_ops pseudo code to ./tmp/AutoencoderKL/14-fuse_group_ops_pseudo_code.txt
2023-06-01 03:45:23,934 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:23,935 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,104 DEBUG <aitemplate.utils.graph_utils> Dumped 14-fuse_group_ops visualization to ./tmp/AutoencoderKL/14-fuse_group_ops_graph_vis.html
2023-06-01 03:45:24,213 DEBUG <aitemplate.utils.graph_utils> Dumped 15-transform_special_ops graph to ./tmp/AutoencoderKL/15-transform_special_ops_graph.txt
2023-06-01 03:45:24,233 DEBUG <aitemplate.utils.graph_utils> Dumped 15-transform_special_ops graph to ./tmp/AutoencoderKL/15-transform_special_ops_graph.json
2023-06-01 03:45:24,234 DEBUG <aitemplate.utils.graph_utils> Dumped 15-transform_special_ops pseudo code to ./tmp/AutoencoderKL/15-transform_special_ops_pseudo_code.txt
2023-06-01 03:45:24,236 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,236 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,486 DEBUG <aitemplate.utils.graph_utils> Dumped 15-transform_special_ops visualization to ./tmp/AutoencoderKL/15-transform_special_ops_graph_vis.html
2023-06-01 03:45:24,486 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,487 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,601 DEBUG <aitemplate.utils.graph_utils> Dumped 16-apply_padding graph to ./tmp/AutoencoderKL/16-apply_padding_graph.txt
2023-06-01 03:45:24,622 DEBUG <aitemplate.utils.graph_utils> Dumped 16-apply_padding graph to ./tmp/AutoencoderKL/16-apply_padding_graph.json
2023-06-01 03:45:24,623 DEBUG <aitemplate.utils.graph_utils> Dumped 16-apply_padding pseudo code to ./tmp/AutoencoderKL/16-apply_padding_pseudo_code.txt
2023-06-01 03:45:24,625 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,625 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,804 DEBUG <aitemplate.utils.graph_utils> Dumped 16-apply_padding visualization to ./tmp/AutoencoderKL/16-apply_padding_graph_vis.html
2023-06-01 03:45:24,805 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,805 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,916 DEBUG <aitemplate.utils.graph_utils> Dumped 17-move_view_op_before_concat graph to ./tmp/AutoencoderKL/17-move_view_op_before_concat_graph.txt
2023-06-01 03:45:24,938 DEBUG <aitemplate.utils.graph_utils> Dumped 17-move_view_op_before_concat graph to ./tmp/AutoencoderKL/17-move_view_op_before_concat_graph.json
2023-06-01 03:45:24,939 DEBUG <aitemplate.utils.graph_utils> Dumped 17-move_view_op_before_concat pseudo code to ./tmp/AutoencoderKL/17-move_view_op_before_concat_pseudo_code.txt
2023-06-01 03:45:24,940 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:24,941 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,112 DEBUG <aitemplate.utils.graph_utils> Dumped 17-move_view_op_before_concat visualization to ./tmp/AutoencoderKL/17-move_view_op_before_concat_graph_vis.html
2023-06-01 03:45:25,112 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,113 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,113 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,114 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,114 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,114 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,115 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,115 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,116 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,116 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,116 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,117 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,117 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,117 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,118 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,118 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,119 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,119 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,120 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,120 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,231 DEBUG <aitemplate.utils.graph_utils> Dumped 18-transform_memory_ops graph to ./tmp/AutoencoderKL/18-transform_memory_ops_graph.txt
2023-06-01 03:45:25,251 DEBUG <aitemplate.utils.graph_utils> Dumped 18-transform_memory_ops graph to ./tmp/AutoencoderKL/18-transform_memory_ops_graph.json
2023-06-01 03:45:25,252 DEBUG <aitemplate.utils.graph_utils> Dumped 18-transform_memory_ops pseudo code to ./tmp/AutoencoderKL/18-transform_memory_ops_pseudo_code.txt
2023-06-01 03:45:25,254 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,254 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,505 DEBUG <aitemplate.utils.graph_utils> Dumped 18-transform_memory_ops visualization to ./tmp/AutoencoderKL/18-transform_memory_ops_graph_vis.html
2023-06-01 03:45:25,505 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,506 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,506 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,507 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,507 DEBUG <aitemplate.compiler.tensor_accessor> generate dim_mapping: [([0], [0]), ([1], [1]), ([], [2]), ([2], [3])]
2023-06-01 03:45:25,507 DEBUG <aitemplate.compiler.tensor_accessor> generate dim_mapping: [([0], [0]), ([1], [1]), ([], [2]), ([2], [3])]
2023-06-01 03:45:25,507 DEBUG <aitemplate.compiler.tensor_accessor> generate dim_mapping: [([0], [0]), ([1], [1]), ([], [2]), ([2], [3])]
2023-06-01 03:45:25,507 DEBUG <aitemplate.compiler.tensor_accessor> generate dim_mapping: [([0], [0]), ([1], [1]), ([], [2]), ([2], [3])]
2023-06-01 03:45:25,507 DEBUG <aitemplate.compiler.tensor_accessor> generate dim_mapping: [([0], [0]), ([1], [1]), ([2], [2])]
2023-06-01 03:45:25,508 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,508 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,508 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,509 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,509 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,510 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,510 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,510 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,511 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,511 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,512 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,512 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,512 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,512 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,626 DEBUG <aitemplate.utils.graph_utils> Dumped 19-transform_strided_ops graph to ./tmp/AutoencoderKL/19-transform_strided_ops_graph.txt
2023-06-01 03:45:25,647 DEBUG <aitemplate.utils.graph_utils> Dumped 19-transform_strided_ops graph to ./tmp/AutoencoderKL/19-transform_strided_ops_graph.json
2023-06-01 03:45:25,648 DEBUG <aitemplate.utils.graph_utils> Dumped 19-transform_strided_ops pseudo code to ./tmp/AutoencoderKL/19-transform_strided_ops_pseudo_code.txt
2023-06-01 03:45:25,649 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,649 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,828 DEBUG <aitemplate.utils.graph_utils> Dumped 19-transform_strided_ops visualization to ./tmp/AutoencoderKL/19-transform_strided_ops_graph_vis.html
2023-06-01 03:45:25,829 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,829 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,940 DEBUG <aitemplate.utils.graph_utils> Dumped 20-split_large_slice_scatter_ops graph to ./tmp/AutoencoderKL/20-split_large_slice_scatter_ops_graph.txt
2023-06-01 03:45:25,961 DEBUG <aitemplate.utils.graph_utils> Dumped 20-split_large_slice_scatter_ops graph to ./tmp/AutoencoderKL/20-split_large_slice_scatter_ops_graph.json
2023-06-01 03:45:25,962 DEBUG <aitemplate.utils.graph_utils> Dumped 20-split_large_slice_scatter_ops pseudo code to ./tmp/AutoencoderKL/20-split_large_slice_scatter_ops_pseudo_code.txt
2023-06-01 03:45:25,964 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:25,964 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:26,134 DEBUG <aitemplate.utils.graph_utils> Dumped 20-split_large_slice_scatter_ops visualization to ./tmp/AutoencoderKL/20-split_large_slice_scatter_ops_graph_vis.html
2023-06-01 03:45:26,134 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:26,135 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:26,245 DEBUG <aitemplate.utils.graph_utils> Dumped 21-split_large_concat_ops graph to ./tmp/AutoencoderKL/21-split_large_concat_ops_graph.txt
2023-06-01 03:45:26,267 DEBUG <aitemplate.utils.graph_utils> Dumped 21-split_large_concat_ops graph to ./tmp/AutoencoderKL/21-split_large_concat_ops_graph.json
2023-06-01 03:45:26,268 DEBUG <aitemplate.utils.graph_utils> Dumped 21-split_large_concat_ops pseudo code to ./tmp/AutoencoderKL/21-split_large_concat_ops_pseudo_code.txt
2023-06-01 03:45:26,269 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:26,270 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:26,435 DEBUG <aitemplate.utils.graph_utils> Dumped 21-split_large_concat_ops visualization to ./tmp/AutoencoderKL/21-split_large_concat_ops_graph_vis.html
2023-06-01 03:45:26,547 DEBUG <aitemplate.utils.graph_utils> Dumped 22-split_large_split_ops graph to ./tmp/AutoencoderKL/22-split_large_split_ops_graph.txt
2023-06-01 03:45:26,568 DEBUG <aitemplate.utils.graph_utils> Dumped 22-split_large_split_ops graph to ./tmp/AutoencoderKL/22-split_large_split_ops_graph.json
2023-06-01 03:45:26,569 DEBUG <aitemplate.utils.graph_utils> Dumped 22-split_large_split_ops pseudo code to ./tmp/AutoencoderKL/22-split_large_split_ops_pseudo_code.txt
2023-06-01 03:45:26,570 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:26,571 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:26,824 DEBUG <aitemplate.utils.graph_utils> Dumped 22-split_large_split_ops visualization to ./tmp/AutoencoderKL/22-split_large_split_ops_graph_vis.html
2023-06-01 03:45:26,826 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=114, tensor_cnt=0, len(func_name_to_tensor_cnt)=114, len(user_provided_dim)=3347
2023-06-01 03:45:26,826 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:26,940 DEBUG <aitemplate.utils.graph_utils> Dumped 23-transform_permute_to_reshape graph to ./tmp/AutoencoderKL/23-transform_permute_to_reshape_graph.txt
2023-06-01 03:45:26,961 DEBUG <aitemplate.utils.graph_utils> Dumped 23-transform_permute_to_reshape graph to ./tmp/AutoencoderKL/23-transform_permute_to_reshape_graph.json
2023-06-01 03:45:26,963 DEBUG <aitemplate.utils.graph_utils> Dumped 23-transform_permute_to_reshape pseudo code to ./tmp/AutoencoderKL/23-transform_permute_to_reshape_pseudo_code.txt
2023-06-01 03:45:26,964 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:26,964 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,138 DEBUG <aitemplate.utils.graph_utils> Dumped 23-transform_permute_to_reshape visualization to ./tmp/AutoencoderKL/23-transform_permute_to_reshape_graph_vis.html
2023-06-01 03:45:27,138 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,138 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,139 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,139 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,140 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,140 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,259 DEBUG <aitemplate.utils.graph_utils> Dumped 24-transform_memory_ops graph to ./tmp/AutoencoderKL/24-transform_memory_ops_graph.txt
2023-06-01 03:45:27,281 DEBUG <aitemplate.utils.graph_utils> Dumped 24-transform_memory_ops graph to ./tmp/AutoencoderKL/24-transform_memory_ops_graph.json
2023-06-01 03:45:27,282 DEBUG <aitemplate.utils.graph_utils> Dumped 24-transform_memory_ops pseudo code to ./tmp/AutoencoderKL/24-transform_memory_ops_pseudo_code.txt
2023-06-01 03:45:27,283 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,283 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,455 DEBUG <aitemplate.utils.graph_utils> Dumped 24-transform_memory_ops visualization to ./tmp/AutoencoderKL/24-transform_memory_ops_graph_vis.html
2023-06-01 03:45:27,569 DEBUG <aitemplate.utils.graph_utils> Dumped optimize_graph graph to ./tmp/AutoencoderKL/optimize_graph_graph.txt
2023-06-01 03:45:27,600 DEBUG <aitemplate.utils.graph_utils> Dumped optimize_graph graph to ./tmp/AutoencoderKL/optimize_graph_graph.json
2023-06-01 03:45:27,601 DEBUG <aitemplate.utils.graph_utils> Dumped optimize_graph pseudo code to ./tmp/AutoencoderKL/optimize_graph_pseudo_code.txt
2023-06-01 03:45:27,603 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,603 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,778 DEBUG <aitemplate.utils.graph_utils> Dumped optimize_graph visualization to ./tmp/AutoencoderKL/optimize_graph_graph_vis.html
2023-06-01 03:45:27,778 INFO <aitemplate.compiler.compiler> optimized graph elapsed time: 0:00:08.833879
2023-06-01 03:45:27,779 DEBUG <aitemplate.compiler.transform.refine_graph> refined ops: {'groupnorm_swish', 'conv2d_bias', 'conv2d_bias_add_identity', 'gemm_rcr_bias', 'reshape'}
2023-06-01 03:45:27,779 INFO <aitemplate.compiler.transform.refine_graph> reduced unique ops from 80 to 35
2023-06-01 03:45:27,896 DEBUG <aitemplate.utils.graph_utils> Dumped refine_graph graph to ./tmp/AutoencoderKL/refine_graph_graph.txt
2023-06-01 03:45:27,917 DEBUG <aitemplate.utils.graph_utils> Dumped refine_graph graph to ./tmp/AutoencoderKL/refine_graph_graph.json
2023-06-01 03:45:27,918 DEBUG <aitemplate.utils.graph_utils> Dumped refine_graph pseudo code to ./tmp/AutoencoderKL/refine_graph_pseudo_code.txt
2023-06-01 03:45:27,920 DEBUG <aitemplate.compiler.transform.name_graph> before name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:27,920 DEBUG <aitemplate.compiler.transform.name_graph> after name_graph: func_cnt=117, tensor_cnt=0, len(func_name_to_tensor_cnt)=117, len(user_provided_dim)=3347
2023-06-01 03:45:28,174 DEBUG <aitemplate.utils.graph_utils> Dumped refine_graph visualization to ./tmp/AutoencoderKL/refine_graph_graph_vis.html
2023-06-01 03:45:28,218 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,218 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_ac9dd22549f615b1d4d9bf9623d9c2c9d9a52221_3'
2023-06-01 03:45:28,264 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,264 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_ee160a5d56d7208ae1280de4691c2442b2dc3de0_3'
2023-06-01 03:45:28,311 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,311 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:28,359 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,359 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:28,420 DEBUG <aitemplate.compiler.ops.gemm_universal.gemm_common> Filtered profiler kernels for gemm_rcr_bias: reduced the number of generated kernels from 264 to 22
2023-06-01 03:45:28,420 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,420 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> generating profiler_filename='gemm_rcr_bias_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3'
2023-06-01 03:45:28,481 DEBUG <aitemplate.compiler.ops.gemm_universal.gemm_common> Filtered profiler kernels for gemm_rcr_bias: reduced the number of generated kernels from 264 to 22
2023-06-01 03:45:28,481 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,481 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> generating profiler_filename='gemm_rcr_bias_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3'
2023-06-01 03:45:28,538 DEBUG <aitemplate.compiler.ops.gemm_universal.gemm_common> Filtered profiler kernels for gemm_rcr_bias: reduced the number of generated kernels from 264 to 22
2023-06-01 03:45:28,538 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,538 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> generating profiler_filename='gemm_rcr_bias_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3'
2023-06-01 03:45:28,598 DEBUG <aitemplate.compiler.ops.gemm_universal.gemm_common> Filtered profiler kernels for gemm_rcr_bias_add: reduced the number of generated kernels from 264 to 22
2023-06-01 03:45:28,598 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,599 INFO <aitemplate.compiler.ops.gemm_universal.gemm_common> generating profiler_filename='gemm_rcr_bias_add_9e46850d5286ecc7e078b5b7f76afbcac62967b4_3'
2023-06-01 03:45:28,645 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,645 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:28,696 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,696 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:28,857 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,857 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:28,906 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,906 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:28,952 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:28,952 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,002 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,003 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,048 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,049 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,097 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,097 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,142 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,142 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,189 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,190 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,235 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,236 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,282 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,283 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,330 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,331 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,457 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,458 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,506 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,507 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,551 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,552 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,600 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,600 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,647 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,648 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,697 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,698 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,744 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,745 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,794 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,795 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,839 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,840 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,889 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,889 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:29,935 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:29,936 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:30,062 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:30,062 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:30,108 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:30,109 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:30,156 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:30,156 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:31,544 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,544 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:31,594 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,595 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:31,641 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,641 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:31,688 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,689 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3'
2023-06-01 03:45:31,735 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,735 INFO <aitemplate.compiler.ops.conv.conv_common> generating profiler_filename='conv2d_bias_b1e394865cd200dc04d975a6288c42ce6f64196c_3'
2023-06-01 03:45:31,739 INFO <aitemplate.compiler.transform.profile> generated 40 profilers elapsed time: 0:00:03.564456
2023-06-01 03:45:31,739 INFO <aitemplate.backend.builder> Using 32 CPU for building
2023-06-01 03:45:31,739 INFO <aitemplate.compiler.transform.profile> compiled profilers elapsed time: 0:00:00.000085
2023-06-01 03:45:31,739 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,739 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_0: NI == 1 && HI == 256 && WI == 256 && CI == 4
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_0: NI == 4 && HI == 256 && WI == 256 && CI == 4
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,740 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_1: NI == 1 && HI == 256 && WI == 256 && CI == 4
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_1: NI == 4 && HI == 256 && WI == 256 && CI == 4
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,740 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_3: NI == 1 && HI == 256 && WI == 256 && CI == 512
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_3: NI == 4 && HI == 256 && WI == 256 && CI == 512
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,740 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_99: NI == 1 && HI == 256 && WI == 256 && CI == 512
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_99: NI == 4 && HI == 256 && WI == 256 && CI == 512
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,740 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,740 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_45: NI == 1 && HI == 512 && WI == 512 && CI == 512
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_45: NI == 4 && HI == 512 && WI == 512 && CI == 512
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,741 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_47: NI == 1 && HI == 512 && WI == 512 && CI == 512
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_47: NI == 4 && HI == 512 && WI == 512 && CI == 512
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,741 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_104: NI == 1 && HI == 512 && WI == 512 && CI == 512
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_104: NI == 4 && HI == 512 && WI == 512 && CI == 512
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:31,741 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_add_identity_104
2023-06-01 03:45:31,741 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
2023-06-01 03:45:40,677 DEBUG <aitemplate.backend.profiler_runner> all runtimes (unsorted): [('cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', '92.9352', '0'), ('cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', '91.4616', '0'), ('cutlass_h16816fprop_optimized_256x64_32x3_nhwc_align_8_8', '99.1846', '0'), ('cutlass_h16816fprop_optimized_256x64_32x4_nhwc_align_8_8', '104.188', '0'), ('cutlass_h16816fprop_optimized_64x256_32x4_nhwc_align_8_8', '97.5964', '0'), ('cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', '93.9131', '0'), ('cutlass_h16816fprop_optimized_128x128_32x4_nhwc_align_8_8', '95.9293', '0'), ('cutlass_h16816fprop_optimized_128x128_32x5_nhwc_align_8_8', '96.5581', '0'), ('cutlass_h16816fprop_optimized_128x64_32x6_nhwc_align_8_8', '120.123', '0'), ('cutlass_h16816fprop_optimized_64x128_32x6_nhwc_align_8_8', '109.441', '0'), ('cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8', '152.198', '0')]
2023-06-01 03:45:40,677 DEBUG <aitemplate.backend.profiler_runner> return ops: ['cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', 'cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8']
2023-06-01 03:45:40,677 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_add_identity_104][conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8 TIME: 92.9352 WS:0
2023-06-01 03:45:40,677 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_add_identity_104][conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8 TIME: 91.4616 WS:0
2023-06-01 03:45:40,678 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:45:40,678 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_62: NI == 1 && HI == 1024 && WI == 1024 && CI == 512
2023-06-01 03:45:40,678 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:40,678 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_62: NI == 4 && HI == 1024 && WI == 1024 && CI == 512
2023-06-01 03:45:40,678 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:45:40,678 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_62
2023-06-01 03:45:40,678 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
2023-06-01 03:46:03,617 DEBUG <aitemplate.backend.profiler_runner> all runtimes (unsorted): [('cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', '373.021', '0'), ('cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', '368.439', '0'), ('cutlass_h16816fprop_optimized_256x64_32x3_nhwc_align_8_8', '395.717', '0'), ('cutlass_h16816fprop_optimized_256x64_32x4_nhwc_align_8_8', '415.741', '0'), ('cutlass_h16816fprop_optimized_64x256_32x4_nhwc_align_8_8', '396.442', '0'), ('cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', '375.886', '0'), ('cutlass_h16816fprop_optimized_128x128_32x4_nhwc_align_8_8', '381.578', '0'), ('cutlass_h16816fprop_optimized_128x128_32x5_nhwc_align_8_8', '383.266', '0'), ('cutlass_h16816fprop_optimized_128x64_32x6_nhwc_align_8_8', '475.106', '0'), ('cutlass_h16816fprop_optimized_64x128_32x6_nhwc_align_8_8', '432.767', '0'), ('cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8', '588', '0')]
2023-06-01 03:46:03,617 DEBUG <aitemplate.backend.profiler_runner> return ops: ['cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', 'cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8']
2023-06-01 03:46:03,617 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_62][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8 TIME: 373.021 WS:0
2023-06-01 03:46:03,617 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_62][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8 TIME: 368.439 WS:0
2023-06-01 03:46:03,617 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:46:03,617 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_65: NI == 1 && HI == 1024 && WI == 1024 && CI == 512
2023-06-01 03:46:03,618 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:03,618 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_65: NI == 4 && HI == 1024 && WI == 1024 && CI == 512
2023-06-01 03:46:03,618 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:03,618 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_65
2023-06-01 03:46:03,618 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
2023-06-01 03:46:21,658 DEBUG <aitemplate.backend.profiler_runner> all runtimes (unsorted): [('cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', '184.007', '0'), ('cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', '183.081', '0'), ('cutlass_h16816fprop_optimized_256x64_32x3_nhwc_align_8_8', '198.084', '0'), ('cutlass_h16816fprop_optimized_256x64_32x4_nhwc_align_8_8', '209.284', '0'), ('cutlass_h16816fprop_optimized_64x256_32x4_nhwc_align_8_8', '196.718', '0'), ('cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', '186.429', '0'), ('cutlass_h16816fprop_optimized_128x128_32x4_nhwc_align_8_8', '190.032', '0'), ('cutlass_h16816fprop_optimized_128x128_32x5_nhwc_align_8_8', '191.786', '0'), ('cutlass_h16816fprop_optimized_128x64_32x6_nhwc_align_8_8', '237.167', '0'), ('cutlass_h16816fprop_optimized_64x128_32x6_nhwc_align_8_8', '217.059', '0'), ('cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8', '296.009', '0')]
2023-06-01 03:46:21,658 DEBUG <aitemplate.backend.profiler_runner> return ops: ['cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', 'cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8']
2023-06-01 03:46:21,658 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_65][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8 TIME: 184.007 WS:0
2023-06-01 03:46:21,658 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_65][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8 TIME: 183.081 WS:0
2023-06-01 03:46:21,659 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:46:21,659 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_67: NI == 1 && HI == 1024 && WI == 1024 && CI == 256
2023-06-01 03:46:21,659 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:21,659 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_67: NI == 4 && HI == 1024 && WI == 1024 && CI == 256
2023-06-01 03:46:21,659 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:21,659 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_67
2023-06-01 03:46:21,659 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
2023-06-01 03:46:32,226 DEBUG <aitemplate.backend.profiler_runner> all runtimes (unsorted): [('cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', '94.1732', '0'), ('cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', '90.2624', '0'), ('cutlass_h16816fprop_optimized_256x64_32x3_nhwc_align_8_8', '100.74', '0'), ('cutlass_h16816fprop_optimized_256x64_32x4_nhwc_align_8_8', '103.245', '0'), ('cutlass_h16816fprop_optimized_64x256_32x4_nhwc_align_8_8', '99.0741', '0'), ('cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', '92.9229', '0'), ('cutlass_h16816fprop_optimized_128x128_32x4_nhwc_align_8_8', '96.2345', '0'), ('cutlass_h16816fprop_optimized_128x128_32x5_nhwc_align_8_8', '97.4797', '0'), ('cutlass_h16816fprop_optimized_128x64_32x6_nhwc_align_8_8', '121.486', '0'), ('cutlass_h16816fprop_optimized_64x128_32x6_nhwc_align_8_8', '112.02', '0'), ('cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8', '153.597', '0')]
2023-06-01 03:46:32,226 DEBUG <aitemplate.backend.profiler_runner> return ops: ['cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', 'cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8']
2023-06-01 03:46:32,226 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_67][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8 TIME: 90.2624 WS:0
2023-06-01 03:46:32,226 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_67][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8 TIME: 92.9229 WS:0
2023-06-01 03:46:32,226 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:46:32,226 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_107: NI == 1 && HI == 1024 && WI == 1024 && CI == 512
2023-06-01 03:46:32,227 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:32,227 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_107: NI == 4 && HI == 1024 && WI == 1024 && CI == 512
2023-06-01 03:46:32,227 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:32,227 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_add_identity_107
2023-06-01 03:46:32,227 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
2023-06-01 03:46:54,777 DEBUG <aitemplate.backend.profiler_runner> all runtimes (unsorted): [('cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', '27.8364', '0'), ('cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', '25.2662', '0'), ('cutlass_h16816fprop_optimized_256x64_32x3_nhwc_align_8_8', '31.3682', '0'), ('cutlass_h16816fprop_optimized_256x64_32x4_nhwc_align_8_8', '33.2585', '0'), ('cutlass_h16816fprop_optimized_64x256_32x4_nhwc_align_8_8', '30.5674', '0'), ('cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', '25.3553', '0'), ('cutlass_h16816fprop_optimized_128x128_32x4_nhwc_align_8_8', '30.1834', '0'), ('cutlass_h16816fprop_optimized_128x128_32x5_nhwc_align_8_8', '31.1142', '0'), ('cutlass_h16816fprop_optimized_128x64_32x6_nhwc_align_8_8', '47.6846', '0'), ('cutlass_h16816fprop_optimized_64x128_32x6_nhwc_align_8_8', '36.0765', '0'), ('cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8', '59.5897', '0')]
2023-06-01 03:46:54,777 DEBUG <aitemplate.backend.profiler_runner> return ops: ['cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', 'cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8']
2023-06-01 03:46:54,777 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_add_identity_107][conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8 TIME: 27.8364 WS:0
2023-06-01 03:46:54,777 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_add_identity_107][conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8 TIME: 25.3553 WS:0
2023-06-01 03:46:54,778 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:46:54,778 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_108: NI == 1 && HI == 1024 && WI == 1024 && CI == 256
2023-06-01 03:46:54,778 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:54,778 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_108: NI == 4 && HI == 1024 && WI == 1024 && CI == 256
2023-06-01 03:46:54,778 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:54,778 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:46:54,778 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_80: NI == 1 && HI == 2048 && WI == 2048 && CI == 256
2023-06-01 03:46:54,778 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:54,779 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_80: NI == 4 && HI == 2048 && WI == 2048 && CI == 256
2023-06-01 03:46:54,779 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:54,779 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_80
2023-06-01 03:46:54,779 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
2023-06-01 03:46:54,880 DEBUG <aitemplate.backend.profiler_runner> all runtimes (unsorted): [('cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_32x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x256_32x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_32x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_32x5_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x64_32x6_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x128_32x6_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x128_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x256_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_64x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x256_64x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_64x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x256_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x64_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x128_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x64_64x5_nhwc_align_8_8', '0', '0')]
2023-06-01 03:46:54,880 DEBUG <aitemplate.backend.profiler_runner> return ops: ['cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', 'cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8']
2023-06-01 03:46:54,880 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_80][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8 TIME: 0.0 WS:0
2023-06-01 03:46:54,880 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_80][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8 TIME: 0.0 WS:0
2023-06-01 03:46:54,880 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:46:54,880 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_83: NI == 1 && HI == 2048 && WI == 2048 && CI == 256
2023-06-01 03:46:54,881 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:54,881 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_83: NI == 4 && HI == 2048 && WI == 2048 && CI == 256
2023-06-01 03:46:54,881 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:54,881 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_83
2023-06-01 03:46:54,881 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
2023-06-01 03:46:54,980 DEBUG <aitemplate.backend.profiler_runner> all runtimes (unsorted): [('cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_32x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x256_32x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_32x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_32x5_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x64_32x6_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x128_32x6_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x128_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x256_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_64x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x256_64x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_64x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x256_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x64_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x128_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x64_64x5_nhwc_align_8_8', '0', '0')]
2023-06-01 03:46:54,980 DEBUG <aitemplate.backend.profiler_runner> return ops: ['cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', 'cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8']
2023-06-01 03:46:54,981 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_83][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8 TIME: 0.0 WS:0
2023-06-01 03:46:54,981 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_83][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8 TIME: 0.0 WS:0
2023-06-01 03:46:54,981 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:46:54,981 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_85: NI == 1 && HI == 2048 && WI == 2048 && CI == 128
2023-06-01 03:46:54,981 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:54,981 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_85: NI == 4 && HI == 2048 && WI == 2048 && CI == 128
2023-06-01 03:46:54,981 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:46:54,981 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_85
2023-06-01 03:46:54,981 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
2023-06-01 03:47:17,528 DEBUG <aitemplate.backend.profiler_runner> all runtimes (unsorted): [('cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', '98.5313', '0'), ('cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', '191.223', '0'), ('cutlass_h16816fprop_optimized_256x64_32x3_nhwc_align_8_8', '106.374', '0'), ('cutlass_h16816fprop_optimized_256x64_32x4_nhwc_align_8_8', '112.007', '0'), ('cutlass_h16816fprop_optimized_64x256_32x4_nhwc_align_8_8', '206.34', '0'), ('cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', '96.6308', '0'), ('cutlass_h16816fprop_optimized_128x128_32x4_nhwc_align_8_8', '101.584', '0'), ('cutlass_h16816fprop_optimized_128x128_32x5_nhwc_align_8_8', '102.11', '0'), ('cutlass_h16816fprop_optimized_128x64_32x6_nhwc_align_8_8', '131.653', '0'), ('cutlass_h16816fprop_optimized_64x128_32x6_nhwc_align_8_8', '119.661', '0'), ('cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8', '170.688', '0')]
2023-06-01 03:47:17,528 DEBUG <aitemplate.backend.profiler_runner> return ops: ['cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', 'cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8']
2023-06-01 03:47:17,528 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_85][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8 TIME: 98.5313 WS:0
2023-06-01 03:47:17,528 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_85][conv2d_bias_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8 TIME: 96.6308 WS:0
2023-06-01 03:47:17,528 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:47:17,528 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_110: NI == 1 && HI == 2048 && WI == 2048 && CI == 256
2023-06-01 03:47:17,529 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:47:17,529 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_110: NI == 4 && HI == 2048 && WI == 2048 && CI == 256
2023-06-01 03:47:17,529 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:47:17,529 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_add_identity_110
2023-06-01 03:47:17,529 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
2023-06-01 03:47:17,629 DEBUG <aitemplate.backend.profiler_runner> all runtimes (unsorted): [('cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x256_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_32x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x256_32x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_32x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_32x5_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x64_32x6_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x128_32x6_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x128_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x256_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_64x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x256_64x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_64x4_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_256x64_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x256_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x128_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_128x64_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x128_64x3_nhwc_align_8_8', '0', '0'), ('cutlass_h16816fprop_optimized_64x64_64x5_nhwc_align_8_8', '0', '0')]
2023-06-01 03:47:17,629 DEBUG <aitemplate.backend.profiler_runner> return ops: ['cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8', 'cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8']
2023-06-01 03:47:17,629 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_add_identity_110][conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_256x128_32x3_nhwc_align_8_8 TIME: 0.0 WS:0
2023-06-01 03:47:17,629 DEBUG <aitemplate.backend.profiler_runner> Successful: [conv2d_bias_add_identity_110][conv2d_bias_add_identity_433da49a14b3f2b9721875f8f077fac5b46b7b61_3]: OP: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8 TIME: 0.0 WS:0
2023-06-01 03:47:17,629 INFO <aitemplate.utils.environ> force_cache=False
2023-06-01 03:47:17,629 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_111: NI == 1 && HI == 2048 && WI == 2048 && CI == 128
2023-06-01 03:47:17,630 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:47:17,630 INFO <aitemplate.compiler.ops.conv.conv2d> Profile: conv2d_bias_add_identity_111: NI == 4 && HI == 2048 && WI == 2048 && CI == 128
2023-06-01 03:47:17,630 INFO <aitemplate.compiler.ops.conv.conv2d> Load profiling result from cache.
2023-06-01 03:47:17,630 INFO <aitemplate.backend.profiler_runner> Using 1 GPU for profiling conv2d_bias_add_identity_111
2023-06-01 03:47:17,630 INFO <aitemplate.compiler.ops.conv.conv2d> current: lb_algo: cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8, LB:1 MID:2 UB:4
Traceback (most recent call last):
  File "scripts/compile_alt.py", line 138, in <module>
    compile_diffusers()
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "scripts/compile_alt.py", line 126, in compile_diffusers
    compile_vae(
  File "/AITemplate/examples/05_stable_diffusion/src/compile_lib/compile_vae_alt.py", line 185, in compile_vae
    compile_model(
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/utils/misc.py", line 93, in inner_function
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/compiler.py", line 275, in compile_model
    compiler.transform.profile(
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/transform/profile.py", line 103, in profile
    f.profile(
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/ops/conv/conv2d.py", line 601, in profile
    self._profile_dynamic_dim(workdir)
  File "/usr/local/lib/python3.8/dist-packages/aitemplate/compiler/ops/conv/conv2d.py", line 721, in _profile_dynamic_dim
    result_dict = {res.op_config: res for res in result[0][1]}
IndexError: list index out of range
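The crash point suggests the profiler came back with no results at all. Below is a minimal, hypothetical simplification of the failing line in `_profile_dynamic_dim` — not the actual AITemplate code — showing why an empty `result` list raises `IndexError` before the dict comprehension ever runs:

```python
# Hypothetical simplification of the failing line. If every kernel run failed
# (e.g. a silent OOM inside the profiler binary), the collected result list is
# empty, so result[0] raises IndexError before the comprehension iterates.
result = []  # no successful profiling results came back

try:
    result_dict = {res.op_config: res for res in result[0][1]}
except IndexError:
    # A guard like this would turn the crash into a diagnosable condition.
    result_dict = None

assert result_dict is None
```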
yit-b commented 1 year ago

Commit hash: 89711d9cdc14044cb170163bf189963b9e8eaa08

yit-b commented 1 year ago

Running on an otherwise idle RTX 3090 with 24 GB of VRAM. It's suspicious that it fails while profiling a very high resolution, but I don't see any evidence of an OOM. Maybe I'll try decreasing the top end of the dynamic range for either batch size or width/height.
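One clue in the logs above: each `all runtimes` entry is an `(op_name, runtime_ms, workspace)` tuple, and a runtime of `'0'` across the board usually means the profiler run produced no real measurement. A quick hypothetical filter for triaging such logs (sample data taken from the log lines above):

```python
# Each profiler log entry is an (op_name, runtime_ms, workspace) tuple; a
# runtime of "0" suggests that kernel's profiling run silently failed.
runtimes = [
    ("cutlass_h16816fprop_optimized_128x128_32x3_nhwc_align_8_8", "96.6308", "0"),
    ("cutlass_h16816fprop_optimized_64x64_32x10_nhwc_align_8_8", "0", "0"),
]

# Flag kernels whose reported runtime is exactly zero.
suspect = [name for name, runtime, _ws in runtimes if float(runtime) == 0.0]
print(suspect)
```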

yit-b commented 1 year ago

I managed to compile successfully with `--width 1024 1024 --height 1024 1024`, so I think this was a subtle OOM at the top end of the resolution range.