apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] MetaScheduler Literal value exceeds maximum of int32 #15987

Open malixian opened 11 months ago

malixian commented 11 months ago

Expected behavior

I am trying to use MetaSchedule to tune a matmul whose dimensions are m=8192, n=14336, k=8192. When n=8192 everything works, but as soon as m or n equals 14336, the following error occurs: RuntimeError: parallel_for_dynamic error with [02:23:57] /home/malixian/repos/tensorir/tvm/src/ir/expr.cc:88: InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32. Interestingly, k=14336 works fine. Following the error message, I commented out the ICHECK in the IntImm constructor in expr.cc, and tuning worked normally again. I think the data type TIR uses for such literals should be widened to handle this case.
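
For context, the failing literal 8589934591 is exactly 2**33 - 1 (presumably an intermediate index or extent derived from the 8192 x 14336 workload), which does not fit in a signed 32-bit integer. A minimal sketch of the bound the IntImm check enforces, per the error message:

value = 8589934591          # the failing literal, exactly 2**33 - 1
int32_bound = 1 << 31       # 2147483648, i.e. 1LL << (dtype.bits() - 1) for int32
print(value < int32_bound)  # False -> "Literal value ... exceeds maximum of int32"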

Actual behavior

RuntimeError: parallel_for_dynamic error with [02:23:57] /home/malixian/repos/tensorir/tvm/src/ir/expr.cc:88: InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32

Environment

TVM version: 0.15.dev0

Steps to reproduce

import tempfile

import tvm
from tvm import meta_schedule as ms
from tvm import te
from tvm.meta_schedule.builder import LocalBuilder
from tvm.target import Target


def matmul_fp16(M: int, N: int, K: int, in_dtype: str, out_dtype: str):
    # C[i, j] = sum_k X[i, k] * Y[k, j], accumulated in out_dtype.
    x = te.placeholder((M, K), name="X", dtype=in_dtype)
    y = te.placeholder((K, N), name="Y", dtype=in_dtype)
    k = te.reduce_axis((0, K), name="k")
    c = te.compute(  # pylint: disable=invalid-name
        (M, N),
        lambda i, j: te.sum(x[i, k].astype(out_dtype) * y[k, j].astype(out_dtype), axis=[k]),
        name="C",
    )
    return (x, y, c)


def tune(in_dtype, out_dtype):
    target = Target("nvidia/nvidia-a100")
    M, N, K = 8192, 14336, 8192  # fails whenever M or N is 14336; K = 14336 is fine
    func = te.create_prim_func(
        matmul_fp16(M=M, N=N, K=K, in_dtype=in_dtype, out_dtype=out_dtype)
    ).with_attr({"global_symbol": "main"})

    # Restrict the search space to the Tensor Core schedule rules.
    space = ms.space_generator.PostOrderApply(
        sch_rules="cuda-tensorcore",
        postprocs="cuda-tensorcore",
        mutator_probs="cuda-tensorcore",
    )

    mod = tvm.IRModule({"main": func})
    with tempfile.TemporaryDirectory() as work_dir:
        db = ms.tir_integration.tune_tir(
            mod=mod,
            target=target,
            work_dir=work_dir,
            max_trials_global=32,
            # `initializer` is defined elsewhere in the original setup (it sets up
            # "meta_schedule.builder.async_build" on the worker processes) and is
            # kept here as in the report.
            builder=LocalBuilder(
                f_build="meta_schedule.builder.async_build", initializer=initializer
            ),
            space=space,
        )
        sch = db.query_schedule(mod, target=target, workload_name="main")
        with tvm.transform.PassContext(config={"tir.use_async_copy": 1}):
            rt_mod = tvm.build(sch.mod, target=target)
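
A possibly safer direction than deleting the check is to keep the workload's shape literals in int64 from the start, so downstream index arithmetic never has to materialize an out-of-range int32. This is an untested sketch, not a confirmed fix; whether the cuda-tensorcore schedule rules and postprocessors accept int64 extents is an open question:

import tvm
from tvm import te, tir

# Hedged workaround sketch (untested): int64 shape literals keep index
# expressions in int64 instead of the default int32.
M64 = tir.IntImm("int64", 8192)
N64 = tir.IntImm("int64", 14336)
K64 = tir.IntImm("int64", 8192)
x = te.placeholder((M64, K64), name="X", dtype="float16")
y = te.placeholder((K64, N64), name="Y", dtype="float16")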
malixian commented 11 months ago

Hi @wrongtest-intellif, I saw that you submitted a related PR before. Could you give me some suggestions on how to fix this?

MasterJianxing commented 8 months ago

I ran into the same problem. Did you ever manage to solve it?

malixian commented 8 months ago

I tried to comment out the ICHECK code of the function IntImm in expr.cc, and it worked normally.

MasterJianxing commented 8 months ago

> I tried to comment out the ICHECK code of the function IntImm in expr.cc, and it worked normally.

Thanks, but that doesn't seem like a safe way to fix it, haha.
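
To illustrate why removing the check is risky: once the 64-bit literal flows into code that truncates to 32 bits, it wraps around silently instead of failing loudly. A minimal sketch of the truncation, in plain Python mirroring C-style two's-complement behavior:

def to_int32(v):
    # Keep the low 32 bits, as a C cast to int32_t would.
    v &= 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed two's-complement value.
    return v - (1 << 32) if v & (1 << 31) else v

print(to_int32(8589934591))  # -1: an extent or index like this would silently corrupt codegen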