apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] Unroll loop runs slowly #14756

Open. KuiliangL opened this issue 1 year ago

KuiliangL commented 1 year ago

Hello, these days I constructed some PrimFuncs and found that, in some situations, a loop with the unroll ForKind is much slower to build than loops with the other ForKinds. I tested the For statement with different ForKinds and a large loop extent. I am curious why the unrolled loop becomes so slow when the extent is large.
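
For reference, the integer kind values used in the reproduction scripts below correspond to the tir.ForKind enum; this quick check is not from the original report, just a reminder of the mapping:

from tvm import tir

# ForKind is an IntEnum, so the kind can be passed to tir.For as a plain integer.
print(int(tir.ForKind.SERIAL))      # 0
print(int(tir.ForKind.PARALLEL))    # 1
print(int(tir.ForKind.VECTORIZED))  # 2
print(int(tir.ForKind.UNROLLED))    # 3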

Expected behavior

Loops with different ForKinds should take roughly the same time to build.

Actual behavior

The unrolled loop was much slower than loops with the other ForKinds.

Environment

TVM 0.10dev0, git commit ee319d9d23c80091da9c4fb764b1e6d49d462714

Steps to reproduce

import tvm
from tvm import tir
import time

# Loop bounds: min = 32450, extent = 15000000 (a large extent).
c1 = tir.const(32450, 'uint32')
c2 = tir.const(15000000, 'uint32')
v1 = tir.Var('v1', 'uint32')

# The fourth argument is the ForKind: 3 = UNROLLED, 1 = PARALLEL, 0 = SERIAL.
for1 = tir.For(v1, c1, c2, 3, tir.Evaluate(1))  # unrolled
f1 = tir.PrimFunc([], for1)
for2 = tir.For(v1, c1, c2, 1, tir.Evaluate(1))  # parallel
f2 = tir.PrimFunc([], for2)
for3 = tir.For(v1, c1, c2, 0, tir.Evaluate(1))  # serial
f3 = tir.PrimFunc([], for3)

# Time how long tvm.build takes for each ForKind.
t0 = time.time()
tvm.build(f2)
print(time.time() - t0)
t1 = time.time()
tvm.build(f3)
print(time.time() - t1)
t2 = time.time()
tvm.build(f1)
print(time.time() - t2)
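
To see what explicit unrolling does to the loop body, the same kind of loop with a tiny extent can be pushed through tir.transform.UnrollLoop and printed; this matters here because the loops in the report have extents in the millions. This is a minimal sketch, not from the original report, assuming the pass expands loops marked UNROLLED:

import tvm
from tvm import tir

# A loop of extent 4 marked UNROLLED (ForKind value 3 above).
v = tir.Var('v', 'uint32')
small_for = tir.For(v, tir.const(0, 'uint32'), tir.const(4, 'uint32'),
                    tir.ForKind.UNROLLED, tir.Evaluate(1))
small_mod = tvm.IRModule({"main": tir.PrimFunc([], small_for)})

# The printed module should show the marked loop replaced by its unrolled body
# (one statement per iteration).
print(tir.transform.UnrollLoop()(small_mod))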


KuiliangL commented 1 year ago

The same slowness shows up if we apply the UnrollLoop pass directly:


import tvm
from tvm import tir

v1 = tir.Var('v1', 'uint32')
c1 = tir.const(905892654, 'uint32')   # loop min
c2 = tir.const(174155511, 'uint32')
f1 = tir.Cast('float32', c2)
ceil1 = tir.ceil(f1)
c3 = tir.Cast('uint32', ceil1)        # loop extent, roughly 1.7e8
c4 = tir.const(35, 'uint32')          # unused

v2 = tir.Var('v2', 'float32')
c5 = tir.const(0.187, 'float32')
let1 = tir.LetStmt(v2, c5, tir.Evaluate(0))
for1 = tir.For(v1, c1, c3, 3, let1)   # kind 3 = ForKind.UNROLLED
prim = tir.PrimFunc({}, for1)
b1 = tir.IntImm('bool', 4582)
if1 = tir.IfThenElse(tir.EQ(tir.And(b1, b1), tir.Sub(b1, b1)), for1, tir.Evaluate(tir.ret(c2)))
p2 = tir.PrimFunc({}, if1)
mod = tvm.IRModule({"main": p2})

# Applying the UnrollLoop pass on its own is already slow.
mod = tir.transform.UnrollLoop()(mod)
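
In case it helps anyone reproducing this, the UnrollLoop pass takes its options from the "tir.UnrollLoop" PassContext config (auto_max_step, auto_max_depth, auto_max_extent, explicit_unroll). A minimal sketch of passing them, continuing from the snippet above and assuming these option names are unchanged in this TVM version:

# Run the pass with explicit options instead of the defaults.
with tvm.transform.PassContext(config={"tir.UnrollLoop": {"auto_max_step": 16,
                                                          "explicit_unroll": True}}):
    mod = tir.transform.UnrollLoop()(mod)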