ShouyangDong / transbot

Transcompile code using LLM and RL

Optimize overall running speed #3

Open ShouyangDong opened 1 month ago

ShouyangDong commented 1 month ago

The sketch obtained after each action contains multiple code variants. For example, a split produces the following. Code 1:

    for (int col = 0; col < 64; col++) {
        for (int i = 0; i < 512; i++) {
            B_wram[i * 64 + col] = B[i * 64 + col];
        }
    }

Code 2:

    for (int col = 0; col < 512; col++) {
        for (int i = 0; i < 64; i++) {
            B_wram[i * 512 + col] = B[i * 512 + col];
        }
    }

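These two versions are equivalent, so the sketch and annotation choices are orthogonal and the final search space is their product: O(sketch) * O(annotation).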
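
A quick way to check the equivalence claim is to compare the flat indices each loop nest touches. This is only a minimal Python sketch; the helper name copied_indices is hypothetical and not part of transbot. Since both variants copy B[idx] to B_wram[idx], touching the same index set means they produce the same result.

```python
def copied_indices(outer, inner, stride):
    """Flat indices touched by:
    for col in range(outer): for i in range(inner): idx = i * stride + col
    """
    return {i * stride + col for col in range(outer) for i in range(inner)}


# Code 1: col < 64, i < 512, index i * 64 + col
variant_1 = copied_indices(outer=64, inner=512, stride=64)
# Code 2: col < 512, i < 64, index i * 512 + col
variant_2 = copied_indices(outer=512, inner=64, stride=512)

# Both variants should cover the whole 64 * 512 buffer.
full_buffer = set(range(64 * 512))
same_result = variant_1 == variant_2 == full_buffer
```

If same_result is true, scoring both variants separately spends search budget on two candidates that only differ in loop shape.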

Pruning suggestions:

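  1. If the action ms.schedule_rule.AutoBind() has already appeared once, later actions should no longer include this primitive.
  2. When searching for the optimal solution, could we find the top-k best sketches, then find the top-k annotations for each of those sketches, and finally pick the best solution in a beam-search-like manner?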
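
The two suggestions above could be sketched roughly as follows. This is a hedged illustration, not transbot code: actions are assumed to be plain strings, higher scores are assumed to be better, and allowed_actions, two_level_top_k, sketch_score, full_score and annotations_for are all hypothetical names.

```python
import heapq


def allowed_actions(history, candidates, once_only=("AutoBind",)):
    """Suggestion 1: drop once-only rules that already appear in the history."""
    used = set(history)
    return [a for a in candidates if not (a in once_only and a in used)]


def two_level_top_k(sketches, annotations_for, sketch_score, full_score, k=2):
    """Suggestion 2: keep the k best sketches, then the k best annotations
    for each kept sketch, and return the best (score, sketch, annotation).

    Assumes higher scores are better and at least one finalist exists.
    """
    best_sketches = heapq.nlargest(k, sketches, key=sketch_score)
    finalists = []
    for sketch in best_sketches:
        top_annotations = heapq.nlargest(
            k, annotations_for(sketch), key=lambda a: full_score(sketch, a)
        )
        finalists.extend((full_score(sketch, a), sketch, a) for a in top_annotations)
    return max(finalists, key=lambda item: item[0])
```

With this shape, the number of full evaluations is bounded by k * k instead of growing with O(sketch) * O(annotation), at the cost of possibly missing a good annotation on a sketch that was pruned early.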

ShouyangDong commented 1 month ago

    spaces = generate_design_space(
        kind="cuda",
        mod=self.mod,
        target=self.tvm_tgt,
        types=None,
        sch_rules=actions,
    )

    score = objective(spaces[0].mod, self.tvm_tgt, self.mod_name, self.inputs)

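The code above always uses the mod at index 0 (spaces[0]). We need to open up the multiple mods produced under each action and compare them against each other.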
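
One possible shape for that comparison is sketched below. It is an assumption-laden sketch: best_space and score_fn are hypothetical names, and whether a higher or lower objective value is better is not stated in this thread, so it is left as a parameter.

```python
def best_space(spaces, score_fn, higher_is_better=True):
    """Score every candidate space and return (index, score) of the best one."""
    if not spaces:
        raise ValueError("no candidate design spaces to compare")
    scores = [score_fn(space) for space in spaces]
    pick = max if higher_is_better else min
    best_index = pick(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]
```

In the snippet above this might be called with score_fn set to something like lambda s: objective(s.mod, self.tvm_tgt, self.mod_name, self.inputs), so that every entry of spaces is evaluated instead of only the first.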