Optimize overall running speed - Githubissues

ShouyangDong / transbot

Transcompile code using LLM and RL


Optimize overall running speed #3

Open ShouyangDong opened 1 month ago

ShouyangDong commented 1 month ago

The sketch obtained after each action can contain several code variants. Take a split as an example. Code 1:

    for (int col = 0; col < 64; col++) {
        for (int i = 0; i < 512; i++) {
            B_wram[i * 64 + col] = B[i * 64 + col];
        }
    }

Code 2:

    for (int col = 0; col < 512; col++) {
        for (int i = 0; i < 64; i++) {
            B_wram[i * 512 + col] = B[i * 512 + col];
        }
    }

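These two snippets are equivalent: both copy the same 64 * 512 elements of B into B_wram. As a result, the final search space is orthogonal, that is, the product of the two dimensions: O(sketch) * O(annotation).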
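
To make the equivalence and the resulting blow-up concrete, here is a minimal Python sketch. It enumerates the flat indices written by each loop nest and multiplies out the search-space size. The helper names `touched_indices` and `search_space_size`, and the count of 100 annotations per sketch, are hypothetical and chosen only for illustration.

```python
def touched_indices(outer, inner, stride):
    """Flat indices written by the nest:
    for col in range(outer): for i in range(inner): dst[i * stride + col] = src[i * stride + col]
    """
    return {i * stride + col for col in range(outer) for i in range(inner)}


# Code 1 and Code 2 from the issue, expressed as index sets.
code1 = touched_indices(outer=64, inner=512, stride=64)
code2 = touched_indices(outer=512, inner=64, stride=512)

# Both nests copy every element 0 .. 64*512 - 1 exactly once.
same_copy = code1 == code2 == set(range(64 * 512))


def search_space_size(num_sketches, annotations_per_sketch):
    """Size of the product space O(sketch) * O(annotation)."""
    return num_sketches * annotations_per_sketch


# Hypothetical numbers: treating the two equivalent sketches as distinct
# doubles the number of candidates that have to be evaluated.
with_duplicates = search_space_size(2, 100)
deduplicated = search_space_size(1, 100)
```

Because each element is copied to the same position in both versions, exploring both sketches spends evaluation budget without reaching any new program.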

Pruning suggestions:

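If the action ms.schedule_rule.AutoBind() has already appeared once, later actions should no longer offer that primitive.
When searching for the optimal solution, could we first find the top-k best sketches, then find the top-k annotations for each of those sketches, and finally pick the best solution in a beam-search-like manner?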
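
The two suggestions above could look roughly like the following sketch. It is plain Python with no TVM dependency. Rules are represented as strings, and `allowed_rules`, `two_level_top_k`, `annotations_for`, and `score` are hypothetical names. It also assumes that a higher score is better, which the issue does not state.

```python
import heapq


def allowed_rules(candidate_rules, used_rules, once_only=("AutoBind",)):
    """Drop rules that may be applied only once and have already been used."""
    return [r for r in candidate_rules if not (r in once_only and r in used_rules)]


def two_level_top_k(sketches, annotations_for, score, k):
    """Keep the k best sketches, then the k best annotations of each,
    and return the best (score, sketch, annotation) triple, or None.

    Assumption: score(sketch, annotation) returns a number where higher is
    better, and score(sketch, None) rates a sketch before annotation.
    """
    top_sketches = heapq.nlargest(k, sketches, key=lambda s: score(s, None))
    beam = []
    for s in top_sketches:
        best = heapq.nlargest(k, annotations_for(s), key=lambda a: score(s, a))
        beam.extend((score(s, a), s, a) for a in best)
    return max(beam, key=lambda t: t[0], default=None)


# AutoBind was already applied, so it is no longer offered.
remaining = allowed_rules(["AutoBind", "AutoInline"], used_rules={"AutoBind"})
```

With this scheme at most k sketches and k * k (sketch, annotation) pairs are scored per step, instead of the full product of all sketches and all annotations.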

ShouyangDong commented 1 month ago

    spaces = generate_design_space(
        kind="cuda",
        mod=self.mod,
        target=self.tvm_tgt,
        types=None,
        sch_rules=actions,
    )

    score = objective(spaces[0].mod, self.tvm_tgt, self.mod_name, self.inputs)

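The code above always uses only the mod at index 0 (spaces[0]). We need to open this up to the multiple mods generated under each action and compare those mods against each other.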
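
One possible way to compare all generated mods instead of only the first is sketched below. `best_space` is a hypothetical helper, and the stand-in objects exist only to make the example runnable. In the project, `score_fn` would presumably wrap the existing call, for example `lambda mod: objective(mod, self.tvm_tgt, self.mod_name, self.inputs)`. Whether a higher or lower score is better is not stated in the issue, so it is left as a parameter.

```python
from types import SimpleNamespace


def best_space(spaces, score_fn, higher_is_better=True):
    """Score every candidate in `spaces` and return (best_score, best_space)."""
    if not spaces:
        raise ValueError("no design spaces were generated")
    scored = [(score_fn(space.mod), space) for space in spaces]
    pick = max if higher_is_better else min
    return pick(scored, key=lambda pair: pair[0])


# Stand-ins for the schedules returned by generate_design_space; each only
# needs a `.mod` attribute here. The scores are made up for illustration.
fake_spaces = [SimpleNamespace(mod=f"mod{i}") for i in range(3)]
fake_scores = {"mod0": 0.2, "mod1": 0.9, "mod2": 0.5}

score, chosen = best_space(fake_spaces, fake_scores.get)
```

This evaluates every mod once per action, so the cost grows linearly with the number of generated spaces. Combining it with the top-k pruning discussed earlier would keep that number bounded.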