wyzero opened 1 year ago
End-to-end model tests on Bert Base (TF) and Albert (PyTorch), on g6r, using a single thread. Note that we currently have only one default schedule for all shapes, and that schedule is known to be less performant when n or k is large; the initial numbers below are therefore expected to improve once we support schedule-selection logic.
Bert Base (TF)
input | TF 2.8 (s) | DISC-ACL (s) | DISC-Transform (s) | relative perf (DISC-ACL / DISC-Transform) |
---|---|---|---|---|
(1, 128) | 0.742 | 0.638 | 0.656 | 97.3% |
(2, 128) | 1.41 | 1.24 | 1.27 | 97.6% |
(4, 128) | 2.85 | 2.36 | 2.55 | 92.5% |
(8, 128) | 5.84 | 4.68 | 5.07 | 92.3% |
(16, 128) | 11.9 | 9.55 | 10.2 | 93.6% |
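As a sanity check, the last column of the table is the ratio of the DISC-ACL time to the DISC-Transform time (so values just under 100% mean the transform-based codegen is slightly slower than the ACL-based path). A quick sketch reproducing it from the two timing columns:

```python
# Timings copied from the Bert Base (TF) table above, in seconds.
disc_acl = [0.638, 1.24, 2.36, 4.68, 9.55]
disc_transform = [0.656, 1.27, 2.55, 5.07, 10.2]

# Relative performance of DISC-Transform vs. DISC-ACL, in percent.
ratios = [round(100 * a / t, 1) for a, t in zip(disc_acl, disc_transform)]
print(ratios)  # [97.3, 97.6, 92.5, 92.3, 93.6]
```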
Albert (PyTorch)
input | TorchScript (s) | OnnxRuntime (s) | DISC-ACL (s) | DISC-Transform (s) |
---|---|---|---|---|
(2, 12) | 0.197 | 0.140 | 0.117 | 0.139 |
We'll start exploring the use of the MLIR transform dialect to do codegen for (fused) compute-intensive patterns. The initial target is to support GEMM codegen on the ARM platform, to address the dynamic-shape problem of the Arm Compute Library.
The initial plan is:

- Add a new fusion kind `kTransform` for the transform-based fusion pattern.
- Add a `disc_linalg.multi_level_pack` op, used for doing packing.
- Add a `transform.disc.cache_read` transform op, relying on the `disc_linalg.multi_level_pack` op.
- Add folding support for `disc_linalg.multi_level_pack`.
- Lower `disc_linalg.multi_level_pack` to loops if it can not be folded.
- For each `kTransform` fusion pattern, lower it to linalg and then schedule it.
- Add a default schedule for the `kTransform` pattern.
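To illustrate what the packing op computes (independent of the actual `disc_linalg.multi_level_pack` semantics, which the plan above defines in MLIR), here is a hypothetical Python sketch of a two-level pack: a dense M×K matrix is rearranged into a blocked `[M/m0][K/k0][m0][k0]` layout, with zero padding when the dimensions are not multiples of the tile sizes. The function name and signature are illustrative, not the op's real interface.

```python
def multi_level_pack(a, m0, k0):
    """Pack a dense M x K matrix (list of lists) into a blocked
    [ceil(M/m0)][ceil(K/k0)][m0][k0] layout, zero-padding the edge tiles.
    This mirrors the kind of data-layout change a GEMM schedule uses so
    that each inner tile is contiguous for the micro-kernel."""
    M, K = len(a), len(a[0])
    mb = (M + m0 - 1) // m0  # number of row blocks
    kb = (K + k0 - 1) // k0  # number of column blocks
    packed = [[[[0.0] * k0 for _ in range(m0)] for _ in range(kb)]
              for _ in range(mb)]
    for i in range(M):
        for j in range(K):
            packed[i // m0][j // k0][i % m0][j % k0] = a[i][j]
    return packed

# Example: pack a 3x3 matrix with 2x2 tiles; the bottom-right tile is padded.
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
p = multi_level_pack(a, 2, 2)
print(p[0][0])  # [[1, 2], [4, 5]]
print(p[1][1])  # [[9, 0.0], [0.0, 0.0]]
```

Folding such a pack away (rather than materializing it as loops) is exactly the distinction the plan draws between the folding step and the lower-to-loops fallback.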