Closed gbossu closed 1 month ago
This teaches the loop-aware scheduler a little about loop-carried dependencies (LCDs), and tries to reduce timing conflicts with the next iteration.
This gets us back to II=10 in Add2D, which had degraded to II=11 after #168.
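To make the "timing conflict with the next iteration" idea concrete, here is a minimal sketch of how a loop-carried dependency can force a longer loop. It is a simplified model and not the code in this PR: `LoopCarriedDep` and `minLengthForLCDs` are hypothetical names, every dependency is assumed to have distance 1, and the cycle numbers in the usage note are made up.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical model of one loop-carried dependency (LCD): a value produced at
// DefCycle of iteration i is consumed at UseCycle of iteration i + 1.
struct LoopCarriedDep {
  int DefCycle; // issue cycle of the producer within the loop body
  int UseCycle; // issue cycle of the consumer within the loop body
  int Latency;  // cycles needed between producer and consumer
};

// Returns the smallest loop length (II) that satisfies all LCDs without
// changing the placement of instructions inside the body.
// Relative to the start of iteration i, the consumer issues at II + UseCycle,
// so each dependency requires: II + UseCycle - DefCycle >= Latency.
int minLengthForLCDs(int ScheduledLength,
                     const std::vector<LoopCarriedDep> &Deps) {
  int Length = ScheduledLength;
  for (const LoopCarriedDep &D : Deps)
    Length = std::max(Length, D.Latency + D.DefCycle - D.UseCycle);
  return Length;
}
```

With a 10-cycle body, a producer at cycle 9 feeding a consumer at cycle 1 with latency 3 needs a length of 11 under this model, whereas moving the producer to cycle 7 keeps the length at 10. A scheduler that knows about LCDs can prefer the second placement.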
| Core_Compute_Cycle_Count | Requantize_0 | Requantize_1 | Reciprocal_aie2_1 | Sign_int8_1 | Cast_aie2_int8 | Cast_aie2_int8_1 | EleMax_aie2_int8 | EleMin_aie2_int8 | Select_aie2_bf16 | Sign_int8_0 | HardSigmoidTemplated_bf16_0 | Shrink_aie2_0 | ... | Conv1D_DW_AIE2_bf16_1 | BitwiseAnd_int8_0 | BitwiseOr_int8_0 | Conv1D_DW_AIE2_bf16_0 | AddAttributeBroadcasting_aie2_bf16 | SubAttributeBroadcasting_aie2_bf16_0 | AddBroadcastingBf16_aie2_0 | Add2D_Standalone_0 | Round_aie2_0 | SubBroadcasting_aie2_bf16_0 | AddBf16_aie2_0 | Sub_aie2_bf16_0 | Add2D_0 | Add2D_Standalone_1 | Abs_bf16_0 | Average diff |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VITIS_AIE_QOR_MLLIB_loop_heuristics/20241004_153443_qor_report_peano.json | 1281 | 705 | 2400 | 122 | 761 | 761 | 209 | 209 | 328 | 416 | 616 | 671 | ... | 4026 | 482 | 482 | 3467 | 787 | 787 | 754 | 334 | 381 | 732 | 700 | 678 | 229 | 510 | 407 | |
| VITIS_AIE_QOR_MLLIB_loop_heuristics/20241004_163728_qor_report_peano.json | 1343 | 735 | 2432 | 123 | 765 | 765 | 210 | 210 | 329 | 417 | 617 | 672 | ... | 3902 | 467 | 467 | 3358 | 760 | 760 | 727 | 322 | 367 | 705 | 673 | 651 | 217 | 482 | 376 | -0.13% |
| Total diff | REGR(+4.84%) | REGR(+4.26%) | REGR(+1.33%) | REGR(+0.82%) | REGR(+0.53%) | REGR(+0.53%) | REGR(+0.48%) | REGR(+0.48%) | REGR(+0.30%) | REGR(+0.24%) | REGR(+0.16%) | REGR(+0.15%) | ... | IMPR(-3.08%) | IMPR(-3.11%) | IMPR(-3.11%) | IMPR(-3.14%) | IMPR(-3.43%) | IMPR(-3.43%) | IMPR(-3.58%) | IMPR(-3.59%) | IMPR(-3.67%) | IMPR(-3.69%) | IMPR(-3.86%) | IMPR(-3.98%) | IMPR(-5.24%) | IMPR(-5.49%) | IMPR(-7.62%) | -0.13% |
Addressed the comments :) I also added one more commit because I realized we do not call `AIEScheduleDAGMI::schedule()` for regions of a single instruction. This is harmless for now, but I'd rather be cautious.