Xilinx / llvm-aie

Fork of LLVM to support AMD AIEngine processors

MachineLICM to hoist instructions with constant inputs #220

Closed gbossu closed 3 weeks ago

gbossu commented 1 month ago

This mostly extends the existing post-RA LICM pass so that it actually hoists instructions with register inputs. I'll see if I can upstream those changes.
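
For readers unfamiliar with the idea, here is a minimal sketch of loop-invariant hoisting on a toy single-block loop. It is not the MachineLICM implementation: the `Instr` type, the `hoist_invariants` helper, and the register names are all hypothetical, and the real pass also has to reason about physical-register liveness, aliasing, and multi-block loops. The sketch assumes the loop body runs at least once and that only side-effect-free instructions are candidates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Toy instruction (hypothetical, not an LLVM MachineInstr)."""
    dest: str          # register written
    opcode: str
    srcs: tuple = ()   # registers read; immediates are omitted
    pure: bool = True  # False for loads, stores, calls, ...


def hoist_invariants(body):
    """Split a single-block loop body into (preheader, remaining_body).

    An instruction is hoisted when it is pure, none of its source registers
    is written inside the loop, its destination is written only once in the
    loop, and nothing in the loop reads that destination before the write.
    """
    preheader, body = [], list(body)
    changed = True
    while changed:  # repeat: hoisting one instruction can free up another
        changed = False
        defs = [i.dest for i in body]
        for idx, ins in enumerate(body):
            read_before_def = any(ins.dest in p.srcs for p in body[:idx])
            if (ins.pure
                    and defs.count(ins.dest) == 1
                    and not any(s in defs for s in ins.srcs)
                    and not read_before_def):
                preheader.append(ins)
                del body[idx]
                changed = True
                break
    return preheader, body


loop = [
    Instr("r0", "mov_imm"),                # constant input only
    Instr("r1", "add", ("r0", "r7")),      # r7 is defined outside the loop
    Instr("r2", "load", ("p0",), pure=False),
    Instr("r3", "mul", ("r2", "r1")),
    Instr("p0", "padd", ("p0",)),          # induction update, stays in loop
]
hoisted, remaining = hoist_invariants(loop)
```

In this toy loop, `r0` is hoisted first because it has no register inputs; once it is out of the loop, `r1` only reads registers defined outside the loop and becomes hoistable too.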

Then there is a DAGMutator change to give MachineLICM more hoisting opportunities.

Best reviewed commit by commit.

| Core_Compute_Cycle_Count   | bfloat16      | Mul2d_bf16_0 | Scale_Add_0  | Scale_Add_1  | Mul2d_bf16_1 | InstanceNormPart1_aie2_bf16_0 | BatchNorm1d_aie2_bfloat16 | BatchNorm2D_1 | LayerNormC8Part1_aie2_bf16_0 | Conv2D_ReLU_int8_1 | int8         | BatchNorm2D_0 | Tanh_0       | BatchNorm1d_aie2_int8 | Tanh_1       | ThresholdedRelu_aie2_int8 | Add2D_1      | Sin_aie2_bf16 | Conv2D_ReLU_int8_0 | Softmax_1    | Elu_aie2_int8_0 | Conv2D_DW_bf16_0 | InstanceNormPart2_aie2_bf16_0 | ReduceMeanAxis_1_aie2_bf16 | ReduceMeanAxis_4_aie2_bf16 | Rsqrt_aie2_int8_0 | ReduceMeanAxis_2_aie2_bf16 | DilatedConv2D_1 | SigmoidTemplated_int8_0 | SigmoidTemplated_int8_1 | HardswishAsHardsigmoid_aie2_0 | Hardswish_aie2_0 | Sub_aie2_int8_0 | Sub_aie2_int8_0_ptr_interface | ReduceMeanAxis_5_aie2_bf16 | ReduceMeanAxis_6_aie2_bf16 | ReduceMeanAxis_3_aie2_bf16 | Add_aie2_0   | SubBroadcasting_aie2_int8_0 | SubBroadcasting_aie2_int8_0_ptr_interface | AddBroadcasting_aie2_0 | ReduceSumAxis_1_aie2_int8 | AddAttributeBroadcasting_aie2_int8 | SubAttributeBroadcasting_aie2_int8_0 | Sin_aie2_int8 | Conv2D_DW_1  | Conv2D_SV60  | Conv2D_FC_0  | GEMM_bf16_1  | Conv2D_0     |       | AvgPool2dVariant_aie2_bf16_1 | Conv2D_1     | ReduceProdAxis_4_aie2_bf16 | ReduceProdAxis_1_aie2_bf16 | ReduceProdAxis_2_aie2_bf16 | Mul2D_0      | Mul2D_1      | HardswishAsHardsigmoid_aie2_1 | Hardswish_aie2_1 | Erf_aie2_bf16_0 | ReduceProdAxis_5_aie2_bf16 | ReduceProdAxis_6_aie2_bf16 | ReduceProdAxis_3_aie2_bf16 | ReduceProdAxis_7_aie2_bf16 | TanhTemplated_aie2_bfloat16 | MulAttributeBroadcasting_aie2_int8_0 | SigmoidTemplated_bf16_0 | GELU_0        | MulBroadcasting_aie2_0 | GELU_1        | SiLU_aie2_bf16 | Mul_aie2_0    | HardSigmoid_bf16_1 | HardSigmoid_bf16_0 | MulBroadcastingBf16_aie2_0 | MulBf16_aie2_0 | MulAttributeBroadcasting_aie2_bf16_0 | Average diff |
| -------------------------- | ------------- | ------------ | ------------ | ------------ | ------------ | ----------------------------- | ------------------------- | ------------- | ---------------------------- | ------------------ | ------------ | ------------- | ------------ | --------------------- | ------------ | ------------------------- | ------------ | ------------- | ------------------ | ------------ | --------------- | ---------------- | ----------------------------- | -------------------------- | -------------------------- | ----------------- | -------------------------- | --------------- | ----------------------- | ----------------------- | ----------------------------- | ---------------- | --------------- | ----------------------------- | -------------------------- | -------------------------- | -------------------------- | ------------ | --------------------------- | ----------------------------------------- | ---------------------- | ------------------------- | ---------------------------------- | ------------------------------------ | ------------- | ------------ | ------------ | ------------ | ------------ | ------------ |       | ---------------------------- | ------------ | -------------------------- | -------------------------- | -------------------------- | ------------ | ------------ | ----------------------------- | ---------------- | --------------- | -------------------------- | -------------------------- | -------------------------- | -------------------------- | --------------------------- | ------------------------------------ | ----------------------- | ------------- | ---------------------- | ------------- | -------------- | ------------- | ------------------ | ------------------ | -------------------------- | -------------- | ------------------------------------ | ------------ |
| Baseline                   | 907(+0.00%)   | 505(+0.00%)  | 367(+0.00%)  | 367(+0.00%)  | 321(+0.00%)  | 2882(+0.00%)                  | 387(+0.00%)               | 415(+0.00%)   | 8890(+0.00%)                 | 922(+0.00%)        | 846(+0.00%)  | 306(+0.00%)   | 1964(+0.00%) | 406(+0.00%)           | 2572(+0.00%) | 865(+0.00%)               | 434(+0.00%)  | 3009(+0.00%)  | 10145(+0.00%)      | 570(+0.00%)  | 578(+0.00%)     | 1175(+0.00%)     | 9456(+0.00%)                  | 13034(+0.00%)              | 13040(+0.00%)              | 2376(+0.00%)      | 13070(+0.00%)              | 5382(+0.00%)    | 1275(+0.00%)            | 1275(+0.00%)            | 1368(+0.00%)                  | 1368(+0.00%)     | 703(+0.00%)     | 703(+0.00%)                   | 7208(+0.00%)               | 7215(+0.00%)               | 7229(+0.00%)               | 725(+0.00%)  | 753(+0.00%)                 | 753(+0.00%)                               | 775(+0.00%)            | 7235(+0.00%)              | 806(+0.00%)                        | 806(+0.00%)                          | 841(+0.00%)   | 852(+0.00%)  | 857(+0.00%)  | 2647(+0.00%) | 7661(+0.00%) | 7687(+0.00%) |  ...  | 1783(+0.00%)                 | 2458(+0.00%) | 35954(+0.00%)              | 35922(+0.00%)              | 18052(+0.00%)              | 548(+0.00%)  | 548(+0.00%)  | 1590(+0.00%)                  | 1585(+0.00%)     | 2894(+0.00%)    | 9184(+0.00%)               | 9168(+0.00%)               | 9185(+0.00%)               | 1894(+0.00%)               | 1143(+0.00%)                | 581(+0.00%)                          | 1954(+0.00%)            | 2594(+0.00%)  | 358(+0.00%)            | 3426(+0.00%)  | 3608(+0.00%)   | 295(+0.00%)   | 966(+0.00%)        | 1434(+0.00%)       | 1174(+0.00%)               | 1119(+0.00%)   | 1555(+0.00%)                         | +0.00%       |
| MachineLICM changes        | 907(+0.00%)   | 505(+0.00%)  | 371(+1.09%)  | 371(+1.09%)  | 321(+0.00%)  | 2901(+0.66%)                  | 389(+0.52%)               | 417(+0.48%)   | 8930(+0.45%)                 | 926(+0.43%)        | 849(+0.35%)  | 307(+0.33%)   | 1970(+0.31%) | 407(+0.25%)           | 2578(+0.23%) | 867(+0.23%)               | 435(+0.23%)  | 3015(+0.20%)  | 10164(+0.19%)      | 571(+0.18%)  | 579(+0.17%)     | 1177(+0.17%)     | 9472(+0.17%)                  | 13056(+0.17%)              | 13062(+0.17%)              | 2380(+0.17%)      | 13092(+0.17%)              | 5391(+0.17%)    | 1277(+0.16%)            | 1277(+0.16%)            | 1370(+0.15%)                  | 1370(+0.15%)     | 704(+0.14%)     | 704(+0.14%)                   | 7218(+0.14%)               | 7225(+0.14%)               | 7239(+0.14%)               | 726(+0.14%)  | 754(+0.13%)                 | 754(+0.13%)                               | 776(+0.13%)            | 7244(+0.12%)              | 807(+0.12%)                        | 807(+0.12%)                          | 842(+0.12%)   | 853(+0.12%)  | 858(+0.12%)  | 2650(+0.11%) | 7669(+0.10%) | 7695(+0.10%) |  ...  | 1781(-0.11%)                 | 2450(-0.33%) | 35498(-1.27%)              | 35466(-1.27%)              | 17821(-1.28%)              | 548(+0.00%)  | 548(+0.00%)  | 1590(+0.00%)                  | 1585(+0.00%)     | 2894(+0.00%)    | 8730(-4.94%)               | 8709(-5.01%)               | 8722(-5.04%)               | 1794(-5.28%)               | 1051(-8.05%)                | 517(-11.02%)                         | 1954(+0.00%)            | 2144(-17.35%) | 294(-17.88%)           | 2811(-17.95%) | 3608(+0.00%)   | 231(-21.69%)  | 649(-32.82%)       | 937(-34.66%)       | 1174(+0.00%)               | 1119(+0.00%)   | 1555(+0.00%)                         | -0.49%       |
| DAGMutator changes         | 1217(+34.18%) | 519(+2.77%)  | 374(+0.81%)  | 374(+0.81%)  | 327(+1.87%)  | 2901(+0.00%)                  | 389(+0.00%)               | 417(+0.00%)   | 8930(+0.00%)                 | 926(+0.00%)        | 849(+0.00%)  | 307(+0.00%)   | 1970(+0.00%) | 407(+0.00%)           | 2578(+0.00%) | 867(+0.00%)               | 435(+0.00%)  | 3015(+0.00%)  | 10164(+0.00%)      | 571(+0.00%)  | 579(+0.00%)     | 1177(+0.00%)     | 9472(+0.00%)                  | 13056(+0.00%)              | 13062(+0.00%)              | 2380(+0.00%)      | 13092(+0.00%)              | 5391(+0.00%)    | 1277(+0.00%)            | 1277(+0.00%)            | 1370(+0.00%)                  | 1370(+0.00%)     | 704(+0.00%)     | 704(+0.00%)                   | 7218(+0.00%)               | 7225(+0.00%)               | 7239(+0.00%)               | 726(+0.00%)  | 754(+0.00%)                 | 754(+0.00%)                               | 776(+0.00%)            | 7244(+0.00%)              | 807(+0.00%)                        | 807(+0.00%)                          | 842(+0.00%)   | 853(+0.00%)  | 858(+0.00%)  | 2650(+0.00%) | 7669(+0.00%) | 7695(+0.00%) |  ...  | 1781(+0.00%)                 | 2450(+0.00%) | 35498(+0.00%)              | 35466(+0.00%)              | 17821(+0.00%)              | 533(-2.74%)  | 533(-2.74%)  | 1527(-3.96%)                  | 1522(-3.97%)     | 2770(-4.28%)    | 8730(+0.00%)               | 8709(+0.00%)               | 8722(+0.00%)               | 1794(+0.00%)               | 1050(-0.10%)                | 517(+0.00%)                          | 1633(-16.43%)           | 2144(+0.00%)  | 294(+0.00%)            | 2811(+0.00%)  | 2908(-19.40%)  | 231(+0.00%)   | 649(+0.00%)        | 937(+0.00%)        | 752(-35.95%)               | 697(-37.71%)   | 893(-42.57%)                         | -0.37%       |
| Total diff                 | REGR(+34.18%) | REGR(+2.77%) | REGR(+1.91%) | REGR(+1.91%) | REGR(+1.87%) | REGR(+0.66%)                  | REGR(+0.52%)              | REGR(+0.48%)  | REGR(+0.45%)                 | REGR(+0.43%)       | REGR(+0.35%) | REGR(+0.33%)  | REGR(+0.31%) | REGR(+0.25%)          | REGR(+0.23%) | REGR(+0.23%)              | REGR(+0.23%) | REGR(+0.20%)  | REGR(+0.19%)       | REGR(+0.18%) | REGR(+0.17%)    | REGR(+0.17%)     | REGR(+0.17%)                  | REGR(+0.17%)               | REGR(+0.17%)               | REGR(+0.17%)      | REGR(+0.17%)               | REGR(+0.17%)    | REGR(+0.16%)            | REGR(+0.16%)            | REGR(+0.15%)                  | REGR(+0.15%)     | REGR(+0.14%)    | REGR(+0.14%)                  | REGR(+0.14%)               | REGR(+0.14%)               | REGR(+0.14%)               | REGR(+0.14%) | REGR(+0.13%)                | REGR(+0.13%)                              | REGR(+0.13%)           | REGR(+0.12%)              | REGR(+0.12%)                       | REGR(+0.12%)                         | REGR(+0.12%)  | REGR(+0.12%) | REGR(+0.12%) | REGR(+0.11%) | REGR(+0.10%) | REGR(+0.10%) |       | IMPR(-0.11%)                 | IMPR(-0.33%) | IMPR(-1.27%)               | IMPR(-1.27%)               | IMPR(-1.28%)               | IMPR(-2.74%) | IMPR(-2.74%) | IMPR(-3.96%)                  | IMPR(-3.97%)     | IMPR(-4.28%)    | IMPR(-4.94%)               | IMPR(-5.01%)               | IMPR(-5.04%)               | IMPR(-5.28%)               | IMPR(-8.14%)                | IMPR(-11.02%)                        | IMPR(-16.43%)           | IMPR(-17.35%) | IMPR(-17.88%)          | IMPR(-17.95%) | IMPR(-19.40%)  | IMPR(-21.69%) | IMPR(-32.82%)      | IMPR(-34.66%)      | IMPR(-35.95%)              | IMPR(-37.71%)  | IMPR(-42.57%)                        | -0.87%       |
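
A note on reading the table: the percentages appear to be relative to the previous row, while "Total diff" is relative to the baseline. This is an inference from the numbers, not something stated in the PR. A small check using the `Scale_Add_0` column (the `pct` helper is hypothetical):

```python
def pct(new, old):
    """Relative change from old to new, in percent, rounded to 2 decimals."""
    return round((new - old) / old * 100, 2)


# Cycle counts for Scale_Add_0, copied from the table rows above.
baseline, after_licm, after_dag = 367, 371, 374

licm_step = pct(after_licm, baseline)   # MachineLICM row vs. baseline
dag_step = pct(after_dag, after_licm)   # DAGMutator row vs. MachineLICM row
total = pct(after_dag, baseline)        # Total diff vs. baseline
```

Under that reading the three values come out as 1.09, 0.81, and 1.91, matching the `+1.09%`, `+0.81%`, and `REGR(+1.91%)` cells for that column.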

I'll check the 34% regression in ReLu_bfloat16 in more detail (it comes from extra spills). But even in this state the QoR is good: the average cycle count is down 0.87% overall.

gbossu commented 4 weeks ago

Note: we are lagging behind upstream by a couple of months, so I cherry-picked some commits from there to minimise conflicts.

andcarminati commented 3 weeks ago

This PR extends MachineLICM in a very clever way. I left some minor comments, mostly for clarification.