Xilinx / llvm-aie

Fork of LLVM to support AMD AIEngine processors
Other
107 stars 12 forks source link

Interleave loop pre-header with SWP prologue #236

Open gbossu opened 1 week ago

gbossu commented 1 week ago

Better review commit by commit, a lot of them are NFC and prepare for the final commit which enables interleaving.

QoR shows only improvements. Those will get bigger as we increase the scope of the post-pipeliner.

| Core_Compute_Cycle_Count       | ReduceProdAxis_1_aie2_bf16 | ReduceProdAxis_2_aie2_bf16 | ReduceProdAxis_4_aie2_bf16 | ReduceMeanAxis_2_aie2_bf16 | ReduceMeanAxis_4_aie2_bf16 | ReduceMeanAxis_1_aie2_bf16 | Sigmoid_bf16_0 | Requantize_0 | Softmax_bf16_0 | ReduceMeanAxis_3_aie2_bf16 | ReduceMeanAxis_6_aie2_bf16 | ReduceMeanAxis_5_aie2_bf16 | ReduceSumAxis_2_aie2_bf16 | Sigmoid_bf16_1 | Exp_bf16_0   | ReduceSumAxis_4_aie2_bf16 | ReduceSumAxis_1_aie2_bf16 | Log_bf16_0   | Neg_aie2_1   | EleMax_aie2_bfloat16 | EleMin_aie2_bfloat16 | ReduceMeanAxis_7_aie2_bf16 | Requantize_1 | MulAttributeBroadcasting_aie2_int8_0 | Tanh_1       | ReduceProdAxis_6_aie2_bf16 | ReduceSumAxis_5_aie2_bf16 | ReduceSumAxis_3_aie2_bf16 | ReduceSumAxis_6_aie2_bf16 | GELU_1       | HardSigmoid_bf16_0 | ReduceProdAxis_3_aie2_bf16 | ReduceProdAxis_5_aie2_bf16 | Exp_bf16_1   | ReduceProdAxis_7_aie2_bf16 | ReduceSumAxis_7_aie2_bf16 | Tanh_0       | GELU_0       | InstanceNormPart2_aie2_bf16_0 | MulBroadcasting_aie2_0 | HardSigmoid_bf16_1 | EleMax_aie2_int8 | EleMin_aie2_int8 | BatchNorm1d_aie2_bfloat16 | Mul_aie2_0   | HardSigmoid_int8_1 | HardSigmoid_int8_0 | BatchNorm1d_aie2_int8 | BitwiseNot_aie2_0 | BatchNorm2D_1 | GEMM_int8_0  | GEMM_int8_1  | Softmax_1    | BatchNorm2D_0 | Average diff |
| ------------------------------ | -------------------------- | -------------------------- | -------------------------- | -------------------------- | -------------------------- | -------------------------- | -------------- | ------------ | -------------- | -------------------------- | -------------------------- | -------------------------- | ------------------------- | -------------- | ------------ | ------------------------- | ------------------------- | ------------ | ------------ | -------------------- | -------------------- | -------------------------- | ------------ | ------------------------------------ | ------------ | -------------------------- | ------------------------- | ------------------------- | ------------------------- | ------------ | ------------------ | -------------------------- | -------------------------- | ------------ | -------------------------- | ------------------------- | ------------ | ------------ | ----------------------------- | ---------------------- | ------------------ | ---------------- | ---------------- | ------------------------- | ------------ | ------------------ | ------------------ | --------------------- | ----------------- | ------------- | ------------ | ------------ | ------------ | ------------- | ------------ |
| baseline                       | 35466                      | 17821                      | 35498                      | 13076                      | 13046                      | 13040                      | 2627           | 1421         | 8176           | 7233                       | 7219                       | 7212                       | 11900                     | 1727           | 7872         | 11920                     | 11884                     | 4149         | 455          | 227                  | 227                  | 6263                       | 781          | 517                                  | 2578         | 8709                       | 7055                      | 7052                      | 7038                      | 2811         | 937                | 8722                       | 8730                       | 1531         | 1794                       | 6212                      | 1970         | 2144         | 9536                          | 294                    | 649                | 164              | 164              | 390                       | 231          | 427                | 417                | 408                   | 135               | 416           | 2797         | 32931        | 570          | 308           | +0.00%       |
| this PR                        | 35405                      | 17789                      | 35433                      | 13051                      | 13016                      | 13010                      | 2620           | 1417         | 8144           | 7204                       | 7190                       | 7183                       | 11852                     | 1720           | 7840         | 11871                     | 11835                     | 4131         | 453          | 226                  | 226                  | 6231                       | 777          | 514                                  | 2560         | 8647                       | 7004                      | 7001                      | 6987                      | 2790         | 930                | 8656                       | 8662                       | 1519         | 1778                       | 6156                      | 1952         | 2123         | 9440                          | 291                    | 642                | 162              | 162              | 385                       | 228          | 421                | 411                | 402                   | 132               | 406           | 2701         | 31779        | 550          | 296           | -0.15%       |
| Total diff                     | IMPR(-0.17%)               | IMPR(-0.18%)               | IMPR(-0.18%)               | IMPR(-0.19%)               | IMPR(-0.23%)               | IMPR(-0.23%)               | IMPR(-0.27%)   | IMPR(-0.28%) | IMPR(-0.39%)   | IMPR(-0.40%)               | IMPR(-0.40%)               | IMPR(-0.40%)               | IMPR(-0.40%)              | IMPR(-0.41%)   | IMPR(-0.41%) | IMPR(-0.41%)              | IMPR(-0.41%)              | IMPR(-0.43%) | IMPR(-0.44%) | IMPR(-0.44%)         | IMPR(-0.44%)         | IMPR(-0.51%)               | IMPR(-0.51%) | IMPR(-0.58%)                         | IMPR(-0.70%) | IMPR(-0.71%)               | IMPR(-0.72%)              | IMPR(-0.72%)              | IMPR(-0.72%)              | IMPR(-0.75%) | IMPR(-0.75%)       | IMPR(-0.76%)               | IMPR(-0.78%)               | IMPR(-0.78%) | IMPR(-0.89%)               | IMPR(-0.90%)              | IMPR(-0.91%) | IMPR(-0.98%) | IMPR(-1.01%)                  | IMPR(-1.02%)           | IMPR(-1.08%)       | IMPR(-1.22%)     | IMPR(-1.22%)     | IMPR(-1.28%)              | IMPR(-1.30%) | IMPR(-1.41%)       | IMPR(-1.44%)       | IMPR(-1.47%)          | IMPR(-2.22%)      | IMPR(-2.40%)  | IMPR(-3.43%) | IMPR(-3.50%) | IMPR(-3.51%) | IMPR(-3.90%)  | -0.15%       |
andcarminati commented 1 week ago

Hi @gbossu, as we did before, we could integrate the 1st and 3rd commits before, as they are standalone changes.

gbossu commented 1 week ago

Hi @gbossu, as we did before, we could integrate the 1st and 3rd commits before, as they are standalone changes.

I'm actually tempted to drop the 3rd commit and do not have AIEMaxLatencyFinder/RegionEndEdges care about "timed"/"fixed" instructions. I think it makes sense if they keep focusing on successor MBBs instead. For dependencies between "fixed" and "free" instructions, I can add them in the EmitFixedSUnits mutator instead. Any opinion?

andcarminati commented 1 week ago

Hi @gbossu, as we did before, we could integrate the 1st and 3rd commits before, as they are standalone changes.

I'm actually tempted to drop the 3rd commit and do not have AIEMaxLatencyFinder/RegionEndEdges care about "timed"/"fixed" instructions. I think it makes sense if they keep focusing on successor MBBs instead. For dependencies between "fixed" and "free" instructions, I can add them in the EmitFixedSUnits mutator instead. Any opinion?

Is this case, I think it makes sense, as we are introducing this new mutator! In this way we can also restrict a bit more the changes.

gbossu commented 1 day ago

Any more comments @andcarminati @martien-de-jong ?

andcarminati commented 1 hour ago

Hi @gbossu, I finished my review and left few suggestions. Nice piece of engineering that will make a lot more difference for the post-swp as we increase its use.