[AArch64][PeepholeOptimizer] Look through PHIs to find additional register sources #24504

Open Quuxplusone opened 9 years ago

Quuxplusone commented 9 years ago
Bugzilla Link PR24505
Status NEW
Importance P normal
Reported by Chad Rosier (mcrosier@codeaurora.org)
Reported on 2015-08-19 13:37:38 -0700
Last modified on 2015-09-11 10:26:14 -0700
Version trunk
Hardware PC Windows NT
CC bmakam@codeaurora.org, bruno.cardoso@gmail.com, gberry@codeaurora.org, haicheng@codeaurora.org, junbuml@codeaurora.org, kristof.beyls@gmail.com, llvm-bugs@lists.llvm.org, mcrosier@codeaurora.org, mssimpso@codeaurora.org, quentin.colombet@gmail.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Bruno recently committed a change to improve the peephole optimizer.  He was
specifically targeting x86, but this can be easily extended to other
architectures by marking target-specific instructions in the form "one source +
one destination bitcast" with "isBitcast."

The specific commit is r245442
http://llvm.org/viewvc/llvm-project?view=revision&revision=245442

[PeepholeOptimizer] Look through PHIs to find additional register sources

Reapply r243486.

- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to look into multiple sources
returned by the ValueTracker and rewrite PHIs with new sources.

With these changes we can find more register sources and rewrite more
copies to allow coalescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:

A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C

B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C

C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0

Becomes:

A:
  psllq %mm1, %mm0
  jmp C

B:
  por %mm1, %mm0
  jmp C

C:
  pshufw  $238, %mm0, %mm0

Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526

Bruno later reverted the commit in r245446.  Regardless, once the final patch
lands we should consider investigating this for AArch64.
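
To make the idea in the commit message concrete, here is a minimal, self-contained C++ sketch of looking through PHIs and bitcasts to find sources in the wanted register class. It is a toy model under stated assumptions, not the PeepholeOptimizer code: the types RegClass, Def and Function and the helpers findSources and buildExample are hypothetical names invented for illustration, and the register names only loosely mirror the x86 example above.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: none of these types are part of the LLVM API.
enum class RegClass { GR64, VR64 };
enum class DefKind { Other, Bitcast, Phi };

struct Def {
  DefKind kind;
  RegClass cls;                     // class of the register being defined
  std::vector<std::string> inputs;  // one input for Bitcast, several for Phi
};

using Function = std::map<std::string, Def>;

// Walk the definition chain of `reg`, looking through bitcasts and PHIs, and
// append to `out` the registers of class `wanted` that carry the same bits.
// Returns false if any path ends at a definition we cannot look through.
bool findSources(const Function &fn, const std::string &reg, RegClass wanted,
                 std::set<std::string> &visited,
                 std::vector<std::string> &out) {
  auto it = fn.find(reg);
  if (it == fn.end())
    return false;
  if (!visited.insert(reg).second)
    return false;  // already seen (e.g. a PHI cycle): give up conservatively
  const Def &def = it->second;
  if (def.cls == wanted) {
    out.push_back(reg);
    return true;
  }
  switch (def.kind) {
  case DefKind::Bitcast:
    return findSources(fn, def.inputs.at(0), wanted, visited, out);
  case DefKind::Phi:
    // Every incoming value must be traceable, otherwise the PHI cannot be
    // rewritten in the wanted register class.
    for (const std::string &in : def.inputs)
      if (!findSources(fn, in, wanted, visited, out))
        return false;
    return true;
  case DefKind::Other:
    return false;
  }
  return false;
}

// Blocks A and B each bitcast an MMX value into a GR64 register; block C
// merges the two GR64 values with a PHI before casting back.
Function buildExample() {
  Function fn;
  fn["mm0_a"] = {DefKind::Other, RegClass::VR64, {}};
  fn["mm0_b"] = {DefKind::Other, RegClass::VR64, {}};
  fn["r9_a"] = {DefKind::Bitcast, RegClass::GR64, {"mm0_a"}};
  fn["r9_b"] = {DefKind::Bitcast, RegClass::GR64, {"mm0_b"}};
  fn["r9_c"] = {DefKind::Phi, RegClass::GR64, {"r9_a", "r9_b"}};
  return fn;
}
```

In this sketch, asking for the VR64 sources of r9_c yields mm0_a and mm0_b, which is what would let the cast in block C be replaced by a PHI of the MMX registers. If any incoming value cannot be traced (for instance, one defined by a load), the sketch gives up rather than guess.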
Quuxplusone commented 9 years ago

Reintroduced in r245479.

Some useful notes:

1) This commit enables PHI lookup only for uncoalescable copy-like instructions, such as cross-register-class bitcasts. "One source + one destination bitcast" is one form of uncoalescable copy, but other forms are currently handled as well; see PeepholeOptimizer::isUncoalescableCopy().

2) Although this commit introduced PHI lookup only for uncoalescable copy-like instructions, everything needed to support PHI lookup for coalescable copies is there as well. I just didn't enable it because I didn't have time to test it. This is also worth investigating (it may be even more profitable than the uncoalescable case).

Quuxplusone commented 9 years ago

I am porting this optimization to AArch64 and found the following:

In x86, copies between different data types use different instructions, such as MMX_MOVD64grr, so we can add flags to individual copy instructions as Bruno did.

In AArch64, the peephole optimization pass sees the standard pseudo COPY instruction, which represents all data movement between scalar and vector registers. This COPY instruction is lowered to fmov in a later pass called “Post-RA pseudo instruction expansion pass” by calling AArch64InstrInfo::copyPhysReg.

Is there a good way to label the AArch64 fmov instruction earlier so that the peephole optimizer can recognize it?

Quuxplusone commented 9 years ago
(In reply to comment #2)
> I am porting this optimization to AArch64 and found the following:
>
> In x86, copies between different data types use different instructions such
> as MMX_MOVD64grr so that we can add flags for individual copy instructions
> like Bruno did.
>
> In aarch64, the peephole optimization pass uses the standard pseudo COPY
> instruction to represent all data movement operations between scalar and
> vector registers.  This COPY instruction is lowered to fmov in a later pass
> called “Post-RA pseudo instruction expansion pass” by calling
> AArch64InstrInfo::copyPhysReg.
>
> Is there a good way to label the fmov instruction of aarch64 earlier so that
> the peephole optimization can recognize it?

You should add isBitcast in the .td file.
Quuxplusone commented 9 years ago

Thank you, Quentin.

I tried adding isBitcast in the .td files before, and it did not work.

The reason is that the real move instruction, fmov, is generated after the peephole optimization in AArch64. During the peephole optimization, it is still represented by the pseudo COPY instruction.
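
The ordering problem can be shown with a small, self-contained C++ sketch. This is a toy model under stated assumptions, not LLVM code: the Opcode enum, the Inst struct, and the functions hasBitcastFlag, countBitcastCandidates and expandPseudos are all hypothetical stand-ins for a per-opcode flag, the peephole pass, and the post-RA pseudo expansion respectively.

```cpp
#include <string>
#include <vector>

// Toy model only: opcode names and helpers are illustrative assumptions.
enum class Opcode { COPY, FMOV };

struct Inst {
  Opcode op;
  std::string dst;
  std::string src;
};

// Stand-in for an isBitcast flag attached to a specific opcode in a .td file.
// Only the real move instruction carries it; the generic COPY does not.
bool hasBitcastFlag(Opcode op) { return op == Opcode::FMOV; }

// Stand-in for the peephole pass: counts the instructions that it would
// recognize as bitcasts purely from the per-opcode flag.
int countBitcastCandidates(const std::vector<Inst> &code) {
  int n = 0;
  for (const Inst &inst : code)
    if (hasBitcastFlag(inst.op))
      ++n;
  return n;
}

// Stand-in for the later pseudo expansion: every COPY becomes an FMOV.
void expandPseudos(std::vector<Inst> &code) {
  for (Inst &inst : code)
    if (inst.op == Opcode::COPY)
      inst.op = Opcode::FMOV;
}
```

In this model, a cross-class move is still a COPY when the flag-based check runs, so the check finds nothing; the flag only becomes visible after expansion, by which point the peephole pass has already finished.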