classilla / tenfourfox

Mozilla for Power Macintosh.
http://www.tenfourfox.com/

Make the MacroAssembler performance awesome #118

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Nanojit was highly optimized and only some of those optimizations are in the 
methodjit as shipped in 9 (the G5 ones primarily). Here are some ideas:

- Strength reduce mulli into "x_sr_mulli" using adds and/or shifts, as we did in 
nanojit. At the same time turn branchMul32 with an Imm32 into a mulli and 
eliminate the move(). Use the assembler, not the macroassembler, to do the 
strength reduction.

- In branchSub32, add a check for whether cond is an Overflow test. If it is 
not, use addic_rc and eliminate the move().

- Ditch bool supportsFloatingPointTruncate() const { return false; } ... we do 
have a truncate after all. Investigate what we have to do to get that working.
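
To make the strength-reduction idea concrete, here is a minimal sketch and not the shipped code. `OpLog` and `srMulli` are hypothetical names; the log just records mnemonics instead of encoding instructions, and the set of special cases (power of two, power of two plus or minus one) is an assumption about which constants are worth handling. Shifts and plain adds do not set OV the way an overflow-checking multiply would, so a path like this would presumably only apply when the caller is not testing Overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the assembler: records mnemonics instead of
// encoding real instructions.
struct OpLog {
    std::vector<std::string> ops;
};

static bool isPowerOfTwo(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

static unsigned log2Exact(uint32_t v) {
    assert(isPowerOfTwo(v));
    unsigned n = 0;
    while (v >>= 1) ++n;
    return n;
}

// Multiply src by a 16-bit signed immediate (the range mulli accepts),
// picking shifts and adds where the constant allows it. Returns the wrapped
// 32-bit product so the result can be compared against a plain multiply.
uint32_t srMulli(uint32_t src, int16_t imm, OpLog& log) {
    if (imm == 0) { log.ops.push_back("li"); return 0; }
    if (imm == 1) { log.ops.push_back("mr"); return src; }
    if (imm > 1) {
        uint32_t c = static_cast<uint32_t>(imm);
        if (isPowerOfTwo(c)) {                 // x * 2^k  ->  x << k
            log.ops.push_back("slwi");
            return src << log2Exact(c);
        }
        if (isPowerOfTwo(c - 1)) {             // x * (2^k + 1)  ->  (x << k) + x
            log.ops.push_back("slwi");
            log.ops.push_back("add");
            return (src << log2Exact(c - 1)) + src;
        }
        if (isPowerOfTwo(c + 1)) {             // x * (2^k - 1)  ->  (x << k) - x
            log.ops.push_back("slwi");
            log.ops.push_back("subf");
            return (src << log2Exact(c + 1)) - src;
        }
    }
    // Anything else (including negative constants) falls back to mulli.
    log.ops.push_back("mulli");
    return src * static_cast<uint32_t>(static_cast<int32_t>(imm));
}
```

Whether two dependent single-cycle ops actually beat mulli depends on the core, so this would need measuring on G3, G4 and G5 separately.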

These may require some significant refactoring:

- See if clearing SO prior to an op and branching on SO is faster than the 
equivalent mcrxr and branching on OV. It might not be on G3/G4. It probably is 
on G5, because it reduces use of microcoded instructions and will definitely 
shrink the emitted branch code and get better cache performance. However, this 
means gutting the branch code. We might simply keep separate branch ops 
altogether for G5 vs G3/G4 (G5 uses SO, G3/G4 uses mcrxr-OV).

- Since we aren't using all the registers, pre-load the constants used for 
float conversion and truncation into the excess GPRs, saving us some loads. The 
downside is that we need bigger stack frames and will always load the constants 
even in situations where we never reference those registers, so the actual 
improvement may be minimal if any. Type inference may make this even less 
profitable by completely eliminating a float codepath.
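
The two overflow-branch strategies from the SO bullet can be compared with a small software model. This is a simplified sketch under stated assumptions, not emitted code: `Xer`, `CrField`, `addo`, `mcrxr`, `overflowViaMcrxr` and `overflowViaSo` are hypothetical names, only the SO and OV bits are modeled (CA is ignored), and how SO actually gets cleared on real hardware is left out.

```cpp
#include <cstdint>

// Simplified XER: only the sticky summary-overflow bit and the per-op
// overflow bit are modeled.
struct Xer { bool so = false; bool ov = false; };

// One 4-bit condition register field.
struct CrField { bool lt = false, gt = false, eq = false, so = false; };

// Model of an overflow-enabled add: sets OV for this op and ORs it into SO.
int32_t addo(int32_t a, int32_t b, Xer& xer) {
    int64_t wide = static_cast<int64_t>(a) + static_cast<int64_t>(b);
    xer.ov = wide < INT32_MIN || wide > INT32_MAX;
    xer.so = xer.so || xer.ov;
    return static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}

// Model of mcrxr: copy the top XER bits into a CR field, then clear them.
// SO lands in the LT slot and OV in the GT slot.
void mcrxr(CrField& cr, Xer& xer) {
    cr.lt = xer.so;
    cr.gt = xer.ov;
    cr.eq = false;  // CA is not modeled here
    cr.so = false;
    xer.so = false;
    xer.ov = false;
}

// G3/G4-style path: do the op, mcrxr, then branch on the OV copy.
bool overflowViaMcrxr(int32_t a, int32_t b, Xer& xer) {
    addo(a, b, xer);
    CrField cr;
    mcrxr(cr, xer);
    return cr.gt;
}

// G5-style path: clear SO first, use the record form so CR0[SO] mirrors
// XER[SO], then branch on SO directly.
bool overflowViaSo(int32_t a, int32_t b, Xer& xer) {
    xer.so = false;          // clear any stale summary overflow
    addo(a, b, xer);
    CrField cr0;
    cr0.so = xer.so;         // record form copies SO into CR0
    return cr0.so;
}
```

Both paths report the same overflow result in this model; the open question from the bullet is which sequence is cheaper on each CPU, and that can only be settled by benchmarking.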

Original issue reported on code.google.com by classi...@floodgap.com on 19 Dec 2011 at 5:29

GoogleCodeExporter commented 9 years ago
Done. On G5, SunSpider drops around 10ms on the quad, and V8 goes up about 1%. 
The differences are small, but statistically significant. I also threw in some 
ABI blocks since I know the Amiga and LinuxPPC guys will jump on this when it 
gets posted.

Original comment by classi...@floodgap.com on 27 Feb 2012 at 8:56

GoogleCodeExporter commented 9 years ago
https://bugzilla.mozilla.org/show_bug.cgi?id=731110

Original comment by classi...@floodgap.com on 28 Feb 2012 at 4:24

GoogleCodeExporter commented 9 years ago
Mozilla-11 adds lea for BaseIndex, so I need to add that too.

Original comment by classi...@floodgap.com on 29 Feb 2012 at 5:33

GoogleCodeExporter commented 9 years ago
Closing

Original comment by classi...@floodgap.com on 11 Mar 2012 at 2:56