einstein95 / vba-wii

Automatically exported from code.google.com/p/vba-wii

Optimization: Modes 0-5 #195

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Directory: vba/gba/

This took me FOREVER. I dramatically altered the code to pipeline better,
dependencies have been removed, and if-statements have been restructured to
ensure maximum speed.

I have tested all these changes on several games, and while this may be
wishful thinking on my part, I swear that I can see a difference in frame rate.

Interesting side note: several optimizations are specific to the Wii and
the PowerPC processor within. My tests have shown that the very same code
will run significantly SLOWER on an Intel/AMD chip. Knowing the
architecture sure does help!

Original issue reported on code.google.com by dancinninjac on 10 Dec 2009 at 6:29

GoogleCodeExporter commented 9 years ago
This is something I've always been curious about, since you seem to know the PPC
architecture well: what is important to take into account when coding in C/C++
for this architecture, and what would help optimize instruction pipelining?
Isn't it the compiler's role to optimize your code?

Original comment by ekeeke31@gmail.com on 11 Dec 2009 at 11:05

GoogleCodeExporter commented 9 years ago
You would think that the compiler would be in charge of that, yes. However,
computers are still not as smart as humans (number 1!). There are a lot of
"optimization blockers" that keep compilers from optimizing. A good example is
function calls. Even though your code looks like this:

int r = myFunction() + myFunction();

...the compiler can't change it into "myFunction()*2", because the function
could change a global variable or have some other side effect, so the compiler
chooses not to touch it. There are a lot more reasons why compilers can't (or
won't) optimize the code for you. Oftentimes they're just not smart enough.
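
To make the side-effect point concrete, here is a minimal sketch in C. The counter and the two wrapper functions are hypothetical, invented for illustration; only the name myFunction comes from the example above.

```c
/* Hypothetical stand-in for myFunction(): it bumps a global counter,
   so every call has a visible side effect and returns a new value. */
int call_count = 0;

int myFunction(void)
{
    call_count++;        /* side effect the compiler must preserve */
    return call_count;   /* 1 on the first call, 2 on the second, ... */
}

/* What the source code asks for: two separate calls. */
int sum_of_two_calls(void)
{
    return myFunction() + myFunction();
}

/* What the "optimized" form would compute: one call, doubled. */
int one_call_doubled(void)
{
    return myFunction() * 2;
}
```

Starting from a counter of zero, the first version returns 3 and leaves the counter at 2, while the second returns 2 and leaves it at 1. Unless the compiler can prove a function has no side effects, it has to keep both calls.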

What's interesting is that many of the optimizations I've found aren't
necessarily architecture-specific. The optimizations that I found for the
"modes" files actually had to do with read-write dependencies.

Case in point:

int a = 24 / 8;
if(a == 3)
   a = 5;

This is an overly simplistic example, but basically what the "chip" does is try
to do several things at once. If it sees a loop, it will try to execute what's
inside the loop before it has even finished evaluating the loop condition. But
in this case, it can't. It depends on 'a' being assigned a value, then it has
to wait until (a == 3) is evaluated, and only THEN can it assign the value of 5.
But if it were like this:

int b = 24 / 8;
if(b == 3)
   a = 5;

The computer doesn't have to wait for 'b' to be evaluated in order to start
computing 'a'. So if the "if" statement is true, we don't have to wait for it
to go through all the steps, it's all ready! Did that make sense?
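
A slightly less toy-sized sketch of the same idea, assuming nothing about the actual vba-wii code: the two functions below are hypothetical and simply sum an array. The first keeps one running total, so each addition must wait for the previous one. The second keeps two independent totals, which gives the processor two chains it can overlap.

```c
#include <stddef.h>

/* One accumulator: every addition depends on the result of the last. */
long sum_single(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Two accumulators: the "even" and "odd" chains never read each
   other's value, so their additions can be in flight together. */
long sum_split(const int *v, size_t n)
{
    long even = 0, odd = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += v[i];
        odd  += v[i + 1];
    }
    if (i < n)           /* leftover element when n is odd */
        even += v[i];
    return even + odd;
}
```

Both functions return the same sum for integer input. Whether the second is actually faster depends on the compiler and the chip, so it is worth measuring rather than assuming.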

Think of it like an assembly line. One person paints the toy, a second sets up
the box, a third puts the toy into the box, and the fourth wraps the box like a
present.

The third and fourth people depend on the first two to do their work in order
to do their own. You can't put a toy into a box that hasn't been set up yet!
And you can't wrap a box that doesn't have a toy in it. However, while the
first person is painting the toy, there's nothing stopping the second person
from setting up the box. They don't need to wait for the first person to get
done. That's pipelining.

The original code in these files had "color", for example, being constantly
changed and checked. So, it had to wait until the first "check" was done before
it would even attempt the next one.

So what can you do to help the compiler, specific to this architecture
(PowerPC)? The best piece of advice I can give without going on for pages (too
late) is to use temporary variables whenever you can. The PowerPC has 32
general-purpose registers! You can't use all of them, but that's FAR more than
the 8 on normal 32-bit Intel/AMD processors (and you can only use about 6 of
those). The code I changed runs slower on Intel/AMD because it uses too many
registers (in the form of temporary variables), which is fine for the PPC.
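
As a rough illustration of the temporary-variable advice, here is a sketch in C. The Settings struct and both functions are hypothetical and are not taken from the vba-wii source.

```c
#include <stddef.h>

/* Hypothetical settings block, invented for this example. */
typedef struct {
    int brightness;
} Settings;

/* Reads s->brightness inside the loop. Because pixels could point at
   the same memory as *s, the compiler may have to reload the field
   after every store. */
void apply_direct(int *pixels, size_t n, const Settings *s)
{
    for (size_t i = 0; i < n; i++)
        pixels[i] += s->brightness;
}

/* Copies the field into a local first. The local cannot be changed
   by the stores, so it can sit in a register for the whole loop. */
void apply_with_temp(int *pixels, size_t n, const Settings *s)
{
    const int brightness = s->brightness;
    for (size_t i = 0; i < n; i++)
        pixels[i] += brightness;
}
```

The two functions give the same result as long as pixels and s do not overlap. With plenty of registers available, a handful of locals like this costs nothing. On a register-starved chip the compiler may have to spill some of them back to memory.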

I hope my rambling didn't bore you too much.

Original comment by dancinninjac on 11 Dec 2009 at 7:34

GoogleCodeExporter commented 9 years ago
Thanks for the explanation! Your changes look great, and it might just be
wishful thinking on my part too, but I think I see a small speed gain from this.

I too have a few questions about optimization:

1. I know compilers do loop unrolling. Is there a reason you unrolled the loops
in the code instead of letting gcc do it (e.g. gcc won't make efficient use of
the registers)?

2. Replacing hard-coded arithmetic with constant numbers, e.g.:
skip_read( in,  5880); //skip_read( in,  6*735 + 2*735);
I would've thought the compiler would optimize this (i.e. do the calculation)
and substitute the fixed value. Is this not the case?

3. There are tons of GCC optimization flags. Some we're using, some we're not.
How can we know which ones will provide the best performance?

Original comment by dborth@gmail.com on 14 Dec 2009 at 7:35

GoogleCodeExporter commented 9 years ago
1. The compiler can do SOME loop unrolling. Some it can't. The only way to be
absolutely sure that a loop will be unrolled is, unfortunately, to do it
manually.

2. Yeah, the compiler does do that. I guess I was bored or something.

3. I'm still not familiar with the GCC optimization flags and EXACTLY what they
do. I think the ones in use now should suffice.
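
For point 1, manual unrolling might look like the sketch below. The function names and the 16-bit pixel type are assumptions made for illustration, not the actual code from the modes files.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward loop: one store and one loop test per pixel. */
void fill_simple(uint16_t *line, size_t n, uint16_t color)
{
    for (size_t i = 0; i < n; i++)
        line[i] = color;
}

/* Hand-unrolled by four: one loop test per four stores, then a short
   cleanup loop for the remaining 0 to 3 pixels. */
void fill_unrolled(uint16_t *line, size_t n, uint16_t color)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        line[i]     = color;
        line[i + 1] = color;
        line[i + 2] = color;
        line[i + 3] = color;
    }
    for (; i < n; i++)
        line[i] = color;
}
```

Both versions write the same values. The unrolled one trades code size for fewer branches, and the cleanup loop is easy to forget when the length is not a multiple of four.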

In the end, the compiler isn't magic. One of the things it has trouble with is
math proofs. For instance:

     myVar/3 + myOtherVar/3

You can rewrite that and save yourself a division (one of the most expensive
arithmetic operations there is!) to get this:

     (myVar + myOtherVar)/3

(With integer variables the two forms can round differently, so check that the
change is safe for your values. That is also why the compiler won't make it for
you.) This is just a simple example, but it illustrates my point. When you get
into more complex code, the compiler will start to have trouble optimizing
stuff away.
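
One thing worth checking before doing this rewrite by hand: with C's truncating integer division the two forms only agree for some inputs, for example when both values are multiples of 3. A small sketch, with hypothetical function names:

```c
/* Two divisions, each truncated separately. */
int divide_each(int a, int b)
{
    return a / 3 + b / 3;
}

/* One division of the combined value. */
int divide_sum(int a, int b)
{
    return (a + b) / 3;
}
```

With a = 6 and b = 9 both functions return 5. With a = 2 and b = 2 the first returns 0 (0 + 0) and the second returns 1 (4 / 3), so the rewrite is only safe when you know something about the values that the compiler does not.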

Original comment by dancinninjac on 14 Dec 2009 at 6:41

GoogleCodeExporter commented 9 years ago

Original comment by dborth@gmail.com on 23 Dec 2009 at 11:01