datacute / aforge

Automatically exported from code.google.com/p/aforge
Other
0 stars 0 forks source link

Patch for /trunk/Sources/Neuro/Neurons/ActivationNeuron.cs #272

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is only a minor suggestion. In CLR in general, the JIT compiler is able to 
perform bounds-check elimination whenever it encounters a iteration over an 
array given in standard form (i.e. as the patch suggests). I know this seems to 
be a micro-optimization, or even counter-intuitive, but it really makes 
difference on the native code generated. As bounds-checks are eliminated, 
performance goes closer to the performance of raw array access in unmanaged 
code. I would also say it improves for readability.

There are many other places across AForge which could be replaced by the 
standard form. I would also suggest exposing a Layer's Neurons property as 
public get / protected set, instead of exposing the field neurons as only 
protected. Currently, one would need to use the this[] accessor to iterate over 
the Layer's neurons from a learning algorithm, which is not very efficient. 
Exposing a public property for the Neuron's array would make it possible to 
iterate over the array directly, also taking advantage of the bounds-check 
elimination.

Well, this is just an idea. I am using Google Code's editor to submit this 
small suggestion so I can only indicate changes on one file at a time, but if 
you feel the modifications I suggested also make sense I could indicate other 
files as well.

Regards,
Cesar

Original issue reported on code.google.com by cesarso...@gmail.com on 28 Dec 2011 at 5:39

Attachments:

GoogleCodeExporter commented 9 years ago
Hello Cesar,

It would be nice to get some more comments about the suggested patch. As I can 
see the only code change is this one:
-            for ( int i = 0; i < inputsCount; i++ )
+            for ( int i = 0; i < input.Length; i++ )
By making small test and inspecting its IL code using ILDASM, we can see that 
the second approach just adds two extra commands (see attachment), which don't 
look making code faster. Could you please comment your suggestion?

>> I would also suggest exposing a Layer's Neurons property as
>> public get / protected set,
This seems to be doable.

Original comment by andrew.k...@gmail.com on 3 Jan 2012 at 10:49

Attachments:

GoogleCodeExporter commented 9 years ago
Hi Andrew,

Almost no optimizations are done in IL level. The real optimizations are almost 
always done in the JIT level. Consider, for example, the following snippets of 
code (please ignore the Debugger.Break instruction, it is there to enable 
debugging of the optimized native x86 code):

        static void loop1(int[] array)
        {
            System.Diagnostics.Debugger.Break();

            for (int i = 0; i < array.Length; i++)
                array[i] += i;
        }

Now consider the following code:

        static void loop2(int[] array)
        {
            System.Diagnostics.Debugger.Break();
            int n = array.Length;
            for (int i = 0; i < n; i++)
                array[i] += i;
        }

Intuition would say the second snippet is fast, as we are caching the size 
variable. However, an inspection of the x86 code generated reveals why this 
wouldn't be the case:

Here is the code generated for the cached size version:

00000000  push        ebp  
00000001  mov         ebp,esp 
00000003  push        edi  
00000004  push        esi  
00000005  mov         esi,ecx 
00000007  call        5FB2D5A4 
0000000c  mov         edi,dword ptr [esi+4] 
0000000f  xor         edx,edx 
00000011  test        edi,edi 
00000013  jle         00000027 
00000015  mov         ecx,dword ptr [esi+4] 
00000018  cmp         edx,ecx 
0000001a  jae         0000002B 
0000001c  lea         eax,[esi+edx*4+8] 
00000020  add         dword ptr [eax],edx 
00000022  inc         edx  
00000023  cmp         edx,edi 
00000025  jl          00000018 
00000027  pop         esi  
00000028  pop         edi  
00000029  pop         ebp  
0000002a  ret              
0000002b  call        60182EF4 
00000030  int         3    

Note the block starting at 0000002b. It is solely devoted to throwing the 
possible out-of-range exception in case the jae instruction at 0000001a detects 
the index is out of the allowed range.

Now, consider the code generated for the direct version, with no intermediate 
caching:

00000000  push        ebp  
00000001  mov         ebp,esp 
00000003  push        esi  
00000004  mov         esi,ecx 
00000006  call        5FB2D5DC 
0000000b  xor         edx,edx 
0000000d  mov         ecx,dword ptr [esi+4] 
00000010  test        ecx,ecx 
00000012  jle         0000001F 
00000014  lea         eax,[esi+edx*4+8] 
00000018  add         dword ptr [eax],edx 
0000001a  inc         edx  
0000001b  cmp         ecx,edx 
0000001d  jg          00000014 
0000001f  pop         esi  
00000020  pop         ebp  
00000021  ret             

As you can see, the native code does not even include an exception throwing 
block and thus neither a jae instruction. It does not needs to do any array 
bounds checking because the compiler can trust the array.Length to have the 
correct length for the array.

For more information we could take a look on the CLR code generation blog post 
at 
http://blogs.msdn.com/b/clrcodegeneration/archive/2009/08/13/array-bounds-check-
elimination-in-the-clr.aspx

Well, I hope the suggestion makes sense now!

Regards,
Cesar

Original comment by cesarso...@gmail.com on 3 Jan 2012 at 3:19

GoogleCodeExporter commented 9 years ago
Hi Andrew,

I actually took the time to update the entire Neuro project to use direct array 
computations. I've loaded the two projects (the original/current and the 
updated/optimized version) side-by-side and did some comparisons. Here are the 
numbers for the particular test I've done:

Optimized: 00:00:01.5514587
Optimized: 00:00:01.1486776
Optimized: 00:00:01.4274900
Optimized: 00:00:01.1395174
Optimized: 00:00:01.4854211

Current  : 00:00:02.6056515
Current  : 00:00:02.1727088
Current  : 00:00:02.5613240
Current  : 00:00:01.9834494
Current  : 00:00:01.9183204

Of course it doesn't differ by much, but we are talking about a very simple 
test problem here. Considering very large problems it could make a noticeable 
difference. And I suppose that optimizations which also improve for code 
clarity and code readability wouldn't hurt after all! :) 

I just didn't attach a patch right now because I am having problems after a 
TortoiseSVN update. If you wish I can send the test project/compressed Neuro 
folder by mail. I will also try to generate the patch on another computer and 
update this issue later, if you could wait.

Regards,
Cesar

Original comment by cesarso...@gmail.com on 3 Jan 2012 at 7:30

GoogleCodeExporter commented 9 years ago
Here it is.

Original comment by cesarso...@gmail.com on 3 Jan 2012 at 8:17

Attachments:

GoogleCodeExporter commented 9 years ago
OK, accepted.

Original comment by andrew.k...@gmail.com on 4 Jan 2012 at 9:51

GoogleCodeExporter commented 9 years ago
1) Exposed Neuron.Weights, Layer.Neurons, Network.Layers properties instead of 
using [] accessors.
2) Optimized access to arrays to eliminate bound-checking.

Committed in revision 1655. Will be released 2.2.4

P.S. The provided patch is sample of good and bad fix at the same time. Good 
– an improvement was introduced. Bad – sample applications stopped building.

Original comment by andrew.k...@gmail.com on 12 Jan 2012 at 10:35

GoogleCodeExporter commented 9 years ago
Sorry about that, completely forgot about the sample applications :)

Original comment by cesarso...@gmail.com on 12 Jan 2012 at 10:55

GoogleCodeExporter commented 9 years ago
NP, fixed those anyway. Just a note for the future ;)

Original comment by andrew.k...@gmail.com on 12 Jan 2012 at 11:00

GoogleCodeExporter commented 9 years ago

Original comment by andrew.k...@gmail.com on 23 Feb 2012 at 9:11