Saalma / aparapi

Automatically exported from code.google.com/p/aparapi

Allow Aparapi to load existing OpenCL when available #24

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
For some of our use cases, we've been trying to find ways to avoid the 
expensive initialization costs of using Aparapi.

For example, one of our tests is taking ~8ms to complete the kernel execution, 
but the initial Aparapi execution and OpenCL generation is taking ~250-300ms. 
This cost really adds up over multiple different kernels or even re-executions 
of the same kernel outside of a loop (different execution scopes).

One solution would be the following:

- Allow the user to specify that Aparapi should serialize the generated OpenCL 
code to a local .cl file during regular execution

- Allow the user to specify that Aparapi should deserialize a user-defined .cl 
file instead of generating OpenCL from Java code

- Allow Aparapi to follow all of its existing auto-fallback options if the .cl 
file cannot be found, is invalid, etc.
  - Log an error
  - Revert to existing behavior
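
A minimal sketch of the serialize / deserialize / fall-back behaviour requested above. `ClSourceCache` and `loadOrGenerate` are hypothetical names, not Aparapi API, and the `Supplier` stands in for Aparapi's bytecode analysis and OpenCL generation. The sketch only treats a missing, unreadable or empty file as invalid. A fuller version would also fall back when the OpenCL compiler rejects the loaded source.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper (not part of Aparapi): reuse a .cl file if present, else generate. */
final class ClSourceCache {
    private static final Logger LOG = Logger.getLogger(ClSourceCache.class.getName());

    private ClSourceCache() {}

    static String loadOrGenerate(Path clFile, Supplier<String> generator, boolean saveGenerated) {
        if (Files.isReadable(clFile)) {
            try {
                String source = new String(Files.readAllBytes(clFile), StandardCharsets.UTF_8);
                if (!source.trim().isEmpty()) {
                    return source;                 // deserialize: skip Java -> OpenCL generation
                }
                LOG.warning("Empty .cl file, regenerating: " + clFile);
            } catch (IOException e) {
                // Log the error, then revert to the existing behaviour below.
                LOG.log(Level.WARNING, "Could not read " + clFile + ", regenerating", e);
            }
        }
        String generated = generator.get();        // stands in for analysis + codegen
        if (saveGenerated) {
            try {
                Files.write(clFile, generated.getBytes(StandardCharsets.UTF_8)); // serialize
            } catch (IOException e) {
                LOG.log(Level.WARNING, "Could not write " + clFile, e);
            }
        }
        return generated;
    }
}
```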

Original issue reported on code.google.com by ryan.lam...@gmail.com on 27 Nov 2011 at 7:32

GoogleCodeExporter commented 8 years ago
Ryan

I am actually working on a proposal for allowing extension libraries (in the 
form of OpenCL .cl files + separate java implementation - in case OpenCL not 
available) to be added to Aparapi.  I will post the suggestion either as an 
issue, or possibly as a wiki page (and link here) in the next week or so. 

In short the extension implementer would provide a Java interface and a Java 
implementation along with a way of mapping .cl source to the interface so that 
Aparapi can compile and bind the args (args would use Annotations to help 
Aparapi work out the access type).  This would allow the OpenCL version to use 
vector types, local memory, barriers etc.
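
The proposal is only outlined here, so the following is one possible shape for it, not the design Gary describes in the wiki. It shows a Java interface tagged with the .cl resource that implements it, parameter annotations that let a binder work out each argument's access type, and a pure-Java fallback. All names (`ReadOnly`, `WriteOnly`, `OpenCLSource`, `Square`, `JavaSquare`, `ExtensionBinder`) are hypothetical.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical access-type annotations; unannotated parameters default to read-write.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER) @interface ReadOnly {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER) @interface WriteOnly {}

// Hypothetical: names the .cl resource that implements the interface.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface OpenCLSource { String value(); }

@OpenCLSource("square.cl")
interface Square {
    void square(@ReadOnly float[] in, @WriteOnly float[] out);
}

/** Pure-Java implementation, used when OpenCL is not available. */
final class JavaSquare implements Square {
    @Override
    public void square(float[] in, float[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] * in[i];
        }
    }
}

/** Reads the annotations a binder would need to create and bind OpenCL buffers. */
final class ExtensionBinder {
    static String[] accessTypes(Method m) {
        Annotation[][] annotations = m.getParameterAnnotations();
        String[] result = new String[annotations.length];
        for (int i = 0; i < annotations.length; i++) {
            result[i] = "ReadWrite";                       // default access type
            for (Annotation a : annotations[i]) {
                if (a instanceof ReadOnly) result[i] = "ReadOnly";
                else if (a instanceof WriteOnly) result[i] = "WriteOnly";
            }
        }
        return result;
    }

    static String sourceFor(Class<?> iface) {
        OpenCLSource s = iface.getAnnotation(OpenCLSource.class);
        return s == null ? null : s.value();
    }
}
```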

WRT the issue above:
Obviously the cost of bytecode analysis, OpenCL creation and compilation is 
only incurred on the first call to a kernel instance. Provided only the data is 
changing, this cost should not be incurred more than once. 

Are you seeing this cost each time you execute? If so this is a bug.

Maybe you are creating multiple instances of the same Kernel.  In this case 
each will indeed incur the cost of analysis->code creation and compilation.  
It is possible we could share the source creation between instances in this 
case, obviously the actual bound args have to be on a per-instance basis.
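
To make that cost model concrete, here is a toy model (`ToyKernel` is hypothetical and does no real OpenCL work). The expensive step runs on an instance's first `execute()` and is cached on that instance afterwards. Five executions of one instance therefore trigger one compilation, while five fresh instances trigger five.

```java
/** Toy model (not Aparapi code) of where the one-off per-instance cost lives. */
class ToyKernel {
    static int compilations = 0;       // counts the expensive analysis + codegen + compile steps
    private String compiledSource;     // per-instance cache of the generated program

    void execute(int range) {
        if (compiledSource == null) {
            compiledSource = analyzeAndCompile();   // paid once per instance
        }
        // ... bind the current args and run 'range' work-items (the cheap part) ...
    }

    private String analyzeAndCompile() {
        compilations++;
        return "__kernel void run(...) { ... }";    // placeholder for generated OpenCL
    }
}
```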

Can you please elaborate on the use case a little, so I can work out whether 
this is a bug or an enhancement?

I will bounce the extension doc proposal off you (and anyone with an interest) 
once I have mulled it over for a few days. 

Original comment by frost.g...@gmail.com on 28 Nov 2011 at 4:52

GoogleCodeExporter commented 8 years ago
This is an enhancement request and not a bug.

One of the primary use cases we are investigating is a section of code which 
calls Aparapi repeatedly, but each time has to re-execute Aparapi, incurring a 
large running cost compared to the existing CPU-bound algorithm. For example, 
the CPU algorithm takes ~170ms to execute, while Aparapi takes ~300ms, of which 
only ~8ms is actual compute time. If we could avoid the ~292ms of 
overhead during production, that would be excellent. Of course, there are 
potentially other work-arounds, but this ticket could provide an elegant 
solution to that problem.
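
A back-of-the-envelope comparison over N calls, using the approximate figures above and assuming (as discussed later in the thread) that the ~292 ms overhead is paid only once when the same kernel instance is reused and that each call's compute stays near ~8 ms:

```latex
\begin{aligned}
T_{\text{CPU}}(N)      &\approx 170\,N \ \text{ms} \\
T_{\text{recreate}}(N) &\approx (292 + 8)\,N = 300\,N \ \text{ms} \\
T_{\text{reuse}}(N)    &\approx 292 + 8\,N \ \text{ms} \\
T_{\text{reuse}}(N) < T_{\text{CPU}}(N) &\iff 292 < 162\,N \iff N > 292/162 \approx 1.8
\end{aligned}
```

Under these assumptions reuse wins from the second call on; at N = 10 it is roughly 372 ms versus 1700 ms on the CPU and 3000 ms when recreating the kernel each time.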

We also had a request from a collaborator interested in this framework, 
who asked if he could use Aparapi to generate OpenCL from Java code, but then 
have the ability to use only the resultant .cl file afterwards.

Original comment by ryan.lam...@gmail.com on 28 Nov 2011 at 7:59

GoogleCodeExporter commented 8 years ago
Can you not create the Kernel instance once (outside the loop) and
just change the data?

Sorry if I am being slow here.

So instead of :-

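for (...){
    final int[] data = //...   // final: captured by the anonymous Kernel (Java 7)
    // fill data
    Kernel kernel = new Kernel(){   // a new Kernel instance on every pass
        public void run(){
            // use data[]
        }
    };
    kernel.execute(...);   // analysis, OpenCL codegen and compile repeated every pass
    // use modified data
}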

Instead use something like

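final int[] data = //...   // final: captured by the anonymous Kernel (Java 7)
Kernel kernel = new Kernel(){
    public void run(){
        // use data[]
    }
};

for (...){
    // fill data (same array, new values)
    kernel.execute(...);   // codegen/compile cost is paid only on the first call
    // use modified data
}
kernel.dispose();   // release OpenCL resources when finished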

Or is there a reason for recreating the Kernel?

Certainly we could dump the OpenCL.

We even had an idea earlier whereby we could tell the JNI layer (via a
property) to output a compiler ready C source file containing the
required buffer/host manipulation code.  Kind of like a wrapped C
function that would take just pointers to float/int arrays and ints, and
the function's C code for shepherding the args would be generated
automatically.  We thought this might make a good unit test.  It would
certainly give someone a good starting point.
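
As a rough illustration of the "compiler ready C source" idea, the Java sketch below emits only the argument-shepherding part of such a wrapper. It produces a C function that takes host pointers and scalars, creates one OpenCL buffer per array with `clCreateBuffer`, and binds each argument with `clSetKernelArg`. `KernelArg` and `HostStubGenerator` are hypothetical; the generated C assumes an existing `cl_context` and `cl_kernel` and leaves enqueueing, read-back and cleanup as a comment.

```java
import java.util.List;

/** Hypothetical description of one kernel argument. */
final class KernelArg {
    final String cType;     // e.g. "float*", "int*", "int"
    final String name;
    final boolean isArray;  // arrays become cl_mem buffers, scalars are passed by value

    KernelArg(String cType, String name, boolean isArray) {
        this.cType = cType;
        this.name = name;
        this.isArray = isArray;
    }
}

/** Emits a C host wrapper that shepherds args into an OpenCL kernel (sketch only). */
final class HostStubGenerator {
    static String generate(String functionName, List<KernelArg> args) {
        StringBuilder sig = new StringBuilder();
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < args.size(); i++) {
            KernelArg a = args.get(i);
            if (i > 0) sig.append(", ");
            sig.append(a.cType).append(' ').append(a.name);
            if (a.isArray) {
                // Arrays need a byte count and a device buffer initialized from host memory.
                sig.append(", size_t ").append(a.name).append("_bytes");
                body.append("    cl_mem ").append(a.name).append("_buf = clCreateBuffer(ctx, ")
                    .append("CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, ")
                    .append(a.name).append("_bytes, ").append(a.name).append(", &err);\n");
                body.append("    clSetKernelArg(kernel, ").append(i)
                    .append(", sizeof(cl_mem), &").append(a.name).append("_buf);\n");
            } else {
                body.append("    clSetKernelArg(kernel, ").append(i).append(", sizeof(")
                    .append(a.cType).append("), &").append(a.name).append(");\n");
            }
        }
        return "void " + functionName + "(cl_context ctx, cl_kernel kernel, " + sig + ") {\n"
             + "    cl_int err;\n"
             + body
             + "    /* enqueue, read results back, release buffers ... */\n"
             + "}\n";
    }
}
```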

However, our code generation is very very literal and anyone with even
a few weeks of OpenCL experience would possibly scoff at it from a
performance POV.  Maybe scoff is too strong. Snigger is probably
better ;)

Don't get me wrong, our codegen recreates OpenCL source structure
fairly well from bytecode.  But without an autovectorization optimizer
or possibly a loop unroller optimizer our code is fairly naive.

Gary

Original comment by frost.g...@gmail.com on 28 Nov 2011 at 8:30

GoogleCodeExporter commented 8 years ago
Actually, that is almost the exact work-around we are using at the moment :^)

Your code generation is fine right now...the Java developer can also unroll the 
loops if needed.
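
Manual unrolling of the kind Ryan mentions, shown in plain Java (`Unroll` and its methods are illustrative names). Whether Aparapi's literal translation carries this structure into the generated OpenCL unchanged is an assumption, not something stated in this thread. Splitting the sum across four accumulators also changes the order of floating-point additions, so results can differ from the simple loop in the last bits.

```java
/** Illustrative 4-way loop unrolling with a remainder loop. */
final class Unroll {
    static float sumSimple(float[] a) {
        float s = 0f;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    static float sumUnrolled4(float[] a) {
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;   // independent accumulators
        int limit = a.length - (a.length % 4);       // largest multiple of 4
        int i = 0;
        for (; i < limit; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        for (; i < a.length; i++) {                  // leftover 0..3 elements
            s0 += a[i];
        }
        return (s0 + s1) + (s2 + s3);
    }
}
```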

Original comment by ryan.lam...@gmail.com on 28 Nov 2011 at 8:38

GoogleCodeExporter commented 8 years ago
That's good, and actually this is a common pattern.  I might need to add a wiki 
page covering this.   

We might still want to look at a way to at least have the code analysis, code 
generation and OpenCL compile kept with the Class rather than with the 
instance. This way when we create multiple instances they could share (and 
minimize) this overhead.  

Do need to be careful with subclasses which inherit from another Kernel... 
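
A minimal sketch of keeping the generated artefact with the Class rather than the instance, assuming a thread-safe map keyed on the exact runtime class. `ClassLevelCache` is hypothetical. Its `compiler` function stands in for analysis, codegen and OpenCL compilation, and per-instance argument binding would still happen separately. Keying on `getClass()` is one way to handle the subclass concern: a Kernel that inherits from another Kernel gets its own entry instead of silently reusing its parent's code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical per-class cache shared by all instances of one kernel class. */
final class ClassLevelCache<V> {
    private final ConcurrentMap<Class<?>, V> cache = new ConcurrentHashMap<>();
    private final Function<Class<?>, V> compiler;   // runs at most once per class

    ClassLevelCache(Function<Class<?>, V> compiler) {
        this.compiler = compiler;
    }

    V forInstance(Object kernel) {
        // Key on the exact runtime class, not a declared supertype, so a subclass
        // that overrides run() never picks up its parent's generated code.
        return cache.computeIfAbsent(kernel.getClass(), compiler);
    }
}
```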

Original comment by frost.g...@gmail.com on 28 Nov 2011 at 11:39

GoogleCodeExporter commented 8 years ago
Take a look at the proposal for allowing extensions to be added by 
developers/third party library providers. 

http://code.google.com/p/aparapi/wiki/AparapiExtensionProposal

Original comment by frost.g...@gmail.com on 2 Dec 2011 at 10:26

GoogleCodeExporter commented 8 years ago
This should be marked "closed" since the feature has now been implemented and 
is in place.

Original comment by lats...@gmail.com on 29 Mar 2013 at 11:06

GoogleCodeExporter commented 8 years ago

Original comment by ryan.lam...@gmail.com on 20 Apr 2013 at 12:31