Closed GoogleCodeExporter closed 8 years ago
Ryan
I am actually working on a proposal for allowing extension libraries (in the
form of OpenCL .cl files plus a separate Java implementation, in case OpenCL is
not available) to be added to Aparapi. I will post the suggestion either as an
issue, or possibly as a wiki page (and link here) in the next week or so.
In short the extension implementer would provide a Java interface and a Java
implementation along with a way of mapping .cl source to the interface so that
Aparapi can compile and bind the args (args would use Annotations to help
Aparapi work out the access type). This would allow the OpenCL version to use
vector types, local memory, barriers etc.
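To make the shape of that idea concrete, here is a rough sketch of what an extension might look like. Everything in it is an assumption made for illustration: the annotation names (OpenCLSource, ReadOnly, WriteOnly), the Squarer interface and the squares.cl file name are hypothetical and are not Aparapi API. It only shows an interface, a pure-Java fallback, and how a binder could read the annotations to work out each arg's access type.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotations, invented for this sketch (not part of Aparapi).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface OpenCLSource { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface ReadOnly { }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface WriteOnly { }

// The extension's Java interface, mapped to a (hypothetical) .cl file.
@OpenCLSource("squares.cl")
interface Squarer {
    void square(@ReadOnly float[] in, @WriteOnly float[] out);
}

// Pure-Java fallback, used when OpenCL is not available.
class JavaSquarer implements Squarer {
    @Override
    public void square(float[] in, float[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] * in[i];
        }
    }
}

class ExtensionSketch {
    // Collects the .cl file name and the access annotation of each parameter,
    // roughly what a binder would need before compiling and binding args.
    static String describe(Class<?> iface) {
        StringBuilder sb = new StringBuilder(iface.getAnnotation(OpenCLSource.class).value());
        for (Method m : iface.getDeclaredMethods()) {
            for (Annotation[] paramAnnotations : m.getParameterAnnotations()) {
                for (Annotation a : paramAnnotations) {
                    sb.append(' ').append(a.annotationType().getSimpleName());
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Squarer.class)); // prints "squares.cl ReadOnly WriteOnly"
    }
}
```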
With regard to the issue above:
Obviously the cost of bytecode analysis, OpenCL creation and compilation is
only incurred on the first call to a kernel instance. Provided just the data is
changing, this cost should not be incurred more than once.
Are you seeing this cost each time you execute? If so this is a bug.
Maybe you are creating multiple instances of the same Kernel. In this case
each will indeed incur the cost of analysis, code creation and compilation.
It is possible we could share the source creation between instances in this
case, although the actual bound args would still have to be on a per-instance basis.
Can you please elaborate on the use case a little, so I can work out whether
this is a bug or an enhancement?
I will bounce the extension doc proposal off you (and anyone with an interest)
once I have mulled it over for a few days.
Original comment by frost.g...@gmail.com
on 28 Nov 2011 at 4:52
This is an enhancement request and not a bug.
One of the primary use cases we are investigating is a section of code which
calls Aparapi repeatedly, but each time has to re-execute Aparapi, incurring a
large running cost compared to the existing CPU-bound algorithm. For example,
the CPU algorithm takes ~170ms to execute, while Aparapi takes ~300ms, of which
only ~8ms is actual compute time. If we could avoid the ~292ms of
overhead during production, that would be excellent. Of course, there are
potentially other work-arounds, but this ticket could provide an elegant
solution to that problem.
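As a back-of-the-envelope check on those figures, the sketch below works out how many calls it takes before Aparapi comes out ahead. It assumes the ~292ms is a pure one-time setup cost and that every later call costs only the ~8ms of compute; neither assumption has been measured, and the class and method names are made up for the example.

```java
class AmortizedCost {
    // Figures quoted in the comment above.
    static final double CPU_MS = 170.0;      // CPU algorithm, per call
    static final double SETUP_MS = 292.0;    // assumed one-time Aparapi overhead
    static final double COMPUTE_MS = 8.0;    // assumed Aparapi cost per call after setup

    static double aparapiTotalMs(int calls) {
        return SETUP_MS + COMPUTE_MS * calls;
    }

    static double cpuTotalMs(int calls) {
        return CPU_MS * calls;
    }

    // Smallest number of calls for which Aparapi's total time beats the CPU's.
    static int breakEvenCalls() {
        int n = 1;
        while (aparapiTotalMs(n) >= cpuTotalMs(n)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("break-even calls: " + breakEvenCalls());
        System.out.println("100 calls, Aparapi ms: " + aparapiTotalMs(100));
        System.out.println("100 calls, CPU ms: " + cpuTotalMs(100));
    }
}
```

Under those assumptions a single call loses (300ms against 170ms), but two calls already win (308ms against 340ms), which is why paying the overhead only once matters so much here.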
We also had a request from a collaborator who is interested in this framework,
who asked if he could use Aparapi to generate OpenCL from Java code, but then
have the ability to use only the resultant .cl file afterwards.
Original comment by ryan.lam...@gmail.com
on 28 Nov 2011 at 7:59
Can you not create the Kernel instance once (outside the loop) and
just change the data?
Sorry if I am being slow here.
So instead of:
    for (...) {
        int[] data = //...
        // fill data
        Kernel kernel = new Kernel() {
            public void run() {
                // use data[]
            }
        };
        kernel.execute(...);
        // use modified data
    }
use something like:
    int[] data = //...
    Kernel kernel = new Kernel() {
        public void run() {
            // use data[]
        }
    };
    for (...) {
        // fill data
        kernel.execute(...);
        // use modified data
    }
Or is there a reason for recreating the Kernel?
Certainly we could dump the OpenCL.
We even had an idea earlier whereby we could tell the JNI layer (via a
property) to output a compiler-ready C source file containing the
required buffer/host manipulation code. Kind of like a wrapped C
function that would take just pointers to float/int arrays and ints,
with the C code for shepherding the args generated
automatically. We thought this might make a good unit test. It would
certainly give someone a good starting point.
However, our code generation is very, very literal and anyone with even
a few weeks of OpenCL experience would possibly scoff at it from a
performance POV. Maybe scoff is too strong. Snigger is probably
better ;)
Don't get me wrong, our codegen recreates OpenCL source structure
fairly well from bytecode. But without an autovectorization optimizer
or possibly a loop unroller optimizer our code is fairly naive.
Gary
Original comment by frost.g...@gmail.com
on 28 Nov 2011 at 8:30
Actually, that is almost the exact work-around we are using at the moment :^)
Your code generation is fine right now...the Java developer can also unroll the
loops if needed.
Original comment by ryan.lam...@gmail.com
on 28 Nov 2011 at 8:38
That's good, and actually this is a common pattern. I might need to add a wiki
page covering this.
We might still want to look at a way to at least have the code analysis, code
generation and OpenCL compile kept with the Class rather than with the
instance. This way when we create multiple instances they could share (and
minimize) this overhead.
We do need to be careful with subclasses that inherit from another Kernel...
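A minimal sketch of that per-class idea, just to show the shape of it. This is not how Aparapi does it: KernelSourceCache, generateSource, BaseKernel and DerivedKernel are all hypothetical names, and generateSource is only a stand-in for the real analysis, code generation and compile. Keying on the runtime class means a subclass gets its own entry instead of reusing its parent's source, which is one simple way to handle the inheritance concern.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class KernelSourceCache {
    // One generated source per kernel class, shared by all instances of that class.
    private static final Map<Class<?>, String> SOURCE_BY_CLASS = new ConcurrentHashMap<>();

    // Counts how often the expensive step actually ran (for demonstration only).
    static final AtomicInteger generations = new AtomicInteger();

    // Stand-in for bytecode analysis + OpenCL source creation (hypothetical).
    private static String generateSource(Class<?> kernelClass) {
        generations.incrementAndGet();
        return "__kernel void run() { /* generated from " + kernelClass.getName() + " */ }";
    }

    // Looks up by the instance's runtime class, so subclasses are cached separately.
    static String sourceFor(Object kernel) {
        return SOURCE_BY_CLASS.computeIfAbsent(kernel.getClass(), KernelSourceCache::generateSource);
    }

    public static void main(String[] args) {
        sourceFor(new BaseKernel());
        sourceFor(new BaseKernel());     // second instance reuses the cached source
        sourceFor(new DerivedKernel());  // subclass triggers its own generation
        System.out.println("generations: " + generations.get());
    }
}

// Hypothetical kernel classes used only to exercise the cache.
class BaseKernel { }

class DerivedKernel extends BaseKernel { }
```

The bound args are deliberately left out of the cache, since those still have to live with each instance.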
Original comment by frost.g...@gmail.com
on 28 Nov 2011 at 11:39
Take a look at the proposal for allowing extensions to be added by
developers/third party library providers.
http://code.google.com/p/aparapi/wiki/AparapiExtensionProposal
Original comment by frost.g...@gmail.com
on 2 Dec 2011 at 10:26
This should be marked "closed" since the feature has now been implemented and
is in place.
Original comment by lats...@gmail.com
on 29 Mar 2013 at 11:06
Original comment by ryan.lam...@gmail.com
on 20 Apr 2013 at 12:31
Original issue reported on code.google.com by
ryan.lam...@gmail.com
on 27 Nov 2011 at 7:32