PiRSquared17 / aparapi

Automatically exported from code.google.com/p/aparapi

Device#best bottleneck #126

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi there,

I just ran a profiler on my own version of Aparapi with multiple entrypoints. 
When looking at the results, I noticed that Device#best took about 30% of the 
execution time in my implementation (quite a bottleneck; see the attachment for a 
screenshot of the profiler). This issue is pretty simple to solve:

Within the Device class, just add a small static attribute which serves as a 
cache for the result of the last Device#best call. As long as the attribute is 
null, the lookup really has to be performed. Otherwise, the cached value can 
simply be returned. An implementation of the fix can be found here 
(https://github.com/klassm/aparapi-clone/commit/2284acc978b771b98186ea2c3a68e006c717b6d2#diff-5). 
However, the diff also contains some other changes that are not relevant here.
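
The idea can be sketched roughly as follows. This is only a minimal illustration of the caching pattern, not the code from the linked commit: `DeviceSketch`, `queryBest()` and the `lookups` counter are hypothetical stand-ins for the real Device class and its JNI-backed lookup.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Aparapi's Device class; only the caching pattern matters here.
final class DeviceSketch {
    // Counts how often the expensive lookup runs (for illustration only).
    static final AtomicInteger lookups = new AtomicInteger();

    // Static cache for the result of the last best() call; null until the first lookup.
    private static DeviceSketch cachedBest;

    private final String name;

    private DeviceSketch(String name) {
        this.name = name;
    }

    // Placeholder for the expensive device query (assumed to go through JNI in the real class).
    private static DeviceSketch queryBest() {
        lookups.incrementAndGet();
        return new DeviceSketch("gpu-0");
    }

    // Runs the expensive query only while the cache is still empty.
    static synchronized DeviceSketch best() {
        if (cachedBest == null) {
            cachedBest = queryBest();
        }
        return cachedBest;
    }

    String name() {
        return name;
    }
}
```

Calling `DeviceSketch.best()` repeatedly returns the same instance and runs the lookup only once. The method is synchronized so that concurrent first calls do not trigger several lookups.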

With this change, I could reduce my execution time from about 6000 ms to just 
1700 ms :-).

Matthias

Original issue reported on code.google.com by matthias.klass@gmail.com on 31 Jul 2013 at 12:11

Attachments:

GoogleCodeExporter commented 9 years ago
Great idea ;) 

Yes, Device#best does have to go through the JNI code each time, and we 
probably could cache the 'last best' device rather than querying the devices 
all over again. This could be problematic if devices are hot-swappable ;) but 
I think we can probably ignore that for most users. 

One case I can imagine where it would matter is platforms where the GPU 
switches between discrete and 'on board' depending on power states. 
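
If that case ever needs handling, one option would be to let callers drop the cached value explicitly. The sketch below is only an assumption about how that could look; `RefreshableDeviceSketch`, `queryBest()` and `invalidateBest()` are hypothetical names and not part of the existing API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the Device class with a cache that can be reset.
final class RefreshableDeviceSketch {
    // Counts how often the expensive lookup runs (for illustration only).
    static final AtomicInteger lookups = new AtomicInteger();

    // Cached result of the last best() call; null means "query again".
    private static RefreshableDeviceSketch cachedBest;

    private final String name;

    private RefreshableDeviceSketch(String name) {
        this.name = name;
    }

    // Placeholder for the expensive device query; each run yields a new object.
    private static RefreshableDeviceSketch queryBest() {
        return new RefreshableDeviceSketch("device-" + lookups.incrementAndGet());
    }

    // Returns the cached device, querying only when the cache is empty.
    static synchronized RefreshableDeviceSketch best() {
        if (cachedBest == null) {
            cachedBest = queryBest();
        }
        return cachedBest;
    }

    // Drops the cached device, e.g. after a suspected GPU switch or hot-swap.
    static synchronized void invalidateBest() {
        cachedBest = null;
    }

    String name() {
        return name;
    }
}
```

Most users would never call `invalidateBest()` and would keep the full benefit of the cache. Only code that knows the hardware may have changed pays for a fresh query.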

Gary  

Original comment by frost.g...@gmail.com on 31 Jul 2013 at 6:06

GoogleCodeExporter commented 9 years ago

Original comment by frost.g...@gmail.com on 7 Aug 2013 at 1:55