Ragged/jagged arrays would be hard, but as you allude, it might be possible
to offer support for rectangular arrays.
A few things to worry about - I am on vacation without source access - well,
I have access, but it is painful to read from my phone ;)
So first we will need to allow the bytecode through; this probably just means
removing the code check in MethodModel.
We also need a strategy for dealing with the fact that an array of arrays of
primitives is not stored in one contiguous region. One (slow) option would be
to copy the data to a single array. This might allow a quick hack, but would
be expensive. Alternatively we could mimic the Java array of arrays in OpenCL
land. This would be best, I think.
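One way to picture the "copy the data to a single array" option is a
row-major flatten on the Java side. A minimal hedged sketch follows; the
helper names flatten2D and idx are illustrative only, not part of Aparapi's
API, and it assumes the input is rectangular:

```java
// Hypothetical sketch of flattening a rectangular 2D array into one
// contiguous 1D array (row-major), as in the "copy to a single array"
// option above. Not Aparapi code.
public class Flatten2D {

    // Copy a rectangular long[][] into a single contiguous long[].
    static long[] flatten2D(long[][] src) {
        final int rows = src.length;
        final int cols = src[0].length; // assumes rectangular, non-empty
        final long[] flat = new long[rows * cols];
        for (int r = 0; r < rows; r++) {
            System.arraycopy(src[r], 0, flat, r * cols, cols);
        }
        return flat;
    }

    // Row-major index mapping: src[r][c] == flat[r * cols + c]
    static int idx(int r, int c, int cols) {
        return r * cols + c;
    }

    public static void main(String[] args) {
        long[][] a = { { 1, 2, 3 }, { 4, 5, 6 } };
        long[] flat = flatten2D(a);
        System.out.println(flat[idx(1, 2, 3)]); // prints 6
    }
}
```

The copy is exactly the expense discussed here: every element crosses from
the row arrays into the flat array before it can be handed to the GPU.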
Of course, with a single-dimension array we need to pin the array to stop
Java's GC from moving the array that we passed to the GPU; we generally don't
need to worry about the array being collected because (we assume) there is a
live reference from the Java code. With a two-dimensional array we would need
to pin the array of arrays and then loop through all of the inner arrays and
pin them. This is clearly possible, but it would require us to track the
array information from the JNI side. Definitely look in the aparapi.cpp code
to see how we handle the pinning.
I would try this by creating test code and then just stepping through the
existing code, fixing code parsing (should be fine), bytecode folding (should
be OK, but you might need help with a transform - my guess is this bytecode
creates lots of DUP instructions which require transforms - I would be happy
to help with this), then code creation and JNI changes.
It would be a fun project and I think it would be doable.
Gary
Original comment by frost.g...@gmail.com
on 28 Oct 2011 at 3:11
We have experienced the same problem: using long[][] results in an NPE.
Our solution for now is to create a single long[] outside of Aparapi and manage
the data transfer via explicit mode, but this is obviously non-ideal.
Original comment by ryan.lam...@gmail.com
on 4 Nov 2011 at 7:00
I have checked in changes to enable detection of accesses through
multidimensional arrays. Previously this was not detected until we attempted
to generate OpenCL. Now we should be able to detect these as we fold
expressions.
Added two JUnit codegen tests (Access2DIntArray and Assign2DIntArray).
Original comment by frost.g...@gmail.com
on 7 Nov 2011 at 7:20
Keeping this open until the enhancement itself is implemented.
Original comment by frost.g...@gmail.com
on 7 Nov 2011 at 7:23
Excellent, we look forward to testing it out.
Original comment by ryan.lam...@gmail.com
on 8 Nov 2011 at 5:01
I have verified with the latest trunk code that two-dimensional arrays are
being detected during OpenCL creation.
In addition to the NPE that is still thrown as described above, if you try to
write to a two-dimensional array within your kernel a new exception is
thrown:
com.amd.aparapi.ClassParseException: Can't assign to two dimension array
Is this ticket intended to support both use cases?
Original comment by ryan.lam...@gmail.com
on 13 Nov 2011 at 6:40
Just a thought: couldn't Aparapi just copy all of the 2D/3D/etc. arrays into
a single array wrapped in an NIO buffer and then use the NIO buffer array
with the GPU? I believe this is how frameworks like JNA deal with
multi-dimensional arrays. Not ideal, but it should work as expected.
Original comment by ryan.lam...@gmail.com
on 14 Feb 2012 at 9:47
We can do this. Tom (in the discussion list) did this for 1D buffers
and this approach would work for 2D/3D/nD ;) but sacrificed array
access syntax (arr[i]) for explicit method access arrBuf.get(i).
Tom's approach is probably the most likely to give performance, because we can
guarantee contiguous memory accesses.
Anytime we take arr[][] or arr[][][] and consider copying behind the scenes,
we have a lot of copying to do and a lot of JNI-based array accesses to
perform the copying.
But (as you indicated earlier) I really am only 'assuming' this would be
slower (based on #'s from previous JNI/array work), so it would be
interesting just to measure the cost.
If you had an example application that you would like to test maybe between us
we can hack up a version of Aparapi which would do this.
Gary
Original comment by frost.g...@gmail.com
on 14 Feb 2012 at 10:04
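A minimal sketch of what the buffer-based approach described above might look
like. This is an assumption about the technique, not Tom's actual code: a
direct NIO buffer is contiguous off-heap memory that the GC will not move, so
no JNI pinning is needed, but the arr[r][c] syntax becomes explicit
get()/put() calls. The class name BufferBacked2D is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;

// Hypothetical sketch: a 2D view over one contiguous direct buffer,
// trading arr[r][c] syntax for explicit index arithmetic via get()/put().
public class BufferBacked2D {
    final LongBuffer buf;
    final int cols;

    BufferBacked2D(int rows, int cols) {
        this.cols = cols;
        // allocateDirect allocates memory outside the Java heap; the GC
        // does not relocate it, so JNI can pass it to OpenCL unpinned.
        this.buf = ByteBuffer.allocateDirect(rows * cols * Long.BYTES)
                             .order(ByteOrder.nativeOrder())
                             .asLongBuffer();
    }

    long get(int r, int c)         { return buf.get(r * cols + c); }
    void put(int r, int c, long v) { buf.put(r * cols + c, v); }
}
```

The cost is ergonomic rather than computational: every access pays the
r * cols + c arithmetic, but the memory is guaranteed contiguous.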
Here is some interesting example code on pinning and on multi-dimensional
arrays:
http://java.sun.com/developer/onlineTraining/Programming/JDCBook/jnistring.html#pin
http://java.sun.com/developer/onlineTraining/Programming/JDCBook/jnistring.html#multi
Together, those two examples should allow us to use arrays of arrays.
Original comment by ryan.lam...@gmail.com
on 3 Mar 2012 at 4:33
We have checked in a branch that adds multidimensional array support for 1D,
2D and 3D arrays.
There is OpenMP support added to the source, but commented out at the moment.
The OpenMP code has been tested successfully on Windows 7 and OS X.
This code specifically addresses our Java heap memory usage issues with forced
1D array creation in Java code, when we have data that initially starts as
multidimensional arrays and those arrays may be wrapped in other objects. This
branch moves the creation and copying of Java arrays to native code, so heap
memory pressure is significantly alleviated or eliminated.
Original comment by ryan.lam...@gmail.com
on 9 Apr 2013 at 5:42
In the current trunk code, I am working on merging the multi-dimensional
array branch. But I am also still experimenting with OpenMP. We have it
working in the branch, but I'm debating whether it is the most efficient
means of using OpenMP...
I'm looking at ways to use AttachCurrentThread() inside of the OpenMP
parallel for loops. Do you have a suggestion (based on the current trunk
code) for the best place to cache a global pointer to JavaVM *jvm?
Original comment by ryan.lam...@gmail.com
on 15 Apr 2013 at 8:00
I think it would be appropriate to cache this in each JNIContext.
I think it is fairly cheap to extract from JNIEnv using
jniEnv->GetJavaVM(JavaVM **vm);
For the upcoming 'lambda branch', where we have to run as an agent, we can
grab a copy when the agent is loaded.
Original comment by frost.g...@gmail.com
on 15 Apr 2013 at 8:54
All relevant code has been merged from the MultiDimensionalArray Branch to
Trunk. A few of the notable changes:
- There is a new AparapiBuffer class that is used for managing arrays
- OpenMP+JNI friendly code has been included but commented out until I can
verify the most performant way to implement this feature (others are welcome to
investigate this too)
- The arrays are limited to 1D, 2D and 3D to match current Range limitations
and to avoid excessive new code to support n-dimensional arrays
- Array copying is now performed in C++ to avoid the Java heap memory
problems caused by generating and copying Java arrays multiple times in Java
code just to produce the 1D arrays needed exclusively for GPU usage
- There is a new project in /samples/mdarray that was used to comprehensively
test the new multi-dimensional feature. While we are sure there will be edge
cases we didn't test, we should have covered most of the common use cases.
- Any and all input, code review and changes (if necessary) are welcome
Original comment by ryan.lam...@gmail.com
on 19 Apr 2013 at 8:17
It appears we missed kernel.put() and kernel.get() for multi-dimensional array
support. Bummer.
Gary, do you have any time to add this missing piece? I'll try to take a look
at it as soon as I can.
Original comment by ryan.lam...@gmail.com
on 19 Apr 2013 at 8:43
Okay, that may have been easier than I first thought. I just checked in an
initial pass at adding the missing method parameters. It will need more
testing.
Original comment by ryan.lam...@gmail.com
on 19 Apr 2013 at 8:51
Nice work, Ryan.
I just saw your request for help with kernel.get()/put(), then (almost
immediately) you had the fix. Looks like you found it ;)
I will take the trunk for a spin this weekend. Thank you (and Steven) for
this effort. I know I have been a 'slacko' on this.
I will test on Win64+Linux64+MacOS this weekend.
Original comment by frost.g...@gmail.com
on 19 Apr 2013 at 9:04
Ryan, compilation failed for me on Win64:
LINK : fatal error LNK1104: cannot open file 'VCOMP.lib'
Original comment by frost.g...@gmail.com
on 19 Apr 2013 at 9:11
So I think this is required for "Visual Studio" builds using OpenMP.
I looked around briefly, but do not have this. I am using Visual Studio
Express (I have always built with this in the past).
Where is your VComp.lib?
Gary
Original comment by frost.g...@gmail.com
on 19 Apr 2013 at 9:14
Hmm, I am using VS Professional 2012. VS Express...MS leaves out a lot of
important capabilities in the Express edition, unfortunately. Is using VS
Express a requirement for Aparapi?
It is a simple build.xml change to disable OpenMP support. I don't have it
open right now, but you can simply remove/comment out <arg value="/openmp" />.
If you can't get to it today, I can try to check in the change tonight.
Original comment by ryan.lam...@gmail.com
on 19 Apr 2013 at 10:26
I checked in the commented-out configuration(s) in build.xml. Let me know if
you encounter any other issues.
Original comment by ryan.lam...@gmail.com
on 20 Apr 2013 at 12:25
Ryan.
Worked like a charm on Linux 64. I added a mdarray.sh.
Regarding your 'VS express question' (#21 above). I won't say that VS Express
is a requirement, but I also would hate to force developers to 'have to have'
professional version of VS to contribute to Aparapi.
I for example have VS Express at home, and that is where I tend to do my
Aparapi builds.
What happens if we build against this VComp lib and create aparapi_x86_64.dll
as part of a binary distribution? Is the 'user' going to need a dynamic
version of this? Maybe my OpenMP naivety is showing here.
Can we try this? Can you build aparapi_x86_64.dll for me (using your
OpenMP-enabled VS environment)? I will then run it on a machine which only
has 'basic' runtime support. I want to make sure there is nothing in
aparapi_x86_64.dll from your env which will stop me from using it.
I guess I could use http://www.dependencywalker.com/ to check this.
Original comment by frost.g...@gmail.com
on 21 Apr 2013 at 2:50
Gary,
I can do that and attach the output to this ticket, hopefully tomorrow when I
get back to work.
As far as developers building Aparapi are concerned, this is just a compiler
directive change, so we could have something as simple as "MSVC OpenMP" and
"MSVC Non-OpenMP" Ant targets. This problem really only affects Windows, as
GCC has supported OpenMP since 4.2. Either way, the actual C++ code does not
have to change, since the OpenMP pragmas are ignored by compilers that do not
have OpenMP enabled.
http://gcc.gnu.org/wiki/openmp
Original comment by ryan.lam...@gmail.com
on 21 Apr 2013 at 9:45
It is possible users may need to install Visual C++ Redistributable for Visual
Studio 2012 Update 1:
http://www.microsoft.com/en-us/download/details.aspx?id=30679
"The Visual C++ Redistributable Packages install runtime components of Visual
C++ libraries that are required to run applications developed using Visual
Studio 2012 on a computer that does not have Visual Studio 2012 installed.
These packages install runtime components of the C Runtime (CRT), Standard C++,
ATL, MFC, C++ AMP, and OpenMP libraries."
Original comment by ryan.lam...@gmail.com
on 23 Apr 2013 at 6:23
I prefer building two versions, aparapi_win_x86_64_openmp.zip and
aparapi_win_x86_64.zip, and letting the user choose.
We could even put two versions of the dll in the same zip and find some way
to select the appropriate one at runtime.
Gary
Original comment by frost.g...@gmail.com
on 23 Apr 2013 at 6:29
Gary,
I have attached a .dll that was built with Visual Studio 2012 Professional,
Windows SDK 7.1A and OpenMP enabled. Let me know if you are able to execute it
on your machine.
In the process, I have also cleaned up the trunk build.xml to query for SDKs
properly when a machine has multiple SDKs installed :) I will check in those
changes shortly. This only affects Windows builds at this time.
Original comment by ryan.lam...@gmail.com
on 30 Apr 2013 at 7:23
Attachments:
Also, let me put together a DLL with all of the OpenMP pragmas uncommented in
AparapiBuffer.cpp so we can test that as well.
Original comment by ryan.lam...@gmail.com
on 30 Apr 2013 at 7:59
Try this DLL instead
Original comment by ryan.lam...@gmail.com
on 1 May 2013 at 1:02
Attachments:
Any results, Gary?
Original comment by ryan.lam...@gmail.com
on 8 May 2013 at 5:18
Can we mark this issue complete? Is the only task left to update the Wiki/FAQ?
Original comment by pnnl.edg...@gmail.com
on 1 Jun 2013 at 8:25
Hi,
This is very nice.
When using a multi-dimensional array in the "run()" method, the program
crashes when "execute(N*N)" is called twice or more.
Is there any way to call the "execute()" method multiple times?
Thanks.
Original comment by fukusui...@gmail.com
on 9 Jun 2013 at 3:57
Do you have a code example?
Original comment by pnnl.edg...@gmail.com
on 9 Jun 2013 at 10:54
Thank you for your response.
I simply call "execute(N * N)" twice in your sample code "mdarray", like
this:
public static void Zrun2D() {
....
long gs = System.currentTimeMillis();
final Kernel kernel = new ZMatMul2D(A, B, gpu, N);
kernel.execute(N * N);
//Do something.
kernel.execute(N * N);
gs = System.currentTimeMillis() - gs;
....
}
Original comment by fukusui...@gmail.com
on 10 Jun 2013 at 12:17
Gary,
I think the only task left, besides investigating Licht Takeuchi's potential
bug post, is to let me know if the OpenMP version works for you on Windows 7
64-bit.
-Ryan
Original comment by pnnl.edg...@gmail.com
on 12 Jul 2013 at 10:23
I am sorry I have been tardy on this. Yes, I can confirm that the OpenMP
version does work (well, does not fail!) for me on Win 7.
Sorry, Ryan.
I think we can close this as done!
Gary
Original comment by frost.g...@gmail.com
on 12 Jul 2013 at 11:29
No problem, I thought you had maybe forgotten about this or that I had
pestered you too much regarding the OpenMP testing.
The code that is currently checked into the repository is NOT OpenMP enabled;
the OpenMP code and configuration are currently commented out. The code on my
local machine is fully OpenMP enabled.
But there are still some questions on which I need your input:
- OpenMP does not necessarily speed up loops unless they are large: greater
than a million iterations, or taking longer than 15ms, for example.
http://msdn.microsoft.com/en-us/library/fw509c3b.aspx
"Assuming an x64, single core, dual processor the threadpool takes about 16ms
to startup. After that though there is very little cost for the threadpool.
When you compile with /openmp, the second call to test2 never runs any longer
than if you compile with /openmp-, as there is no threadpool startup. At a
million iterations the /openmp version is faster than the /openmp- version for
the second call to test2, and at 25 iterations both /openmp- and /openmp
versions register less than the clock granularity.
So if you have only one loop in your application and it runs in less than 15ms
(adjusted for the approximate overhead on your machine), /openmp may not be
appropriate, but if it's anything more than that, you may want to consider
using /openmp."
- If we do want to enable OpenMP for large loops (I believe we'd have to test
all 3 dimensions to see whether any one of them qualifies individually or
they qualify collectively) and disable it for smaller loops, then I/we will
need to add code to the JNI looping sections to enable this check.
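The real size check would live in the C++ JNI loops around the OpenMP
pragmas, but the idea can be sketched in Java terms. This is only an analogy
under stated assumptions: the class, method, and threshold below are
hypothetical, and the ~1M-iteration cutoff echoed from the MSDN quote above
would need per-machine measurement.

```java
import java.util.stream.IntStream;

// Hedged Java analogy of the proposed check: only pay the thread-pool
// startup cost when the loop is large enough to amortize it. The
// threshold value is illustrative, not measured.
public class ParallelThreshold {
    // Illustrative cutoff, per the "about a million iterations" figure.
    static final int PARALLEL_THRESHOLD = 1_000_000;

    static long sumSquares(int n) {
        IntStream range = IntStream.range(0, n);
        if (n >= PARALLEL_THRESHOLD) {
            range = range.parallel(); // worth the fork/join startup cost
        }
        return range.mapToLong(i -> (long) i * i).sum();
    }
}
```

In the C++ version the same shape would be a runtime size test guarding
entry into the OpenMP-parallelized loop body, with both results identical.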
Original comment by pnnl.edg...@gmail.com
on 14 Jul 2013 at 10:55
We can also exercise some of this functionality to test for OpenMP:
http://msdn.microsoft.com/en-us/magazine/cc163717.aspx
"When the /openmp switch is invoked, the compiler defines the symbol _OPENMP,
which may be used to detect when OpenMP is enabled using #ifndef _OPENMP."
I believe this is consistent with how GNU GCC works.
Original comment by pnnl.edg...@gmail.com
on 14 Jul 2013 at 11:50
Original issue reported on code.google.com by
kenneth@hexad.dk
on 28 Oct 2011 at 8:20