cdarnab / aparapi

Automatically exported from code.google.com/p/aparapi

Fall back with CodeGenException goto -> 0362 #54

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
I've developed an algorithm (to solve polyominoes) for which GPU code should 
be generated.

What is the expected output? 
I expected the algorithm to generate code and run on the GPU.

What do you see instead?
07.06.2012 22:40:30 com.amd.aparapi.KernelRunner warnFallBackAndExecute
WARNUNG: Reverting to Java Thread Pool (JTP) for class 
poly.omino.ReservationIntern: goto -> 0362
com.amd.aparapi.CodeGenException: goto -> 0362
    at com.amd.aparapi.BlockWriter.writeInstruction(BlockWriter.java:664)
    at com.amd.aparapi.KernelWriter.writeInstruction(KernelWriter.java:631)
    at com.amd.aparapi.BlockWriter.writeSequence(BlockWriter.java:285)
    at com.amd.aparapi.BlockWriter.writeBlock(BlockWriter.java:296)
    at com.amd.aparapi.BlockWriter.writeMethodBody(BlockWriter.java:705)
    at com.amd.aparapi.KernelWriter.write(KernelWriter.java:545)
    at com.amd.aparapi.KernelWriter.writeToString(KernelWriter.java:643)
    at com.amd.aparapi.KernelRunner.execute(KernelRunner.java:1396)
    at com.amd.aparapi.Kernel.execute(Kernel.java:1719)
    at com.amd.aparapi.Kernel.execute(Kernel.java:1650)
    at com.amd.aparapi.Kernel.execute(Kernel.java:1635)
    at poly.omino.Solver1$4.run(Solver1.java:405)
    at java.lang.Thread.run(Unknown Source)

What version of the product are you using? On what operating system?
aparapi-2012-05-06
aparapi-2012-02-15
Windows 7, 32bit

Please provide any additional information below.

Is there any way to get more detailed information about the source line 
that causes a code generation problem? I managed to fix all ClassParseExceptions 
thanks to their helpful explanations (source line numbers would have been very 
helpful there too). But now I get a CodeGenException with no additional 
information about where to modify the sources.

Could a CodeGenException be platform-specific, that is, dependent on the 
actual hardware?

Is there a recommended way to tweak or debug the code generation?

Could an algorithm become too complex?

Original issue reported on code.google.com by deltrion...@gmail.com on 7 Jun 2012 at 9:22

GoogleCodeExporter commented 8 years ago
That error is pretty cryptic.  Sorry about that.

Do you have 'break' or 'continue' in the code?

Aparapi has issues untangling breaks/continues.  

I know that break and continue can be useful, but they can always be replaced 
and I would recommend this for Aparapi code.  

Basically, the cryptic error message means that the code generator found a goto 
that it could not associate with a loop (or else clause).
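
A minimal sketch of such a replacement, in plain Java with made-up names (this 
is not the reporter's kernel, and whether Aparapi translates a given rewritten 
loop still has to be checked by running it). A break can usually be folded into 
the loop condition, and a continue can be turned into an if around the rest of 
the loop body:

```java
// Hypothetical example: the same two loops written with and without
// break/continue. Both variants of each loop return the same result.
final class LoopWithoutBreak {

    // Early exit via break: index of the first negative value, or -1.
    static int firstNegativeWithBreak(int[] values) {
        int found = -1;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0) {
                found = i;
                break;
            }
        }
        return found;
    }

    // Same search without break: the loop condition also checks the result,
    // so the loop stops as soon as a negative value has been recorded.
    static int firstNegativeWithoutBreak(int[] values) {
        int found = -1;
        for (int i = 0; i < values.length && found < 0; i++) {
            if (values[i] < 0) {
                found = i;
            }
        }
        return found;
    }

    // Skipping via continue: sum of the even values.
    static int sumEvenWithContinue(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] % 2 != 0) {
                continue;
            }
            sum += values[i];
        }
        return sum;
    }

    // Same sum without continue: the condition is inverted and guards the body.
    static int sumEvenWithoutContinue(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] % 2 == 0) {
                sum += values[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {3, 7, -2, 5, -9};
        System.out.println(firstNegativeWithBreak(data));
        System.out.println(firstNegativeWithoutBreak(data));
        System.out.println(sumEvenWithContinue(data));
        System.out.println(sumEvenWithoutContinue(data));
    }
}
```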

If you use javap to disassemble the class you will see a 'goto 362' in your 
code.  Gotos occur naturally in Java bytecode.  For example 

if (a) {
    b
} else {
    c
}

turns to 
          if (!a) -> label1 
          b
          goto label2
label1:
          c
label2:

And for loops create either a goto at the top, or at the bottom of the loop 
(depending on whose javac compiler you use - Oracle vs Eclipse). 

If you don't have a break/continue then please consider sending me your code so 
I can try to track the bug. Or you can send me the result of javap on your 
kernel.  I actually secretly enjoy hand-disassembling bytecode ;) 

Gary

Original comment by frost.g...@gmail.com on 7 Jun 2012 at 10:43

GoogleCodeExporter commented 8 years ago
Thank you Gary for these details. I have quite a few of these useful breaks and 
continues; I will look into replacing them.

Markus

Original comment by deltrion...@gmail.com on 8 Jun 2012 at 7:38

GoogleCodeExporter commented 8 years ago
I have now removed the breaks and continues, which also removed the problem 
described here, but this led to new problems (above a certain complexity of the 
problem):
- On a Win7/32 PC with an Nvidia GPU the program could not use the GPU: the 
screen flickers a lot and then the implementation falls back to the CPU. A video 
running in parallel leads to a Blue Screen.
- On a WinXP/64 PC with an ATI GPU the Blue Screen appeared immediately, without 
any other application running in parallel.

Original comment by deltrion...@gmail.com on 10 Jun 2012 at 1:23

GoogleCodeExporter commented 8 years ago
Markus

It sounds like the video driver is crashing in both cases. I have seen this 
when I try to throw a lot of compute at the GPU. The fact that it works for 
some sets of data and not for others (bigger? more complex?) might mean that 
this algorithm is 'just on the edge' of some critical resource.

Here are a few things to consider. 
1) Does the algorithm work in non-GPU mode?  Maybe JTP (or CPU mode on the 
ATI/AMD platform? - NVidia does not support CPU mode).

2) Do you have enough memory on the video card to execute?  It is possible 
that there is enough space for buffers to be transferred to the GPU and for 
execution to succeed some of the time, but then when another workload (video 
decode) is happening in parallel there are not enough resources.  I know this is 
hard to diagnose. But the fact that video decode in another window seems to 
affect this looks like more than a coincidence.

3) Can the compute be broken down further?  I have been working on face 
detection under Aparapi and I do know that Windows drivers do not really like 
'long running' compute to take place.  By long running, I mean the compute 
takes > 2 seconds. The Windows driver does not differentiate between 
graphics and compute, and will consider >2 seconds a problem ;) so will reset 
the driver for the 'good of the system'.   For my face detection I kept the 
data on the GPU and scheduled multiple kernel dispatches (using 
kernel.setExplicit(true) then multiple calls to kernel.execute()). You might 
try to break the compute up.  In my case the execution suffered from 'wave 
divergence' (where the code is very branchy or branches are data driven and 
multiple threads in the same group are taking different paths - or different 
length loops)
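
The idea of splitting one long-running computation into several short 
dispatches can be sketched in plain Java. This is only an illustration: 
processChunk is a hypothetical stand-in for one short kernel dispatch, the 
chunk size is arbitrary, and nothing below calls Aparapi itself. With Aparapi, 
each chunk would correspond to one kernel.execute() call, with 
kernel.setExplicit(true) used to keep the data on the GPU between dispatches, 
as described above.

```java
// Plain-Java sketch: run one large job as a series of short dispatches so
// that no single dispatch runs for long. No Aparapi calls are made here.
final class ChunkedDispatch {

    // Hypothetical stand-in for one short dispatch: squares items [start, end).
    static void processChunk(int[] in, long[] out, int start, int end) {
        for (int i = start; i < end; i++) {
            out[i] = (long) in[i] * in[i];
        }
    }

    // Processes the whole input chunk by chunk and returns the dispatch count.
    static int runInChunks(int[] in, long[] out, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int dispatches = 0;
        for (int start = 0; start < in.length; start += chunkSize) {
            // The last chunk may be shorter than chunkSize.
            int end = Math.min(start + chunkSize, in.length);
            processChunk(in, out, start, end);
            dispatches++;
        }
        return dispatches;
    }

    public static void main(String[] args) {
        int[] in = new int[10];
        for (int i = 0; i < in.length; i++) {
            in[i] = i;
        }
        long[] out = new long[in.length];
        int dispatches = runInChunks(in, out, 4);
        System.out.println(dispatches + " dispatches, out[9] = " + out[9]);
    }
}
```

The chunk size is the tuning knob: smaller chunks keep each dispatch short but 
add per-dispatch overhead, so a suitable value has to be found by measurement.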

If your algorithm is not proprietary, and you would like help I am happy to 
give it a look.  However, I am presenting at a conference (on Aparapi) next 
week and am still getting my lab/slides into shape ;) so I may not be able to 
help too much for a week or so.

Gary

Original comment by frost.g...@gmail.com on 10 Jun 2012 at 2:19

GoogleCodeExporter commented 8 years ago
One more thing..

Here is a nice description of the effect of wave divergence on GPUs

http://cvg.ethz.ch/teaching/2011spring/gpgpu/GPU-Optimization.pdf

Original comment by frost.g...@gmail.com on 10 Jun 2012 at 2:33

GoogleCodeExporter commented 8 years ago
Gary

Yes, the video drivers were guilty in both cases.

Re 1) Yes, but maybe it's still too complex. I'm currently trying to reduce this.
Re 2) How can I find out? I'd expect insufficient memory to lead to some kind of 
runtime exception. The ATI HD 4770 has 512 MB; the JVM/JTP implementation 
uses about 70 MB (Heap+Perm) at most.
Re 3) I'm violating the 2 seconds rule. Breaking down the algorithm would 
surely be a good idea. Currently I'm using only one kernel, which is probably 
not very reasonable.

I had the feeling that using the GPU creates some overhead, so I tried to use 
it only once. That was probably not a good idea, because for the 'smaller' 
problems I would not really need the GPU: the overhead takes more time than the 
CPU would need in total. But bigger problems should really benefit. So first 
I'll have to adapt my thinking about how to structure the problem so that it 
can make use of the GPU's calculation power. This is my first attempt at 
programming a GPU, as a hobby, so please bear with me.
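
The trade-off described above can be made concrete with a small break-even 
calculation. This is only a sketch: the class name and all three cost figures 
are hypothetical placeholders, not measurements from this issue, and real 
numbers would have to be measured on the actual hardware.

```java
// Hypothetical cost model: the GPU pays a fixed overhead per run but is
// cheaper per item; the CPU has no fixed overhead but costs more per item.
final class OffloadBreakEven {

    // True if the modelled GPU time is strictly lower than the CPU time.
    static boolean gpuWorthIt(long items, long gpuOverheadNs,
                              long cpuPerItemNs, long gpuPerItemNs) {
        long gpuTotalNs = gpuOverheadNs + items * gpuPerItemNs;
        long cpuTotalNs = items * cpuPerItemNs;
        return gpuTotalNs < cpuTotalNs;
    }

    public static void main(String[] args) {
        // Placeholder figures: 50 ms overhead, 10 us per item on the CPU,
        // 2 us per item on the GPU.
        long overheadNs = 50_000_000L;
        long cpuNs = 10_000L;
        long gpuNs = 2_000L;
        System.out.println(gpuWorthIt(1_000L, overheadNs, cpuNs, gpuNs));
        System.out.println(gpuWorthIt(100_000L, overheadNs, cpuNs, gpuNs));
    }
}
```

With these placeholder figures the two costs are equal at 6250 items 
(50,000,000 / (10,000 - 2,000)), so only larger problems would favour the GPU 
in this model.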

Markus

Original comment by deltrion...@gmail.com on 10 Jun 2012 at 5:07