Open GoogleCodeExporter opened 8 years ago
That error is pretty cryptic. Sorry about that.
Do you have 'break' or 'continue' in the code?
Aparapi has issues untangling breaks/continues.
I know that break and continue can be useful, but they can always be replaced
and I would recommend this for Aparapi code.
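A minimal sketch of what such a replacement can look like, in plain Java. The class and method names are made up for illustration, and whether a particular form translates cleanly depends on your Aparapi version, so treat this as a starting point rather than a guaranteed fix:

```java
public class NoBreakDemo {
    // Original style: leaves the loop early with 'break'.
    static int firstNegativeWithBreak(int[] values) {
        int index = -1;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < 0) {
                index = i;
                break;
            }
        }
        return index;
    }

    // Same result without 'break': the exit test moves into the loop condition.
    static int firstNegative(int[] values) {
        int index = -1;
        for (int i = 0; i < values.length && index < 0; i++) {
            if (values[i] < 0) {
                index = i;
            }
        }
        return index;
    }

    // 'continue' replaced by a guard: instead of
    //   if (values[i] <= 0) continue;
    // the remaining loop body is wrapped in the inverted condition.
    static int sumOfPositives(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] > 0) {
                sum += values[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {3, 0, -2, 5, -7};
        System.out.println(firstNegativeWithBreak(values));
        System.out.println(firstNegative(values));
        System.out.println(sumOfPositives(values));
    }
}
```

The same two patterns (exit test in the loop condition, inverted guard around the body) should carry over to the body of a kernel's run() method.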
Basically, the cryptic error message means that Aparapi found a goto that it
could not associate with a loop (or else clause).
If you use javap to disassemble the class you will see a 'goto 362' in your
code. Gotos occur naturally in compiled Java code. For example
if (a){
b
} else{
c
}
turns to
if (!a) -> label1
b
goto label2
label1:
c
label2:
And for loops create either a goto at the top, or at the bottom of the loop
(depending on whose javac compiler you use - Oracle vs Eclipse).
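If you want to see this for yourself, here is a small throwaway class (the names are hypothetical) with one loop and one if/else. Compile it with javac and run 'javap -c LoopDemo'; the listing for both methods should contain goto instructions, though the exact offsets and their position will vary with the compiler:

```java
public class LoopDemo {
    // A plain for loop: the compiled form needs a backward or forward goto.
    static int sumTo(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // An if/else: the 'then' branch ends with a goto that skips the 'else' branch.
    static int pick(boolean a, int b, int c) {
        int result;
        if (a) {
            result = b;
        } else {
            result = c;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(5));
        System.out.println(pick(true, 1, 2));
    }
}
```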
If you don't have a break/continue then please consider sending me your code so
I can try to track the bug. Or you can send me the result of javap on your
kernel. I actually secretly enjoy hand-disassembling bytecode ;)
Gary
Original comment by frost.g...@gmail.com
on 7 Jun 2012 at 10:43
Thank you Gary for these details. I have quite a few of these useful breaks
and continues - I will look into replacing them.
Markus
Original comment by deltrion...@gmail.com
on 8 Jun 2012 at 7:38
I have now removed the breaks and continues, which also removed the problem
described here - but this led to new problems (above a certain complexity of
the problem):
- On a Win7/32 PC with Nvidia GPU the program could not use the GPU - the
screen flickers a lot and then the implementation falls back on CPU. A video
running in parallel leads to a Blue Screen.
- On a WinXP/64 PC with ATI GPU the Blue Screen appeared immediately without
any parallel running application stress.
Original comment by deltrion...@gmail.com
on 10 Jun 2012 at 1:23
Markus
It sounds like the video driver is crashing in both cases. I have seen this
when I try to throw a lot of compute at the GPU. The fact that it works for
some sets of data and not for others (bigger? more complex?) might mean that
this algorithm is 'just on the edge' of some critical resource.
Here are a few things to consider.
1) Does the algorithm work in non-GPU mode? Maybe try JTP (or CPU mode on the
ATI/AMD platform - NVidia does not support CPU mode).
2) Do you have enough memory on the video card to execute? It is possible
that there is enough space for the buffers to be transferred to the GPU and
for execution to succeed some of the time, but when another workload (video
decode) is running in parallel there are not enough resources. I know this is
hard to diagnose. But the fact that having video decode in another window
seems to affect this seems too much of a coincidence.
3) Can the compute be broken down further? I have been working on face
detection under Aparapi and I do know that Windows drivers do not really like
'long running' compute to take place. By long running, I mean the compute
takes > 2 seconds. The Windows driver does not differentiate between
graphics and compute, and will consider > 2 seconds a problem ;) so it will
reset the driver for the 'good of the system'. For my face detection I kept the
data on the GPU and scheduled multiple kernel dispatches (using
kernel.setExplicit(true) then multiple calls to kernel.execute()). You might
try to break the compute up. In my case the execution suffered from 'wave
divergence' (where the code is very branchy or branches are data driven and
multiple threads in the same group are taking different paths - or different
length loops).
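To illustrate the 'break the compute up' idea from point 3, here is a rough sketch in plain Java with no Aparapi dependency. The inner loop stands in for one kernel.execute() call; the class name, the runInPasses helper and the chunk size are all hypothetical, and passing an offset into the kernel is an assumption about how you would wire this up:

```java
import java.util.function.IntConsumer;

public class ChunkedDispatch {
    /**
     * Runs 'body' for every id in 0..total-1, split into passes of at most
     * 'chunk' ids each. Returns the number of passes that were needed.
     */
    static int runInPasses(int total, int chunk, IntConsumer body) {
        int passes = 0;
        for (int offset = 0; offset < total; offset += chunk) {
            int count = Math.min(chunk, total - offset);
            // Stand-in for a single short dispatch. With Aparapi this inner
            // loop would be one kernel.execute() call, and 'offset' would have
            // to reach the kernel somehow (assumption: via a field).
            for (int id = 0; id < count; id++) {
                body.accept(offset + id);
            }
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        float[] data = new float[10];
        // The data array is shared by all passes, mirroring buffers that stay
        // on the GPU between dispatches.
        int passes = runInPasses(data.length, 4, i -> data[i] = i * 2f);
        System.out.println(passes + " passes, last value = " + data[9]);
    }
}
```

The chunk size would need tuning so that each pass stays well under the driver's time limit.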
If your algorithm is not proprietary, and you would like help I am happy to
give it a look. However, I am presenting at a conference (on Aparapi) next
week and am still getting my lab/slides into shape ;) so I may not be able to
help too much for a week or so.
Gary
Original comment by frost.g...@gmail.com
on 10 Jun 2012 at 2:19
One more thing..
Here is a nice description of the effect of wave divergence on GPUs
http://cvg.ethz.ch/teaching/2011spring/gpgpu/GPU-Optimization.pdf
Original comment by frost.g...@gmail.com
on 10 Jun 2012 at 2:33
Gary
Yes, the video drivers were guilty in both cases.
to 1) Yes, but maybe it's still too complex - I'm just trying to reduce this.
to 2) How to find out? I'd expect not enough memory leading to some kind of
runtime exception?! The ATI HD 4770 has 512 MB - The JVM/JTP implementation
uses about 70 MB (Heap+Perm) maximum.
to 3) I'm violating the 2 seconds rule. Breaking down the algorithm would
surely be a good idea. Currently I'm using only one kernel, which is probably
not very sensible.
I had the feeling that using the GPU creates some overhead, so I tried to use
it only once. That was probably not a good idea, because for the 'smaller'
problems I would not really need the GPU - the overhead takes more time than
the CPU would need in total. But bigger problems should really benefit. So
first I'll have to adapt my thinking on how to structure the problem so that it
makes use of the GPU's calculation power. This is my first try at programming a
GPU - as a hobby - so never mind.
Markus
Original comment by deltrion...@gmail.com
on 10 Jun 2012 at 5:07
Original issue reported on code.google.com by
deltrion...@gmail.com
on 7 Jun 2012 at 9:22