ginesvengeance / open-rp

Automatically exported from code.google.com/p/open-rp

Performance/usability is bad on N900/Maemo #26

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
*What steps will reproduce the problem?
1. Try running the latest SVN release of ORP on the Nokia N900
2. A build is available at http://kakaroto.homelinux.net/~kakaroto/n900/orp/

*What is the expected output? What do you see instead?
I expect ORP to work correctly at a reasonable speed. Instead it is
extremely slow: it uses all of the CPU, and the framerate is well below
1 fps.
Input can also take 10 seconds or more to arrive on the PS3 side.
Sometimes it doesn't work at all: I have to open ORP, send an input
(left/right, for example), close ORP, and reopen it to see the result of
that input.

*What version of the product are you using? On what operating system?
SVN r326, on Maemo (Linux). 

*Please provide any additional information below.
It looks like the SDL renderer is taking all this time. It could be the
decoding, but I doubt it considering the resolution.
The N900 has a 600MHz ARM Cortex-A8 processor, which is very fast, and the
device plays 720p H264 videos flawlessly. So I'm guessing the SDL renderer
is causing the issue, but we can't really know without proper profiling.
If you need more info/help from my side, just tell me.

Original issue reported on code.google.com by snifikino on 6 Jan 2010 at 6:16

GoogleCodeExporter commented 8 years ago
Same problem on the iPhone.

BTW, the device can only play 720p videos because the decoding is done on
dedicated hardware. Video decoding is very processor intensive, especially
because ffmpeg is poorly optimized for embedded devices.

Do you plan on doing any performance profiling? I'm curious to see which
specific parts of the code are an issue. I have not found rendering to be
/that/ big of a bottleneck; rather, the decoding of the input streams and
the generally un-streamlined interaction between threads seem to take up
most of the CPU time. Even so, I've gotten it to run at about 50% speed,
and there doesn't seem to be anything more to specifically optimize.

In the long run, it seems a fork/rewrite is in order, rearranging
everything to make it more thread-friendly and stripping out SDL and
possibly libcurl completely.

Original comment by niel...@gmail.com on 6 Jan 2010 at 2:48

GoogleCodeExporter commented 8 years ago
Yep yep, I know the decoding is done by dedicated hardware; I've worked on
the gst-dsp elements (a little), so I know.. :) But I also know the
decoding shouldn't be an issue, because the N800 and N810 (330MHz CPUs
iirc) were doing video decoding on the CPU, not on the DSP (the DSP was
used for decoding audio), and it was fine.
Also, if you install mplayer on your N900 and try it out, it should work
very well, and that's using the same ffmpeg that ORP uses.
If I find some time later today, I'll try to run oprofile on it and see
what happens.

I definitely agree with the rewrite/refactoring. I was totally shocked to
see the makefile and the forced compilation of dependencies (zlib, libpng,
freetype, curl, faad2, SDL, ffmpeg, openssl, wxWidgets, etc.) even though
they are already available on the platform. I didn't like the static
linking of everything either: no ./configure, no Makefile.am.. it's far
from a standard source package...
Also, the UI definitely needs some work, because I had to disable some
fields in the edit profile window just to be able to see the save button..
no scrollbar...
I'd suggest using gtk for the new UI; that way it wouldn't require much
work to port it to a hildonized, small screen/high dpi, finger friendly
UI...
I would also suggest using gstreamer for the decoding/rendering, as it
would make things so much easier. Using gstreamer would also allow ORP to
use the DSP hardware simply by letting gstreamer choose the best h264
decoder for the stream (gstdspvdec instead of ffdec_h264). I don't know the
internal workings of ORP, but a simple gstreamer pipeline could be built
with 10 lines of code:
"appsrc ! video/x-h264,width=320,height=240 ! decodebin ! xvimagesink"
and ORP could just feed it raw data through the 'appsrc' video source.
No SDL, no ffmpeg involved, it's magic! :)
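
A minimal sketch of how the pipeline description above might be assembled
for an arbitrary frame size. This is only an illustration: the function name
orp_build_pipeline_desc and the "name=src" property are assumptions added
here so the appsrc element could be looked up later. Nothing in it is taken
from the ORP sources, and it does not link against GStreamer itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats a gst-launch style pipeline description for an H.264 stream of
 * the given size. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if the output buffer is too small.
 *
 * Hypothetical helper: only the element names (appsrc, decodebin,
 * xvimagesink) and the caps string come from the comment above.
 */
int orp_build_pipeline_desc(char *out, size_t out_len, int width, int height)
{
    assert(out != NULL && out_len > 0);

    int n = snprintf(out, out_len,
                     "appsrc name=src ! video/x-h264,width=%d,height=%d "
                     "! decodebin ! xvimagesink",
                     width, height);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return n;
}
```

The resulting string is the kind of description that gst_parse_launch()
accepts. An application would then presumably fetch the appsrc element by
name and push the encoded stream data into it, but that part depends on how
ORP receives its packets and is not shown.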

Original comment by snifikino on 6 Jan 2010 at 4:17

GoogleCodeExporter commented 8 years ago
Hey nielkie,
I did a little test with oprofile. I was able to get oprofile to run on the
N900, and I got the debug symbols for the kernel and for orp.
This isn't much, because I'm not at home at the moment and couldn't
connect, so the only thing this opreport result covers is opening the UI,
clicking on launch, and waiting for about 10 seconds while it tries to
connect; then I stopped it. It was mainly done to test whether oprofile was
working correctly, and also because the whole 'fade in' image when orp
tries to connect seemed slow too, so it would be nice to profile that as
well.
Attached is the file, have a look if you're curious!
I checked your iphone port issue; it's nice to see some optimization done,
and I'd be interested in seeing a refactoring or a fork that would use
better technologies. I tried to check your patch, but it was too big and I
don't have much time, so maybe you could quickly explain it to me: how it
improved the performance, and whether it would be safe to use or whether it
contains iphone specific stuff (I saw #ifdefs though).

Original comment by snifikino on 6 Jan 2010 at 9:48

Attachments:

GoogleCodeExporter commented 8 years ago
Hi again, here are my oprofile results for a simple remote play session. I
actually tried it a few times, since oprofile is a statistical tool, so the
more samples we get, the better the results...

For the first one, I opened the UI, launched the game, waited quite a
while, and used left/right to change the view, with sound playing when I
was over a game's thumbnail. The second one was shorter (it got an error
about a corrupted stream, so it stopped early), and I didn't do much with
it, just opened it without sound...
The third and fourth reports are also 'idle' ones, but with the cursor on
the game, so we receive sound (the game's sound when highlighted in the
XMB).
The 4th report is important because when I took it, I had just installed
the libc6-dbg and libstdc++ debug packages, so we can see which calls are
being made in libc and libstdc++...

We get about 40% CPU in ORP, 35% in the kernel, 8% in libc calls, then some
more CPU for pulseaudio, the FB driver, and some nokia voice driver.
I also attached the result of 'powertop', which can be interesting.
As you can see, most of the CPU is being used on the H264 decoding. There
is also some CPU needed for faad, but what worries me is all the time spent
scheduling the threads, as well as inside libc, which is mainly memcpy and
memset calls! This means the code really needs to be optimized to reuse
buffers instead of copying data over and over again...
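
To make the buffer-reuse idea concrete, here is a minimal sketch of a fixed
pool of frame buffers that are allocated once and then recycled, so the hot
path does no malloc, memset, or memcpy of its own. All names here
(frame_pool, POOL_SLOTS, and the functions) are hypothetical and not taken
from ORP, and the pool is deliberately not thread-safe; a real version would
need a lock or a lock-free queue around acquire/release.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 4  /* assumed pool depth, chosen for illustration */

/* A fixed set of equally sized buffers, allocated once and recycled. */
typedef struct {
    unsigned char *slot[POOL_SLOTS];
    int in_use[POOL_SLOTS];
    size_t slot_size;
} frame_pool;

/* Frees every slot; safe to call on a partially initialised pool. */
void frame_pool_destroy(frame_pool *p)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        free(p->slot[i]);
        p->slot[i] = NULL;
        p->in_use[i] = 0;
    }
}

/* Allocates all slots up front. Returns 0 on success, -1 on failure. */
int frame_pool_init(frame_pool *p, size_t slot_size)
{
    assert(p != NULL && slot_size > 0);
    p->slot_size = slot_size;
    for (int i = 0; i < POOL_SLOTS; i++) {
        p->slot[i] = NULL;
        p->in_use[i] = 0;
    }
    for (int i = 0; i < POOL_SLOTS; i++) {
        p->slot[i] = malloc(slot_size);
        if (p->slot[i] == NULL) {
            frame_pool_destroy(p);
            return -1;
        }
    }
    return 0;
}

/* Hands out the first free buffer, or NULL if all are busy. */
unsigned char *frame_pool_acquire(frame_pool *p)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = 1;
            return p->slot[i];
        }
    }
    return NULL;
}

/* Marks a buffer as free again; the memory itself is kept for reuse. */
void frame_pool_release(frame_pool *p, unsigned char *buf)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (p->slot[i] == buf) {
            p->in_use[i] = 0;
            return;
        }
    }
}
```

The intent is that the network thread would acquire a buffer, receive a
packet directly into it, and pass the pointer on to the decoder, which
releases it when done. The data is then written once and never copied
between stages.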

All this copying is also causing a huge amount of CPU to be used for DMA.
Look at the mcspi calls in the kernel!! The mcspi is the driver for DMA
(google it), which means that all those memcpy calls are having a huge
impact on performance, both from the memcpy CPU time and from the DMA...

I think that if the code gets optimized to avoid any unnecessary memory
allocations on the critical path, as well as memsets and memcpys, then we
should get much better performance. Then fixing the threads to act more
nicely, and finally getting that H264 decoding off the CPU and onto the
DSP, will make ORP run smoothly on the N900!

I hope this is helpful, and if you guys need some more profiling done or
any other kind of information, let me know!

KaKaRoTo

Original comment by snifikino on 7 Jan 2010 at 6:23

Attachments:

GoogleCodeExporter commented 8 years ago
Wow!  Great stuff posted here...  Particularly the mention of GStreamer.
I've heard of it but have never looked at the API.  It sounds like
switching to it will eliminate most of the bottlenecks that are causing
grief... AND with GStreamer's clocking support, it looks like my a/v sync
issues will just go away :)

I'll create a GStreamer branch and see how that goes...

Original comment by darryl...@gmail.com on 11 Jan 2010 at 10:36

GoogleCodeExporter commented 8 years ago
Any progress on this?
I got ORP packaged on the official repos and would really like to see a
switch to GStreamer (which would use the DSP).

Original comment by mohammad...@gmail.com on 5 May 2010 at 1:58

GoogleCodeExporter commented 8 years ago
Yes, well, nothing public yet due to lack of stability.  I have an
experimental branch working with GStreamer, but it's far from feature
complete.  When I have more time I'll come back to it and release it.

Original comment by darryl...@gmail.com on 5 May 2010 at 2:13

GoogleCodeExporter commented 8 years ago
If you need help or something, let me know; I might try to use some of my
free time to have a look at your code and help you out with it. Or better
yet, if you have specific issues or GST_DEBUG logs, I can take a look and
try to figure out what's wrong.
I'm a GStreamer expert working for Collabora (the company behind
GStreamer), so feel free to ask :)

Original comment by snifikino on 5 May 2010 at 3:06

GoogleCodeExporter commented 8 years ago
Awesome!  That's great to hear.  I can already think of a few questions but
I won't bother you with them now.  I should have time to continue work on
the GST branch in a few weeks.  I must say I'm *very* impressed with the
GST API; it's been a joy to learn - it just works!  I've been able to
decode two codecs so far, the audio (AAC) and video (h264) used when in the
XMB.  Not sure if I'll have problems with ATRAC3, which is used by some
games.

Anyway, thanks for the feedback and I promise not to bother ya (too much)!

Original comment by darryl...@gmail.com on 5 May 2010 at 10:09