antirez / load81

SDL based Lua programming environment for kids similar to Codea
BSD 2-Clause "Simplified" License
647 stars 64 forks source link

Replace drawing primitive implementations with SDL_gfx #26

Closed rlane closed 12 years ago

rlane commented 12 years ago

The SDL_gfx functions have a very similar interface and are much faster. Before these changes the asteroids example used 80% CPU and couldn't consistently hit 30fps. Now it uses 2% CPU and runs at a smooth 30fps.

seclorum commented 12 years ago

I agree with this patch, it definitely pushes things forward ..

antirez commented 12 years ago

Hello! Thanks for the patch.

Unfortunately I can't accept it because this drastically reduces the portability of the project in other devices, Android phones, iOS, OSx native GUI, and so forth. So far we only need a surface where to draw pixels, so porting is trivial.

However I'm interested in understanding how we could improve performances with the current implementation, so today I added a FPS counter that can be enabled with --fps. I'll profile the code to check how drawing could be improved. Thanks! Salvatore.

seclorum commented 12 years ago

Hmmm .. a pity. Maybe SPriG is better suited, then?

http://code.bluedinosaurs.com/lib/sprig/

antirez commented 12 years ago

I tried to profile stuff a bit, and apparently the most time consuming function in most programs on OSX is SDL_UpdateRect(), even stranger is that if I attach the profiler it starts to go faster...

Btw I just pushed a branch called 'hline' that implements probably faster drawing primitives, please could you check if this provides an advantage on a slow/Linux-based system? Thank you. I'm having issues verifying that on OSX because of strange SDL time behaviors.

On a side note: maybe it makes sense anyway to limit frame rate to 30fps?

Cheers, Salvatore

seclorum commented 12 years ago

Okay, I tested the hline branch on my Pandora, and for the text demo it definitely sped things up immensely - the text really jumps to catch the cursor now, whereas the lag previously was really, really bad.

However, for the asteroids and helicopter games, performance is still miserable. I will now try to test the SDL_gfx patches above, just to get another idea of how the performance is effected (no problems with installing SDL_gfx on Open Pandora, obviously..)

Frame-limiting to 30fps? Probably a good idea, actually ..

antirez commented 12 years ago

seclorum: thank you, please test the latest "master" branch as well, contains improvements on the speed side.

antirez commented 12 years ago

In master branch I added the limit to 30 FPS, and adjusted the asteroids constants to run smooth with 30 FPS. This should improve a lot the performances on the Pandora.

seclorum commented 12 years ago

I tested the new branch on Open Pandora, and the text.lua/lines.lua/triangles.lua scripts work pretty well .. but still helicopter/flames/asteroids performance - while improved over the last versions - is still very, very slow.

What I think is the problem is floating point performance. I will try to optimize the ARM build for Pandora and see if it makes a difference. Maybe its about mixing float and int parameters in function calls on ARM will result in dismal performance ..

antirez commented 12 years ago

@seclorum thank you very much, you tested latest 'master' right? Do you want a test branch that only uses integer math? I've one and I can share it on github.

seclorum commented 12 years ago

Yes, that would be great - I'll test it right away if you have it.

Right now I'm trying a few combinations of CFLAGS optimized for the ARM target - so far, not really much to report, things are still slow. I believe the problem really is floating point performance in general, and particularly the mixture of it within the calls to Lua functions. I'll try the integer branch right away and maybe that will help.

rlane commented 12 years ago

The SDL_UpdateRect performance issues are due to SDL using a "shadow surface" because the depth of the surface doesnt match that of the real video memory. The last commit in this pull request fixes that.

SDL_gfx seems to be very portable, since it has C fallback code for non-x86 systems. Or are you saying you don't plan to rely on SDL as the abstraction layer?

seclorum commented 12 years ago

Ah, good point rlane! In fact, that simple change makes things perform much better on the Pandora. antirez, I would really encourage you to re-consider this patch, as it definitely seems to be a step in the right direction - and SDL_Gfx is portable, after all.

rlane commented 12 years ago

I just tried using f2c69f78 (the color depth patch) without SDL_gfx and ran into problems, since the load81 graphics primitives assume 24bpp. SDL_gfx has code to handle various color depths so it works fine.

seclorum commented 12 years ago

Okay, at least on Pandora we can get a big speed boost if we use SDL_HWSURFACE instead of SDL_SWSURFACE. So I think its time to start adding platform-specific code to address these sorts of issues.

seclorum commented 12 years ago

I want to add that even if we specify no depth in SDL_SetVideoMode() as in the patch above (0 instead of 24), we still have problems with the existing codebase assuming 24bit depth. So, I second the call for use of SDL_Gfx, which can be used as a platform solution across the board, imho.

rlane commented 12 years ago

I've rebased my sdl-gfx branch to apply against latest master.

ghost commented 12 years ago

Just as a side note, I was working on a branch that was almost the same as hline and the improvement was about 5.5%.

seclorum commented 12 years ago

I think we really need to handle the float issue - this is going to be a problem for ARM targets in general.

seclorum commented 12 years ago

And, also, SDL_gfx as a solution to the color depth issue would be a wise path to take.

agladysh commented 12 years ago

Right now I'm trying a few combinations of CFLAGS optimized for the ARM target - so far, not really much to report, things are still slow. I believe the problem really is floating point performance in general, and particularly the mixture of it within the calls to Lua functions.

LuaJIT has support for ARM. Have you tried replacing stock Lua with LuaJIT?

ghost commented 12 years ago

He is probably talking about the floating point calculations inside setPixel

void setPixelRaw(unsigned char _p, int r, int g, int b, float alpha) {

if SDL_BYTEORDER == SDL_LIL_ENDIAN

p[0] = (alpha_b)+((1-alpha)_p[0]);
p[1] = (alpha_g)+((1-alpha)_p[1]);
p[2] = (alpha_r)+((1-alpha)_p[2]);

else

p[0] = (alpha_r)+((1-alpha)_p[0]);
p[1] = (alpha_g)+((1-alpha)_p[1]);
p[2] = (alpha_b)+((1-alpha)*p[2]);

endif

}

antirez commented 12 years ago

Dudes, I understand your point, but we have to find a solution without SDL_gfx, the reason is that currently we not really depend on SDL itself, because all you need to port the code to non-SDL environments is to create a framebuffer and a function to write a pixel. This is a big feature from the point of view of portability.

What we can do is:

With this changes we can gain speed without binding the project to SDL forever.

Salvatore

antirez commented 12 years ago

Hello, new master uses int math for alpha blending. I hope this improves performances... fingers crossed! Please report new performances if you can, thank you.

Salvatore

rlane commented 12 years ago

What platform are you interested in that doesn't have SDL? It's not just setPixel you'd need to port - think about input and (eventually) audio.

seclorum commented 12 years ago

I concur - pretty much every platform of any sort of consequence supports SDL, and .. well .. I just don't see the problem with sticking to SDL for the purposes of maintaining portable code.

Lets not kid ourselves - SDL provides the most platforms for the buck right now, and writing depth-neutral code seems completely out of the scope of LOAD81's purpose - wouldn't you prefer to be working on features rather than low-level stuff like color conversion? SDL+SDL_gfx gives the best way forward, and if used properly - when the time comes to get rid of SDL, if that ever really comes - won't be a problem at all.

antirez commented 12 years ago

Ok, let's go for gfx, after all the original implementation is in the commit history and it will be possible to revert at a later time if needed. Thanks for arguing about this.

seclorum commented 12 years ago

Great. I think you'll be happy with the results.

Just to let you know: the int-math version is much faster, for sure, but still quite laggy - to be clear, the non-Int version was much slower, this version is faster-but-still-laggy, and the SDL_gfx version, with no 24-bit back-buffer problems, is faster still.

So the best solution: switch to SDL_gfx, and switch to Integer math. This performs very well on the Pandora as a test platform.

To reward you for the SDL_gfx switch, I will make a demo video as soon as we have everything merged that shows you the performance on the Open Pandora! :)

antirez commented 12 years ago

Thanks seclorum ;) Now it's merged.

Cheers, Salvatore

seclorum commented 12 years ago

One other thing : SDL_gfx provides a lot of useful primitives, also, such as Polygons, rotoscoping, and so on. If these are made available through the Lua interface, it opens the door for a lot of fun stuf - blits, animation, and more. So I think it gives a little reward for the decision.

antirez commented 12 years ago

I've the impression that if we give too much "advanced" primitives it will be much simpler to create more interesting things but also a lot less instructive. For instance the Asteroids demo that I'm trying to explain piece by piece to my son shows him that you need to rotate stuff with trigonometry, at some point along the line before drawing on the screen.

For the same reason I like your proposal about low level sound with ADSR envelope modulation, frequency and waveform, it's cool enough to experiment, but low level enough so that Load81 can really somewhat replace a missing C64 experience.

seclorum commented 12 years ago

Okay, I fully understand your thinking about this, but surely the feature doesn't have to be used if you don't want to - in the case of your education of your son (and mine too!), we don't have to tell them there are advanced API's for doing rotozooming and so on, until they have understood how to do it the proper way?

I'm not trying to argue with you over features a lot (feels that way, maybe) but for example in the Open Pandora community, where I believe Load81 will get a lot of love and use, these sorts of features will be very welcome. There is the case that a lot of Open Pandora users want to learn how to code, and Load81 is perfect for this, but their purpose for learning how to code is to actually create interesting and useful things, more so than to learn the math and low-level reasoning behind it. So if we have the policy of restricting the API, it makes it slightly less useful for those users.

However, I understand your position, really, and I respect the fact that this is your project with your own local goals, so I won't push the point. At this point, I don't have commit access - so this means if I have features to implement, I have to sell you on them before you will do the merge - and of course, I can maintain my own fork of more advanced things too.

seclorum commented 12 years ago

WOW! I just want to report in that this latest version is MUCH faster - in fact, its bloody great! :) Video in progress!

antirez commented 12 years ago

Thank you seclorum! I'll evaluate new additions for sure one after the other, but we'll get for sure a "cloud" to send/receive messages soon (I hope before the end of this weekend), and support for audio.

Salvatore

seclorum commented 12 years ago

I had that problem too cskyzi, but of course its just a bug in the Makefile .. build the lua lib first, then return to load81 root and type 'make' .. it works. I'm sure it'll be a quick fix.

Guys, here's a little demo video for you so you can see the difference in performance now:

http://primitur.at/load81_perf_demo.m4v

antirez commented 12 years ago

wow the OpenPandora is seriously cool! :)

seclorum commented 12 years ago

Yes, it really is a great little computer! I encourage everyone to get one .. if only they could!

Anyway, onwards and upwards - I'll update LOAD81.pnd for the Open Pandora users now, the performance fix is really very special and I'm sure we'll see some more interesting demo's from pmprog (and perhaps others) on the OPB forums ..

antirez commented 12 years ago

@seclorum thanks a lot! Currently I'm setting up a special Redis server on the internet that only allows PUBLISH and SUBSCRIBE to implement the "cloud" messaging feature of Load81. This will be funny to create simple multiplayer games, chats, and so forth.

agladysh commented 12 years ago

@antirez Wouldn't it be too trivial to abuse?

antirez commented 12 years ago

@agladysh do you mean redis-side or lua81 side?

From the point of view of Redis, with only publish and subscribe enabled, and now that we have output buffer limits, it should be pretty hard to abuse. In the Load81 side you need to subscribe to channels like this:

listen("asteroids)

and then you receive messages with:

function receive(sender,msg)
    ....
end

that gets called for every message received and carries a Lua object in "msg", and a SHA1-sized hex string identifier of the sending client in "sender". If you use it to update other players ships on the screen, it's ok, if you write it into /etc/passwd it's another matter! ;)

agladysh commented 12 years ago

@antirez I mean redis-side. How would you prevent using this thing as a CNC for a botnet, for example?

rlane commented 12 years ago

@antirez How much of the redis API will you expose? I think it's a great idea but for the NaCl port I will need to tunnel the Redis commands over HTTP since Native Client apps aren't allowed to open arbitrary TCP connections.

ps. Maybe we should move this discussion to a feature bug?

seclorum commented 12 years ago

This sounds fantastic! I really look forward to this feature. One thing: can we have a forum to discuss these things, so that you're not abusing your Issues area with things like this? I imagine it would be really nice to continue such discussions with you guys in a forum-like atmosphere.

agladysh commented 12 years ago

...for the NaCl port I will need to tunnel the Redis commands over HTTP since Native Client apps aren't allowed to open arbitrary TCP connections.

One more reason not to expose raw Redis

agladysh commented 12 years ago

One thing: can we have a forum to discuss these things, so that you're not abusing your Issues area with things like this

Time for a googlegroup?

seclorum commented 12 years ago

.. API keys?

seclorum commented 12 years ago

We could also have an IRC channel somewhere? #load81 on irc.freenode.net?

antirez commented 12 years ago

The proper way to discuss this is simply to open an issue about messaging, I'm doing it right now :)

antirez commented 12 years ago

I opened an issue with all the details, feedbacks very welcomed!