Closed rlane closed 12 years ago
I agree with this patch, it definitely pushes things forward ..
Hello! Thanks for the patch.
Unfortunately I can't accept it because this drastically reduces the portability of the project in other devices, Android phones, iOS, OSx native GUI, and so forth. So far we only need a surface where to draw pixels, so porting is trivial.
However I'm interested in understanding how we could improve performances with the current implementation, so today I added a FPS counter that can be enabled with --fps. I'll profile the code to check how drawing could be improved. Thanks! Salvatore.
Hmmm .. a pity. Maybe SPriG is better suited, then?
I tried to profile stuff a bit, and apparently the most time consuming function in most programs on OSX is SDL_UpdateRect(), even stranger is that if I attach the profiler it starts to go faster...
Btw I just pushed a branch called 'hline' that implements probably faster drawing primitives, please could you check if this provides an advantage on a slow/Linux-based system? Thank you. I'm having issues verifying that on OSX because of strange SDL time behaviors.
On a side note: maybe it makes sense anyway to limit frame rate to 30fps?
Cheers, Salvatore
Okay, I tested the hline branch on my Pandora, and for the text demo it definitely sped things up immensely - the text really jumps to catch the cursor now, whereas the lag previously was really, really bad.
However, for the asteroids and helicopter games, performance is still miserable. I will now try to test the SDL_gfx patches above, just to get another idea of how the performance is effected (no problems with installing SDL_gfx on Open Pandora, obviously..)
Frame-limiting to 30fps? Probably a good idea, actually ..
seclorum: thank you, please test the latest "master" branch as well, contains improvements on the speed side.
In master branch I added the limit to 30 FPS, and adjusted the asteroids constants to run smooth with 30 FPS. This should improve a lot the performances on the Pandora.
I tested the new branch on Open Pandora, and the text.lua/lines.lua/triangles.lua scripts work pretty well .. but still helicopter/flames/asteroids performance - while improved over the last versions - is still very, very slow.
What I think is the problem is floating point performance. I will try to optimize the ARM build for Pandora and see if it makes a difference. Maybe its about mixing float and int parameters in function calls on ARM will result in dismal performance ..
@seclorum thank you very much, you tested latest 'master' right? Do you want a test branch that only uses integer math? I've one and I can share it on github.
Yes, that would be great - I'll test it right away if you have it.
Right now I'm trying a few combinations of CFLAGS optimized for the ARM target - so far, not really much to report, things are still slow. I believe the problem really is floating point performance in general, and particularly the mixture of it within the calls to Lua functions. I'll try the integer branch right away and maybe that will help.
The SDL_UpdateRect performance issues are due to SDL using a "shadow surface" because the depth of the surface doesnt match that of the real video memory. The last commit in this pull request fixes that.
SDL_gfx seems to be very portable, since it has C fallback code for non-x86 systems. Or are you saying you don't plan to rely on SDL as the abstraction layer?
Ah, good point rlane! In fact, that simple change makes things perform much better on the Pandora. antirez, I would really encourage you to re-consider this patch, as it definitely seems to be a step in the right direction - and SDL_Gfx is portable, after all.
I just tried using f2c69f78 (the color depth patch) without SDL_gfx and ran into problems, since the load81 graphics primitives assume 24bpp. SDL_gfx has code to handle various color depths so it works fine.
Okay, at least on Pandora we can get a big speed boost if we use SDL_HWSURFACE instead of SDL_SWSURFACE. So I think its time to start adding platform-specific code to address these sorts of issues.
I want to add that even if we specify no depth in SDL_SetVideoMode() as in the patch above (0 instead of 24), we still have problems with the existing codebase assuming 24bit depth. So, I second the call for use of SDL_Gfx, which can be used as a platform solution across the board, imho.
I've rebased my sdl-gfx branch to apply against latest master.
Just as a side note, I was working on a branch that was almost the same as hline and the improvement was about 5.5%.
I think we really need to handle the float issue - this is going to be a problem for ARM targets in general.
And, also, SDL_gfx as a solution to the color depth issue would be a wise path to take.
Right now I'm trying a few combinations of CFLAGS optimized for the ARM target - so far, not really much to report, things are still slow. I believe the problem really is floating point performance in general, and particularly the mixture of it within the calls to Lua functions.
LuaJIT has support for ARM. Have you tried replacing stock Lua with LuaJIT?
He is probably talking about the floating point calculations inside setPixel
void setPixelRaw(unsigned char _p, int r, int g, int b, float alpha) {
p[0] = (alpha_b)+((1-alpha)_p[0]);
p[1] = (alpha_g)+((1-alpha)_p[1]);
p[2] = (alpha_r)+((1-alpha)_p[2]);
p[0] = (alpha_r)+((1-alpha)_p[0]);
p[1] = (alpha_g)+((1-alpha)_p[1]);
p[2] = (alpha_b)+((1-alpha)*p[2]);
}
Dudes, I understand your point, but we have to find a solution without SDL_gfx, the reason is that currently we not really depend on SDL itself, because all you need to port the code to non-SDL environments is to create a framebuffer and a function to write a pixel. This is a big feature from the point of view of portability.
What we can do is:
With this changes we can gain speed without binding the project to SDL forever.
Salvatore
Hello, new master uses int math for alpha blending. I hope this improves performances... fingers crossed! Please report new performances if you can, thank you.
Salvatore
What platform are you interested in that doesn't have SDL? It's not just setPixel you'd need to port - think about input and (eventually) audio.
I concur - pretty much every platform of any sort of consequence supports SDL, and .. well .. I just don't see the problem with sticking to SDL for the purposes of maintaining portable code.
Lets not kid ourselves - SDL provides the most platforms for the buck right now, and writing depth-neutral code seems completely out of the scope of LOAD81's purpose - wouldn't you prefer to be working on features rather than low-level stuff like color conversion? SDL+SDL_gfx gives the best way forward, and if used properly - when the time comes to get rid of SDL, if that ever really comes - won't be a problem at all.
Ok, let's go for gfx, after all the original implementation is in the commit history and it will be possible to revert at a later time if needed. Thanks for arguing about this.
Great. I think you'll be happy with the results.
Just to let you know: the int-math version is much faster, for sure, but still quite laggy - to be clear, the non-Int version was much slower, this version is faster-but-still-laggy, and the SDL_gfx version, with no 24-bit back-buffer problems, is faster still.
So the best solution: switch to SDL_gfx, and switch to Integer math. This performs very well on the Pandora as a test platform.
To reward you for the SDL_gfx switch, I will make a demo video as soon as we have everything merged that shows you the performance on the Open Pandora! :)
Thanks seclorum ;) Now it's merged.
Cheers, Salvatore
One other thing : SDL_gfx provides a lot of useful primitives, also, such as Polygons, rotoscoping, and so on. If these are made available through the Lua interface, it opens the door for a lot of fun stuf - blits, animation, and more. So I think it gives a little reward for the decision.
I've the impression that if we give too much "advanced" primitives it will be much simpler to create more interesting things but also a lot less instructive. For instance the Asteroids demo that I'm trying to explain piece by piece to my son shows him that you need to rotate stuff with trigonometry, at some point along the line before drawing on the screen.
For the same reason I like your proposal about low level sound with ADSR envelope modulation, frequency and waveform, it's cool enough to experiment, but low level enough so that Load81 can really somewhat replace a missing C64 experience.
Okay, I fully understand your thinking about this, but surely the feature doesn't have to be used if you don't want to - in the case of your education of your son (and mine too!), we don't have to tell them there are advanced API's for doing rotozooming and so on, until they have understood how to do it the proper way?
I'm not trying to argue with you over features a lot (feels that way, maybe) but for example in the Open Pandora community, where I believe Load81 will get a lot of love and use, these sorts of features will be very welcome. There is the case that a lot of Open Pandora users want to learn how to code, and Load81 is perfect for this, but their purpose for learning how to code is to actually create interesting and useful things, more so than to learn the math and low-level reasoning behind it. So if we have the policy of restricting the API, it makes it slightly less useful for those users.
However, I understand your position, really, and I respect the fact that this is your project with your own local goals, so I won't push the point. At this point, I don't have commit access - so this means if I have features to implement, I have to sell you on them before you will do the merge - and of course, I can maintain my own fork of more advanced things too.
WOW! I just want to report in that this latest version is MUCH faster - in fact, its bloody great! :) Video in progress!
Thank you seclorum! I'll evaluate new additions for sure one after the other, but we'll get for sure a "cloud" to send/receive messages soon (I hope before the end of this weekend), and support for audio.
Salvatore
I had that problem too cskyzi, but of course its just a bug in the Makefile .. build the lua lib first, then return to load81 root and type 'make' .. it works. I'm sure it'll be a quick fix.
Guys, here's a little demo video for you so you can see the difference in performance now:
wow the OpenPandora is seriously cool! :)
Yes, it really is a great little computer! I encourage everyone to get one .. if only they could!
Anyway, onwards and upwards - I'll update LOAD81.pnd for the Open Pandora users now, the performance fix is really very special and I'm sure we'll see some more interesting demo's from pmprog (and perhaps others) on the OPB forums ..
@seclorum thanks a lot! Currently I'm setting up a special Redis server on the internet that only allows PUBLISH and SUBSCRIBE to implement the "cloud" messaging feature of Load81. This will be funny to create simple multiplayer games, chats, and so forth.
@antirez Wouldn't it be too trivial to abuse?
@agladysh do you mean redis-side or lua81 side?
From the point of view of Redis, with only publish and subscribe enabled, and now that we have output buffer limits, it should be pretty hard to abuse. In the Load81 side you need to subscribe to channels like this:
listen("asteroids)
and then you receive messages with:
function receive(sender,msg)
....
end
that gets called for every message received and carries a Lua object in "msg", and a SHA1-sized hex string identifier of the sending client in "sender". If you use it to update other players ships on the screen, it's ok, if you write it into /etc/passwd it's another matter! ;)
@antirez I mean redis-side. How would you prevent using this thing as a CNC for a botnet, for example?
@antirez How much of the redis API will you expose? I think it's a great idea but for the NaCl port I will need to tunnel the Redis commands over HTTP since Native Client apps aren't allowed to open arbitrary TCP connections.
ps. Maybe we should move this discussion to a feature bug?
This sounds fantastic! I really look forward to this feature. One thing: can we have a forum to discuss these things, so that you're not abusing your Issues area with things like this? I imagine it would be really nice to continue such discussions with you guys in a forum-like atmosphere.
...for the NaCl port I will need to tunnel the Redis commands over HTTP since Native Client apps aren't allowed to open arbitrary TCP connections.
One more reason not to expose raw Redis
One thing: can we have a forum to discuss these things, so that you're not abusing your Issues area with things like this
Time for a googlegroup?
.. API keys?
We could also have an IRC channel somewhere? #load81 on irc.freenode.net?
The proper way to discuss this is simply to open an issue about messaging, I'm doing it right now :)
I opened an issue with all the details, feedbacks very welcomed!
The SDL_gfx functions have a very similar interface and are much faster. Before these changes the asteroids example used 80% CPU and couldn't consistently hit 30fps. Now it uses 2% CPU and runs at a smooth 30fps.