Open fallaha56 opened 8 years ago
I think that an optimized software plugin would run quite nicely. I hear that AIO (LegendOfDragoon) has done some amazing speed optimization using SSE4, and AL. I think that AVX might be the teeter totter tipper that AL's software RDP needs.
not going to disagree with that!
but surely if MAME can multi-thread its 3Dfx games then surely someone can do the same for the RDP?
I have never played with the 3dfx stuff in mame.. What do you recommend? I would like to test that out. Maybe AIO could take advantage of this type of enhancement
any of the newer Atari games -San Fran Rush, Gauntlet Leg
of note option is off by default in MAME options...
-i think it was Ville Linde who did the coding -he called it software SLI http://vlinde.mameworld.info/
take that back, it was Haze! http://mamedev.emulab.it/haze/
Looking on youtube, most users that are running these games are on 4ghz computers.. I think that AIO's work on single CPU is far better..
add me to skype if you have it... theboy_181@hotmail.com
are you running GL on mame full speed ? if so what are your specs ?
i'm on both -4Ghz 6-core 980x and MAME uses all 6 cores heavily...
but my Surface4Pro has much less gigahurtz...
This is something I thought about before. Can angrylion's plugin be used for software framebuffer emulation in a hardware renderer? I think it should be possible. Depth and auxiliary buffers can be rendered in software and main color buffers are rendered in hardware. Jabo did that in his plugin with his software renderer.
The alternative is to render everything on the GPU and copy over the necessary data. That's how GLideN64 currently works. Both methods have their advantages and disadvantages. A full hardware renderer is possibly faster.
hi @purplemarshmallow merry xmas, great minds think alike ;)
if i understand things right (and i might not!):
-the angrylion RDP is pretty much 100% accurate but CPU heavy
-GLideN64 still needs hacks+++
-once the image reaches the VI filters, it's one way traffic?
-prior to this there's lots of 2-way traffic between CPU and the RDP?
just to keep things simple and not worry about 2-way traffic why not just plug the angrylion RDP code into the GLideN64 VI? this would be both accurate and pretty fast no?
> this would be both accurate and pretty fast no?
It would not be fast.
hi @LegendOfDragoon, @theboy181 tells me you're a man of much knowledge of such things ;)
am guessing you may know my next question already...
-i get depressingly close to 60fps on both my Surface and desktop rig in full software BUT only with VI filters off (the same for most people i presume?)
-that's using the old plugins from the clearly talented (but the equally as difficult to talk to) Hatcat
-so...if the RDP is optimised/SSE4/AVX/multithreaded and the VI offloaded to OpenGL would we not start hitting the magical 60fps?
(plus integrate it and it could help develop the full hardware side of GLideN64?)
It depends on the game you're playing, as well as the resolution (for games that support hi-res). It could get a lot faster, but at the same time, some games are just too demanding. If you're interested in specific games, I could give you a better estimate.
sure, i appreciate it won't work well for every game but i'm still drawn to the idea of having a built-in reference renderer/a single GFX plugin for PJ64
which optimisations do you think hold back PJ64 the most/which would benefit it the most?
(in case it's of interest i've tested Mario64, F-Zero, Zelda:OOT, MarioTennis and MK64 but with the exception of MTennis they're all that tantalising 55-60fps...)
SM64, OOT, MK64 should run full speed on average hardware, after optimizing. MarioTennis will take some work (certain parts of the game are more demanding) and F-ZERO will take a lot of work to get running on mid-low end machines (if even possible). I imagine high end machines would run F-ZERO and Mario Tennis fine though.
I'd say the order for most demanding (from highest to lowest) for most games is RDP, RSP, VI, and then CPU (assuming you're using a cpu recompiler). I think using 64-bit could make a significant difference.
interesting
64-bit seems to be close, recompiler the outstanding issue?
also, if the RDP is a big bottleneck then won't multi-threading help? MAME finally made the R5000/3Dfx games playable this way (and never thought i'd be complimenting MAME on speed!)
also need to take back what i said about F-Zero must have been on the Kool Aid at the time (or running GLideN64), it's slow as molasses in software lol
> recompiler the outstanding issue?
Recompiler makes a huge difference in speed.
> also, if the RDP is a big bottleneck then won't multi-threading help? MAME finally made the R5000/3Dfx games playable this way (and never thought i'd be complimenting MAME on speed!)
Short answer is, yes it can help. But I think that should literally be the last thing done, if at all. Focus should go towards optimizing the code as much as possible, before using multi-threading. There's just too many potential optimizations. It's just much more efficient right now to optimize the slow code, than it is to spend time working on multi threading.
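To make the multi-threading idea concrete, here is a minimal sketch of one common way to parallelize a software rasterizer: split the frame into horizontal bands of scanlines and hand each band to a worker. It is written in Python for readability only (Python threads will not actually speed up a loop like this), and it is not taken from MAME or from angrylion's plugin. `split_bands`, `render_frame` and `shade` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_bands(height, workers):
    """Split scanlines 0..height-1 into contiguous (start, end) bands."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        # The first `extra` bands take one additional line each.
        size = base + (1 if i < extra else 0)
        if size:
            bands.append((start, start + size))
        start += size
    return bands


def render_frame(width, height, shade, workers=4):
    """Fill a framebuffer by calling shade(x, y), one band per worker."""
    framebuffer = [[0] * width for _ in range(height)]

    def render_band(band):
        y0, y1 = band
        for y in range(y0, y1):
            row = framebuffer[y]  # each worker only touches its own rows
            for x in range(width):
                row[x] = shade(x, y)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(render_band, split_bands(height, workers)))
    return framebuffer
```

Because the bands do not overlap, the workers never write to the same row. The sketch assumes every pixel can be computed independently, which is the easy case; it says nothing about whether the RDP's draw commands can be split this way.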
thanks, appreciate the insights, 3am here tho so off to bed! happy/keen to be involved in whatever way i can. merry xmas
My two kopecks: if you want to combine soft and hw renders, they both must support the same mode, HLE or LLE. Angrylion's plugin works only in LLE. GLideN64 is not very good in LLE.
Merry Christmas!
Merry Xmas @gonetz I hope SANTA is good to you this year
Thanks, but I'm from Russia. Santa has no power there. Also, the Russian Orthodox Church uses the Julian calendar, so Christmas in Russia is officially celebrated on January 7. Since Russia supports all main religions, many Russians celebrate xmas twice, today and two weeks after. The same goes for the New Year holidays. "Old New Year" is unofficially celebrated on the night of December 31 by the Julian calendar, that is January 13.
Merry No.1 Christmas @gonetz, my family is Orthodox as well, two for the price of one ;)
If I'm frank, one of the biggest reasons I suggested this was that I believe fragmented work is hurting N64 emulation badly, this could help solve that problem and possibly even serve as a catalyst for another round of donations if you wanted?
Plus, to play devils advocate, does it matter if the soft RDP is LLE only if the RSP recompiler covers both?
A single unified GFX plugin that works with virtually all games? That would be something special...
LLE and HLE display lists are absolutely different. RSP plugin returns only one of them. I think it is possible to run two threads, HLE one with hardware plugin and LLE with soft render, then sync them. However we need special core or RSP plugin for that.
Zilmar just holds too many special powers over this ;)
I think it would be better to give the gfx plugin control over HLE/LLE would that be possible? First the HLE display list is sent to the gfx plugin. Then the plugin can decide if it wants to HLE the dlist or send it back to the RSP. If it is sent back to the RSP then the plugin receives low level dlists.
This would make N64 emulation much easier. If a microcode is not implemented then LLE could be used automatically.
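A minimal sketch of the decision described above, assuming the plugin identifies a microcode by some checksum and keeps a table of the ones it can handle in HLE. The names `Mode`, `KNOWN_UCODES` and `choose_mode` are hypothetical, the checksum values are placeholders, and none of this is an existing Project64 or plugin API.

```python
from enum import Enum


class Mode(Enum):
    HLE = "hle"  # plugin interprets the high-level display list itself
    LLE = "lle"  # dlist goes back to the RSP, plugin receives low-level dlists


# Placeholder checksums, not real microcode identifiers.
KNOWN_UCODES = {0x11111111, 0x22222222}


def choose_mode(ucode_crc, plugin_supports_hle=True):
    """Pick HLE when the microcode is implemented, otherwise fall back to LLE."""
    if plugin_supports_hle and ucode_crc in KNOWN_UCODES:
        return Mode.HLE
    return Mode.LLE
```

With something like this, an LLE-only plugin would simply pass `plugin_supports_hle=False` and always get low-level dlists, and an unknown microcode would fall back to LLE instead of raising an error.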
> Plus, to play devils advocate, does it matter if the soft RDP is LLE only if the RSP recompiler covers both?
LLE is too slow.
> I think it would be better to give the gfx plugin control over HLE/LLE would that be possible? First the HLE display list is sent to the gfx plugin. Then the plugin can decide if it wants to HLE the dlist or send it back to the RSP. If it is sent back to the RSP then the plugin receives low level dlists. This would make N64 emulation much easier. If a microcode is not implemented then LLE could be used automatically.
I'm not sure if that's necessary, given the fact that Project64 already has per game settings for selecting HLE or LLE.
> I'm not sure if that's necessary, given the fact that Project64 already has per game settings for selecting HLE or LLE.
But the core does not know what a plugin supports. If a plugin does not support HLE it will not work without manual settings. Like angrylion's plugin. Also if someone wants to reverse an unknown ucode it would make things easier.
Automatic detection is usually better than manual per game settings. It should be more flexible. I think it would be useful. Should be possible to get rid of all unknown ucode errors.
I made an experiment. I built angrylion's plugin into z64gl and used it for framebuffer emulation. It's a bit faster than angrylion's plugin but not so much. It can be more optimized because a lot of work is done twice. One problem is the plugins can't share the same TMEM because z64gl's TMEM emulation is severely incorrect.
Here's a build if someone is interested. https://drive.google.com/file/d/0B7Y6r4SpC_QQdzVuWURyMmkxX2M/view?usp=sharing
The big disadvantage is how do you know if you should render a buffer in software or not? I think it's impossible. With this build everything is rendered in software. This makes things slow.
Weird. For me, it crashes on pj64. On 1964 it works, but the graphics are completely messed up (but works if i use mesa).
> But the core does not know what a plugin supports.
Good point.
@purplemarshmallow cool effort :) sadly completely crashing for me too in PJ64/Win10/nVidia :(
@gonetz shame the core and plugins aren't a little more integrated to make all this easier, at least the HLE/LLE settings
@LegendOfDragoon i know what you're saying about LLE and speed but it just seems to be necessary for accurate emulation...plus the existing software RSP and RDP seem to have a lot of scope for optimisation
> i know what you're saying about LLE and speed but it just seems to be necessary for accurate emulation...plus the existing software RSP and RDP seem to have a lot of scope for optimisation
It's easier said than done, to actually speed it up to satisfactory levels. Especially since performance isn't the only problem.
z64gl hardly causes any slowdown it's very well optimized. I think I found why it is so fast. Looks like it has some sort of delayed rendering. It collects data and then renders everything at once which is faster but I'm not 100% sure. Almost all slowdown comes from the RSP.
I'm not sure why it crashes. It works for me. PJ64/Win10/nVidia here as well. Mario Tennis looks good with software fb. Body Harvest collisions also work. And OOT subscreen delay is gone. But it's slow.
I managed to modify angrylion's plugin to render only depth values. This is a lot faster: z64gl still runs full speed in some games on my system. And I get pixel accurate depth! Color rendering is much slower. It's possibly faster to copy data from the GPU.
GLideN64 can use angrylion's plugin as well for depth rendering but it will only work in LLE.
hi @LegendOfDragoon as you know, i'm no great programmer so please take anything i say with a slight pinch of salt!
nonetheless, MAME literally speeds up 400-500% when multi-threading the 3Dfx; Gauntlet Legends goes from an unplayable 20fps to 100% playable
no need to re-invent the wheel, can/can't this approach be duplicated? imitation=sincerest form flattery ;)
same for the software VI filter that costs 10+fps yet is totally do-able (if not already done?) in OpenGL?
@purplemarshmallow can you remind me what config files etc i need for z64gl? not sure i installed it last time i re-did my rig
You put glew32.dll and SDL.dll in the emulator's directory. It's important that you have the right versions, so I included them. Then you put z64gl.dll and z64gl.conf in the plugin directory. Here's my config file: https://drive.google.com/file/d/0B7Y6r4SpC_QQV3FlZDdONDYxa2s/view?usp=sharing
I have my doubts about z64gl being well optimized, unless the performance issues I encountered are caused by bugs. Some games are fast, like SM64, but some games run too slow, even when compared to angrylion's, such as Vigilante 8.
@fallaha56 3Dfx is different from N64. Last I checked, MESS for N64 was not full speed. I doubt it would be feasible to split the RDP into multiple threads. As for using OpenGL for VI filters, I'm not so sure it will be that much faster. Besides, I don't think you should worry about optimizing filters until the RDP runs beyond satisfactory speed.
> I have my doubts about z64gl being well optimized, unless the performance issues I encountered are caused by bugs.
I think it's caused by bugs. It's based on a very old version of the MESS renderer. Many things work incorrectly. If a game works correctly then it's very fast. Like OOT or World Driver Championship.
> I think it's caused by bugs. It's based on a very old version of the MESS renderer. Many things work incorrectly. If a game works correctly then it's very fast. Like OOT or World Driver Championship.
Odd, I get full speed in OOT, but not WDC with z64gl. Anyway, does your experimental build implement coverage?
> Odd, I get full speed in OOT, but not WDC with z64gl
Can you run GLideN64 LLE? How fast is it compared to z64gl?
> Anyway, does your experimental build implement coverage?

Framebuffers rendered by angrylion's plugin are rendered with correct pixel coverage emulation. Framebuffers rendered by z64gl are not.
@gonetz should I work on a PR and replace the (not properly working) LLE depth buffer to Rdram code with software depth buffer rendering? That's easily possible with angrylion's plugin.
> Can you run GLideN64 LLE? How fast is it compared to z64gl?

Unfortunately, I cannot run GLideN64 properly. I only have OpenGL 2.1. What I usually do is have other people test for me when I'm curious about GLideN64 :smile: . For instance, if I remember correctly, Mortal Kombat 4 ran really slow with GLideN64 LLE for multiple people, but z64gl should be full speed.
> @gonetz should I work on a PR and replace the (not properly working) LLE depth buffer to Rdram code with software depth buffer rendering? That's easily possible with angrylion's plugin.
I think that's a great idea. Angrylion's code is really nice.
So hybrid depth emulation? Nice solution.
Although, isn't there the pressing problem of the LLE triangle function being incorrect?
The LLE triangle function is not perfect. But I don't think that's the biggest problem. The biggest problem is speed. z64gl mostly runs fullspeed on my system in Zelda OOT with software depth buffer. In some places it slows down. GLideN64 LLE is slower.
Optimize RSP with proper dynarec?
That would be good. PJ64's RSP recompiler is about as fast as cxd4's interpreter. Also it's not compatible with some games. cxd4 did a good job with optimizing his interpreter. But a recompiler can be much faster
> PJ64's RSP recompiler is about as fast as cxd4's interpreter.
When you're using a fast LLE graphics plugin, you can usually see a clear difference in speed between PJ64's RSP Recompiler and cxd4's RSP.
> Also it's not compatible with some games.

Aside from the three Factor 5 games, what others are not compatible?
> When you're using a fast LLE graphics plugin, you can usually see a clear difference in speed between PJ64's RSP Recompiler and cxd4's RSP.
Yes the RSP recompiler is faster
> Aside from the three Factor 5 games, what others are not compatible?
I don't know any other games
I implemented software depth buffer rendering in GLideN64. My implementation works like this: First dlists are sent to angrylion's plugin. Then they are executed but many commands are ignored like RDP_TriTxtr, RDP_TriFill, ... After a command is executed it is sent to GLideN64 and executed again. Possibly this implementation can be improved.
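The dispatch described here can be sketched roughly as follows. This is only an illustration of the control flow, written in Python for brevity: `COLOR_ONLY_COMMANDS`, `run_dlist`, `software_exec` and `hardware_exec` are hypothetical names, only `RDP_TriTxtr` and `RDP_TriFill` come from the description above, and `RDP_TriFillZ` and `RDP_SetZImage` are placeholder command names.

```python
# Commands the software renderer skips because they only affect color.
# Only the two names below are mentioned in the thread; the real list is longer.
COLOR_ONLY_COMMANDS = {"RDP_TriTxtr", "RDP_TriFill"}


def run_dlist(commands, software_exec, hardware_exec):
    """Run each command on the software renderer (depth only), then on hardware."""
    for cmd in commands:
        if cmd not in COLOR_ONLY_COMMANDS:
            software_exec(cmd)  # angrylion-style renderer, depth values only
        hardware_exec(cmd)      # hardware renderer executes every command again


# Usage: record which renderer sees which command.
seen_by_software, seen_by_hardware = [], []
run_dlist(
    ["RDP_TriFillZ", "RDP_TriTxtr", "RDP_SetZImage"],  # placeholder dlist
    seen_by_software.append,
    seen_by_hardware.append,
)
```

In this sketch the software side sees only `RDP_TriFillZ` and `RDP_SetZImage`, while the hardware side sees all three commands in order.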
The problem is that angrylion's plugin still draws something in the color buffer. It's mostly white with some gray areas. This conflicts with GLideN64's validity checking implementation and buffers are removed from the GPU.
Maybe angrylion's plugin can be optimized for rendering depth values only.
I experimented with copy from RDRAM and software depth buffer rendering
This is the depth image. It's used as background in the subscreen. Usually the game renders the subscreen background in the depth buffer area but color rendering is disabled.
angrylion's plugin draws this in the color buffer area if commands for color rendering are not executed
And this is the pixel coverage? Why does this show up if I open the subscreen? Looks like the line mode in Goldeneye #226
hey guys merry xmas (almost) to you all
just wondering...with various issues in the renderer being worked and given the high+ requirements to run a full software RDP is there any scope to combine Angrylion's RDP into the plugin?
perhaps with GLideN64 doing the VI filter work in OpenGL?
would this make for a single perfect emulation N64 plugin?