gonetz / GLideN64

A new generation, open-source graphics plugin for N64 emulators.

GLideN64 current state / Project goals (Oct. 2015) #748

Closed purplemarshmallow closed 8 years ago

purplemarshmallow commented 8 years ago

This should give an impression of the current state of GLideN64 (how I see it). There are many things that can be done. I must admit it would take years to achieve all of this but maybe some goals can be reached at some point. I think GLideN64 should move in this direction. Feel free to agree/disagree.

HLE:

LLE:

additional tasks:

tony971 commented 8 years ago

Pinging @Project64 for EFB pokes.

phire commented 8 years ago

CPU reads/writes to/from color/depth buffer: dolphin found a good solution to this but it's probably hard to implement. But I'm pretty sure this can't be solved properly without the emulator's help

The GameCube's hardware design works massively to our advantage here, by not having the current framebuffer in main RAM. We get very clear copy signals before it moves, and direct framebuffer accesses go via special addresses.

I do have a theoretical idea (and an experimental Dolphin branch) for using the host's page tables to lock and trap accesses to ranges of RAM (with a 4KB granularity). The plan would allow us to delay actually copying from the host GPU to main RAM (and stalling the GPU) until the emulated CPU actually tries to read from the RAM (by which point hopefully the GPU has finished rendering that buffer).

No cost to CPU emulation speed (as long as it's not accessing locked pages), but it is somewhat invasive.
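A minimal sketch of the lazy-copy idea, not code from Dolphin or any emulator. It models the page lock as a dictionary and the page-fault handler as an explicit `read8` call; a real implementation would rely on host page protection instead. `LazyRam`, `lock_range`, `read8` and `fetch_page` are hypothetical names, and buffers are assumed to be page-aligned.

```python
PAGE_SIZE = 4096  # 4 KB granularity, as in the comment above


class LazyRam:
    """Toy model: emulated RAM whose pages can be locked behind a GPU buffer."""

    def __init__(self, size):
        self.ram = bytearray(size)
        self.locked = {}   # page index -> function returning that page's bytes
        self.copies = 0    # how many GPU -> RAM page copies were performed

    def lock_range(self, start, length, fetch_page):
        """Mark every page overlapping [start, start + length) as stale in RAM."""
        first = start // PAGE_SIZE
        last = (start + length - 1) // PAGE_SIZE
        for page in range(first, last + 1):
            self.locked[page] = fetch_page

    def read8(self, addr):
        """Emulated CPU byte read; stands in for the host page-fault handler."""
        page = addr // PAGE_SIZE
        fetch_page = self.locked.pop(page, None)
        if fetch_page is not None:
            # First touch of a locked page: copy it from the GPU, then unlock.
            data = fetch_page(page)
            if len(data) != PAGE_SIZE:
                raise ValueError("fetch_page must return exactly one page")
            base = page * PAGE_SIZE
            self.ram[base:base + PAGE_SIZE] = data
            self.copies += 1
        return self.ram[addr]
```

Reads of unlocked pages never touch the dictionary entry for the buffer, which mirrors the "no cost unless it accesses locked pages" point. Each locked page is copied at most once, on its first read.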

AmbientMalice commented 8 years ago

I think the biggest, most pressing FB issue is the fact there's something wrong with the framebuffer emulation that breaks PAL games and causes crashes for a wide range of games if FB to texture is used. Crashes are bad. (Oh, yeah, and my pet "wot" issue is the fact that enabling the framebuffer in Toy Story 2 breaks the game's depth calculation, causing 2D objects to render through walls. I mean... wot?)

N64 depth compare: Too slow and has synchronisation issues. Barely usable currently

It's a wildly inconsistent performance hog. I wouldn't call it "barely usable" since it does work fairly well for certain titles, but the sudden performance dips and the fact it breaks a lot of titles means it needs work.

pixel coverage: not emulated

I have a vague idea of what pixel coverage is in a normal graphics context, but I'm still unclear on how an N64 emulator would implement pixel coverage and make use of it.

blender: implement shader based blending

This would potentially fix a remarkable number of bugs, and would be a massive leap forward for N64 emulation. I guess it's just a matter of someone biting the bullet and coding the thing.

LLE doesn't just have bad performance - it crashes like crazy with FB enabled, has broken depth buffer, and fails to render particles and other 2D objects in a lot of games.

The other thing likely covered under "fix bugs/remove hacks" is GLideN64's texture bugs. Stuff like https://github.com/gonetz/GLideN64/issues/471 affects a LOT of games.

There's also super weird stuff like Starcraft 64's backgrounds that don't render unless you load a save state.

purplemarshmallow commented 8 years ago

The GameCube's hardware design works massively to our advantage here, by not having the current framebuffer in main RAM. We get very clear copy signals before it moves, and direct framebuffer accesses go via special addresses.

The N64's design is very bad for a hardware renderer. The CPU can start drawing an image anywhere in RDRAM and pass this to the VI without involving the co-processor. I can only think of hackish ways to emulate this unless you render in native resolution.

I do have a theoretical idea (and an experimental Dolphin branch) for using the host's page tables to lock and trap accesses to ranges of RAM (with a 4KB granularity). The plan would allow us to delay actually copying from the host GPU to main RAM (and stalling the GPU) until the emulated CPU actually tries to read from the RAM (by which point hopefully the GPU has finished rendering that buffer). No cost to CPU emulation speed (as long as it's not accessing locked pages), but it is somewhat invasive.

Such a technique is implemented in Mupen64 + Glide64 (a different plugin). The plugin sends a list of allocated buffers to the emulator, and then it gets a notification if the CPU reads from or writes to these buffers. But Mupen64 only has partial support; it's difficult to trap all accesses.

For GLideN64 I think it's best to support Mupen64's specification and make as much use of it as possible. There's already an experimental branch. The idea is to use this not only as an optimisation but also for improved accuracy.

I think the biggest, most pressing FB issue is the fact there's something wrong with the framebuffer emulation that breaks PAL games and causes crashes for a wide range of games if FB to texture is used.

You mean the copy color to RDRAM option? It's not a wide range of games; I think it only affects the PAL versions of Donkey Kong 64, Vigilante 8 and Pokemon Stadium 2. Not sure what causes it. It is not necessarily a problem in GLideN64; maybe it happens because of inaccurate timings or bugs in the emulator, or there's a bug in the game code.

N64 depth compare: Too slow and has synchronisation issues. Barely usable currently

It's a wildly inconsistent performance hog. I wouldn't call it "barely usable" since it does work fairly well for certain titles, but the sudden performance dips and the fact it breaks a lot of titles means it needs work.

The problem is that it's supposed to provide accurate data, but the copy depth to RDRAM option does not work with it.

LLE doesn't just have bad performance - it crashes like crazy with FB enabled, has broken depth buffer, and fails to render particles and other 2D objects in a lot of games.

When does it crash with FB enabled? Depth buffer emulation currently is not accurate enough but I'm not sure why this works better in HLE.

project64 commented 8 years ago

I was curious what games modify the RDRAM output via the R4300 and what effects they are trying to generate.

Something like zelda that tries to read from the buffer on the start screen could benefit from the detection so you do not need a hack, but I have not looked into other games. More curious than anything, because I am not sure what you could do well without putting the res back to native to get the reads/writes correct, which is just horrible.

I have some idea that I might get to with being able to track memory pages that would allow me to detect writes to certain buffers, so I could detect the read/writes to those addresses, so curious how it would help and if it is even worth to try and see if it works.

AmbientMalice commented 8 years ago

When does [LLE] crash with FB enabled?

I'd have to test to come up with a list, but Turok 3 is a sure-fire "this will crash" LLE test.

You mean the copy color to RDRAM option? It's not a wide range of games; I think it only affects the PAL versions of Donkey Kong 64, Vigilante 8 and Pokemon Stadium 2.

Also Mario Tennis. I think the problem is caused by games changing resolution while doing a framebuffer operation. That's why Tennis crashes during menu transitions. Merely turning on Perfect Dark's high resolution mode crashes the emulator if FB to texture is on.

AmbientMalice commented 8 years ago

More curious than anything, because I am not sure what you could do well without putting the res back to native to get the reads/writes correct, which is just horrible.

I think a lot of people would be perfectly happy to play games like Pokemon Snap and Body Harvest at native resolution via some "super accurate" mode. But on a related note, there is the lingering "Why exactly doesn't Pokemon Snap work?" issue. It works on Dolphin. It's possible that Nintendo's emulator that Dolphin is emulating is doing something fancy, but still...

purplemarshmallow commented 8 years ago

I was curious what games modify the RDRAM output via the R4300 and what effects they are trying to generate.

@project64 It varies a lot. Many games use CPU-based framebuffer effects (monitors in Mario Kart). The CPU can draw something on the screen (pills in Dr. Mario). In Jet Force Gemini the CPU draws things like numbers and raindrops, but the game also uses CPU-based FB effects (#378). In Superman64 it's the other way round: the CPU draws the websites and the co-processor renders an image over it (#617). The CPU can blur a framebuffer (pause screen in Conker / Mickey's Speedway); GLideN64 currently uses a hack for this. The CPU can remove colors (Conker's BFD "It's War" cutscene). In Pokemon Snap the CPU draws the red dot, but it's a lot more complex than that: the whole camera detection works with CPU reads/writes, but nobody knows exactly how, or why the current copy color option does not work.

A notification from the emulator could allow GLideN64 to copy buffers only partially, that is, only the part which is modified by the CPU.

Something like zelda that tries to read from the buffer on the start screen could benefit from the detection so you do not need a hack

Zelda is a complete mystery; nobody knows why the CPU reads/writes in this game. Nobody knows what causes the subscreen delay or why filling the color buffer with white (GLideN64's hack) fixes it. The same problem exists in Paper Mario and Doubutsu No Mori. In Paper Mario this hack can't be used because it breaks the pause menu background.

purplemarshmallow commented 8 years ago

I'd have to test to come up with a list, but Turok 3 is a sure-fire "this will crash" LLE test.

@AmbientMalice can you give more details? Turok 3 works for me in LLE.

phire commented 8 years ago

Pokemon snap's red dot is the color buffer.

Here is one frame worth of Dolphin's EFB copies for Virtual console pokemon snap

First we have six 8x8px color copies, centered around the center of the frame (aka the red dot). [images: efb_frame_698, efb_frame_699, efb_frame_700, efb_frame_701, efb_frame_702, efb_frame_703, efb_frame_704]

8x8px is close to the minimum efb copy (8x4 for that format) so it's entirely possible that Nintendo's emulator is copying more pixels than it needs to. It looks like pokemon snap is rendering each pokemon and then checking one or more pixels in the center of the frame to see if they have changed.
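A minimal sketch of the kind of check guessed at above: compare a small block at the center of the frame before and after drawing an object. This only illustrates the hypothesis, it is not reverse-engineered game code. The function names, the flat row-major pixel list and the 8x8 block size are assumptions.

```python
def center_block(frame, width, height, size=8):
    """Return the size x size pixels around the center of a row-major frame."""
    x0 = (width - size) // 2
    y0 = (height - size) // 2
    return [frame[y * width + x]
            for y in range(y0, y0 + size)
            for x in range(x0, x0 + size)]


def center_changed(before, after, width, height, size=8):
    """True if any pixel in the center block differs between the two frames."""
    return (center_block(before, width, height, size)
            != center_block(after, width, height, size))
```

Under this model, an object drawn away from the center of the frame would leave the block unchanged and so would not be detected.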

This is not the same method it uses for judging photos at the end of the level, where it uses a small 32x24px depth buffer (and disabled color rendering).

Then we have a 640x480px depth copy (because the Virtual Console generally renders N64 games at 640x480). The red dot still works with these disabled; I assume it is just updating the depth buffer in RDRAM. [image: efb_frame_705_b]

And a 320x240px copy of the color buffer. The GameCube can downscale by half in the copy pipeline, which is perfect for copying into RDRAM. [image: efb_frame_706]
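A minimal sketch of a half-size downscale from 640x480 to 320x240. The thread does not say which filter the GameCube copy pipeline applies, so the 2x2 box average here is an assumption, as are the function name and the single-channel row-major pixel list.

```python
def downscale_half(pixels, width, height):
    """Average each 2x2 block of a row-major, single-channel image."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    out = []
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            i = y * width + x
            total = (pixels[i] + pixels[i + 1]
                     + pixels[i + width] + pixels[i + width + 1])
            out.append(total // 4)  # integer average of the four samples
    return out
```

For a real color buffer the same averaging would be applied per channel.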

Then a full-sized 640x480px color copy once the red dot is actually rendered. I think this is the final image which gets rendered to the screen. [image: efb_frame_707]

Another 640x480px depth copy. You will notice the depth quality has degraded, as Nintendo's emulator has taken the above copy, downsized it to 320x240 on the CPU, written it to RDRAM, pulled it back out of RDRAM, encoded it as a texture and drawn it back into the GameCube's framebuffer. [image: efb_frame_708_b]

Final 320x240px color copy to RDRAM. I have no idea why the 'remaining film' counter has been cropped. [image: efb_frame_709]

Note: All screenshots were dumped at 2x internal resolution, so they have 4x as many pixels as you would expect. I forgot to turn that off.

purplemarshmallow commented 8 years ago

Thanks this is really useful! There are plugins that can do accurate framebuffer emulation and copy every frame buffer to RDRAM (Glide64). Jabo's plugin even uses software rendering for framebuffer effects in Pokemon Snap. This makes me really wonder why the camera detection isn't working in any hardware renderer.

I know judging photos at the end of the level does not work with GLideN64 because depth buffer emulation is not accurate enough, but it does work with Jabo's D3D and Glide64 (they use software depth buffer rendering).

project64 commented 8 years ago

Zelda is a complete mystery; nobody knows why the CPU reads/writes in this game. Nobody knows what causes the subscreen delay or why filling the color buffer with white (GLideN64's hack) fixes it.

The CPU is waiting until the RDP has finished rendering the background/map before it continues with the rest of the graphics for the start page. It does not use the registers to check when it is done; it polls the memory. The rdb/cheat hack in PJ64 changes the memory to white so that there is no delay. If the plugin writes the drawn frame back to memory, then it also works.

phire commented 8 years ago

Those 8x8 pixel copies for pokemon snap virtual console are suspiciously small and at exactly the right location.

I think Nintendo has cheated and tuned the emulator for this one game. If it was a generic framebuffer to RDRAM copy it would copy the whole framebuffer each time. If their framebuffer to RDRAM copy implementation allowed partial copies based on what address was being read, then you would see it do 8x8px copies in other places.

It appears to know ahead of time if it's going to need the whole framebuffer or just a single 8x8px square.

purplemarshmallow commented 8 years ago

If the plugin writes the drawn frame back to memory, then it also works.

@Project64 No, it does not work; that's the mystery. The only thing that works is filling it with white, or using angrylion's pixel-accurate plugin, which is the only plugin that works correctly.

AmbientMalice commented 8 years ago

can you give more details? Turok 3 works for me in LLE.

I found the cause. LLE mode seems to hate anti-aliasing. With AA enabled, the game crashes at various points during the intro cinematic.

purplemarshmallow commented 8 years ago

I think Nintendo has cheated and tuned the emulator for this one game. If it was a generic framebuffer to RDRAM copy it would copy the whole framebuffer each time. If their framebuffer to RDRAM copy implementation allowed partial copies based on what address was being read, then you would see it do 8x8px copies in other places.

@phire Nintendo has not cheated here. I debugged with GLideN64 and it shows that there are 8x8 auxiliary buffers. But there's something else: the scene is rendered in a special way. I noticed that if you enable the copy from RDRAM in GLideN64 (it copies the image to RDRAM and then draws it on screen), only the objects that can trigger camera detection are visible. Not sure why this happens; see #380.

The CPU is waiting until the RDP has finished rendering the background/map before it continues with the rest of the graphics for the start page. It does not use the registers to check when it is done; it polls the memory. The rdb/cheat hack in PJ64 changes the memory to white so that there is no delay. If the plugin writes the drawn frame back to memory, then it also works.

@project64 @cxd4 (because you found the workaround with copying white color) That's only half of the story. When I debugged this game I noticed something weird. The CPU reads from and writes to Link's portrait in the equipment subscreen. It writes different values depending on what values are there. Not sure why this happens: if the game just checks, why are there writes? I think the subscreen delay is a lot more complex than that.

purplemarshmallow commented 8 years ago

I found the cause. LLE mode seems to hate anti-aliasing. With AA enabled, the game crashes at various points during the intro cinematic.

explains why factor5 games dislike AA...

cxd4 commented 8 years ago

long thread

@project64 @cxd4 (because you found the workaround with copying white color)

who? me? Sounds familiar but I can't seem to remember.

I know I did report some research on the weird color buffer algorithm patterns with Pokemon Snap's camera validation, but with Zelda's FB it was a bit more direct as I'm pretty sure it was zilmar that did the 240p FB writeback code for that.

When I debugged this game I noticed something weird. The CPU reads from and writes to Link's portrait in the equipment subscreen. It writes different values depending on what values are there.

I know it does.

Which part is weird? That a texture of Link and his equipment is drawn or that it's drawn based on what equipment is currently applied?

purplemarshmallow commented 8 years ago

The weird part is that the CPU modifies the portrait after it's drawn. With GLideN64 even if this texture is not written into RDRAM and I write some test values instead the CPU can still modify these values depending on which test values I wrote there

purplemarshmallow commented 8 years ago

I know I did report some research on the weird color buffer algorithm patterns with Pokemon Snap's camera validation

Also what's special about the color buffer in Pokemon Snap?

cxd4 commented 8 years ago

It's like some kind of weird bit-wise pattern??

You could alter just one or two of the CFB pixels, and the camera would detect the pokemon either as being on or off the screen for taking the picture, depending on what sorts of bit masks you set or inverted the pixel color by. I sent a bunch of notes I took to angrylion though that was a long time ago.

I'm sure anyone with the time and patience to reverse the game code would understand more.

gonetz commented 8 years ago

@phire "I do have a theoretical idea (and an experimental Dolphin branch) for using the host's page tables to lock and trap accesses to ranges of RAM (with a 4KB granularity). The plan would allow us to delay actually copying from the host GPU to main RAM (and stalling the GPU) until the emulated CPU actually tries to read from the RAM (by which point hopefully the GPU has finished rendering that buffer)."

Looks like 1964's FB extension:

/**
Function: FrameBufferRead
Purpose:  This function is called to notify the dll that the frame buffer memory is being read at the given address.
          DLL should copy content from its render buffer to the frame buffer in N64 RDRAM.
          DLL is responsible to maintain its own frame buffer memory addr list.
          DLL should copy 4KB block content back to RDRAM frame buffer.
          Emulator should not call this function again if other memory is read within the same 4KB range.
input:    addr  rdram address
          val   val
          size  1 = wxUint8, 2 = wxUint16, 4 = wxUint32
output:   none
***/
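A minimal sketch of the plugin side of that contract, written as a Python model rather than the real C entry point. It keeps its own buffer list and, when notified, copies the 4KB block containing the address back into RDRAM. The `Plugin` class, `add_buffer`, `frame_buffer_read` and the byte-string "render buffer" are hypothetical stand-ins, and the `val` and `size` parameters are left out.

```python
BLOCK = 4096  # the 4KB block size named in the comment above


class Plugin:
    """Toy plugin that tracks its own frame buffers, as the spec requires."""

    def __init__(self, rdram):
        self.rdram = rdram   # bytearray standing in for N64 RDRAM
        self.buffers = []    # (start address, rendered bytes) pairs

    def add_buffer(self, start, pixels):
        if start + len(pixels) > len(self.rdram):
            raise ValueError("buffer does not fit in RDRAM")
        self.buffers.append((start, bytes(pixels)))

    def frame_buffer_read(self, addr):
        """Copy the 4KB block containing addr to RDRAM; return bytes copied."""
        block_start = addr - addr % BLOCK
        block_end = block_start + BLOCK
        copied = 0
        for start, pixels in self.buffers:
            # Only the overlap between the block and the buffer is written.
            lo = max(block_start, start)
            hi = min(block_end, start + len(pixels))
            if lo < hi:
                self.rdram[lo:hi] = pixels[lo - start:hi - start]
                copied += hi - lo
        return copied
```

The emulator side would remember which 4KB blocks it has already reported, so that further reads inside the same block do not trigger another call.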

@project64 "I have some idea that I might get to with being able to track memory pages that would allow me to detect writes to certain buffers, so I could detect the read/writes to those addresses, so curious how it would help and if it is even worth to try and see if it works."

It may help to speed up depth buffer emulation greatly. I suppose that the CPU needs to probe only a few values from the depth buffer to decide whether to draw objects like coronas. If the emulator notifies the plugin with FrameBufferRead that it needs to read 4 KB from some address, the plugin can load only that piece of data from the GPU, not the whole image.

The same is true for the color buffer copy. Some games use only part of the buffer for an FB effect. Examples: Mario Kart, BAR. Plus, the plugin will read nothing without the notification from the emulator, so there will be no overhead for buffer reads. This notification allows Glide64+Mupen64 to emulate the puzzle effect in Banjo Kazooie without any slowdown, even on old graphics cards.

olivieryuyu commented 8 years ago

Nice summary of the "remaining issues" of GLideN64. Is VI emulation perfect? Not totally sure of this.

Seems there is a lot of attention on FB. I hope HLE bugs will also get some further attention :)

LuigiBlood commented 8 years ago

If tracking memory pages can solve problems that would be so great.

zminhquanz commented 8 years ago

Oh, when will you update that on Android? It has been too long since 19.1.2014, version 2.4.4.

gonetz commented 8 years ago

@zminhquanz: the mupen64plus-ae project has a nightly builds page. It includes an up-to-date GLideN64.

purplemarshmallow commented 8 years ago

@olivieryuyu Yes, I'll edit it. Some VI features are not emulated, like VI DAC filters. Still, GLideN64 emulates more of the VI than any other hardware renderer, and I don't know how much sense it makes to emulate everything. Not sure if it's good to give HLE bugs a lot of attention. Maybe it's better to put this effort into optimizing LLE.

LegendOfDragoon commented 8 years ago

@purplemarshmallow I think making HLE more accurate is very important and very rewarding.

purplemarshmallow commented 8 years ago

Maybe it's possible to fix small problems with HLE but I doubt anyone will ever fully reverse engineer the missing microcodes especially Factor5's microcodes.

olivieryuyu commented 8 years ago

RDP, HLE and LLE have equal importance in my mind.

FB effects & RDP bugs should be a priority, as they have a positive influence on both HLE and LLE.

For instance, it would be nice to get Vigilante 8 or the Top Gear Hyperdrive games fixed.

LegendOfDragoon commented 8 years ago

I don't expect anyone to HLE factor 5 and other games not yet HLE'd. I just think it will be nice to see some of these HLE bugs get fixed.

On a side note, is there any difference in functionality between 1964's FB extension and Mupen64's?

purplemarshmallow commented 8 years ago

1964's FB extensions work with Rice Video (with the emulator FB option). Mupen64 has 1964's FB extensions implemented, but they seem to work better with Mupen64. Glide64 can take advantage of Mupen64's implementation and it works correctly with multiple games, but Glide64+1964 just crashes. But the specification is the same, I think.

LegendOfDragoon commented 8 years ago

1964's FB extensions work with Rice Video (with the emulator FB option). Mupen64 has 1964's FB extensions implemented, but they seem to work better with Mupen64. Glide64 can take advantage of Mupen64's implementation and it works correctly with multiple games, but Glide64+1964 just crashes. But the specification is the same, I think.

Alright thanks. That's something worth looking into. I may try debugging 1964 with Glide64.

LegendOfDragoon commented 8 years ago

@purplemarshmallow

Glide64 can take advantage of Mupen64's implementation and it works correctly with multiple games

You mind naming a few games where it's useful? I'd like to look into learning about FB stuff.

Glide64+1964 just crashes

Since I love 1964, I'd like to try fixing this issue if possible. How do I reproduce the crashing?

purplemarshmallow commented 8 years ago

You mind naming a few games where it's useful? I'd like to look into learning about FB stuff.

You can look at the Glide64 compatibility list. The games where it's useful have an entry ("Get frame buffer info" with Mupen): https://github.com/gonetz/glidehqplusglitch64/blob/master/Glide64/Help/Glide64%20compatibility%20list.html

Since I love 1964, I'd like to try fixing this issue if possible. How do I reproduce the crashing?

In the 1964 ROM properties you have to enable the "Framebuffer R/W" option. In the Glide64 settings you have to enable the "Get frame buffer info" option. It's strange: the configuration dialog in Glide64 warns you about enabling this option with 1964, but when I tried it, it actually worked correctly! The jigsaw effects in Banjo Kazooie showed up and the images in the Bomberman64 intro showed up. I'll do some testing with 1964 and my GLideN64 branch: https://github.com/purplemarshmallow/GLideN64/tree/fb-info

LegendOfDragoon commented 8 years ago

@purplemarshmallow thanks! So far, I tested Mario kart with Rice Video & Glide64 Final and the FB emulation seems to work fine in 1964.

ADormant commented 8 years ago

@purplemarshmallow @gonetz Perhaps pixel coverage can be emulated with conservative rasterization?

https://www.opengl.org/registry/specs/NV/conservative_raster.txt
https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_shader_interlock.txt

For programmable shader blending you can use texture barrier:

https://www.opengl.org/registry/specs/ARB/texture_barrier.txt
https://www.opengl.org/registry/specs/ARB/blend_func_extended.txt

gonetz commented 8 years ago

@ADormant thanks for the links! It is worth trying. The question is who will do it. I currently have only a few free hours per week and hundreds of more urgent tickets to fix.

purplemarshmallow commented 8 years ago

Looks interesting but I fear I don't have enough skills for such a task

weinerschnitzel commented 8 years ago

@gonetz It's my opinion that many of these bugs are due to a sort of domino effect of certain features not being emulated or being emulated incompletely. I would think many of these open tickets can be solved or at least have their obscure behavior eliminated to where they can be solved if some of these listed goals are met. I would like to ask that you shift your priorities to these bigger, not so low-hanging, goals and delegate the rest. Even if it means less frequent progress at a snail's pace.

I believe @purplemarshmallow is competent enough to carry on bug bashing in the meantime, especially with the feedback that he receives from everyone else interested enough to do some research.

pixel coverage and Frame/Depth/Color buffer info from the emulator would be my top wishes.

gonetz commented 8 years ago

@project64 I started to implement 1964's FB extension functions. Depth buffer read is ready. Now I can give a detailed answer to your question "how it would help and if it is even worth to try and see if it works."

As I expected, the Zelda games do not need to probe the whole depth buffer to decide whether to draw coronas. I got 1-4 FBRead calls per frame with addresses inside the depth buffer. Each call requires writing 4 KB of data to RDRAM, starting from that address. The whole buffer is 150 KB, so I need to read less than 15% of the buffer to get that effect working. Of course, each access to video memory is costly, and 20 reads of small parts will be much slower than one read of the whole buffer, so I made some measurements. A whole-buffer read takes circa 8000 microseconds on my PC. One read of 2048 pixels (the depth buffer is 16-bit) takes circa 1000 microseconds. That is, depth buffer emulation with FB notification is 2-8 times faster than the current method. Also, with FB notification the buffer is read only when necessary, while currently it is read every frame.

Tested with Mupen64.
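The figures above can be checked with a little arithmetic. This sketch assumes a 320x240 depth buffer at 16 bits per pixel, which is not stated in the comment but matches the quoted 150 KB. The timing constants are the approximate measurements reported above, and the function names are only illustrative.

```python
WIDTH, HEIGHT, BYTES_PER_PIXEL = 320, 240, 2  # assumed; yields the 150 KB quoted
BLOCK_BYTES = 4096       # data written back per FBRead call
FULL_READ_US = 8000      # reported time to read the whole buffer
BLOCK_READ_US = 1000     # reported time to read one 2048-pixel block


def depth_buffer_bytes():
    return WIDTH * HEIGHT * BYTES_PER_PIXEL


def fraction_read(calls):
    """Share of the depth buffer transferred for a given number of FBRead calls."""
    return calls * BLOCK_BYTES / depth_buffer_bytes()


def speedup(calls):
    """How many times faster block reads are than one whole-buffer read."""
    return FULL_READ_US / (calls * BLOCK_READ_US)
```

With 1 to 4 calls per frame, `speedup` ranges from 8 down to 2, and `fraction_read(4)` is about 0.107, below the 15% bound mentioned above.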

purplemarshmallow commented 8 years ago

These are very good results. I think for the color buffer it's possibly better to always copy the whole buffer like Glide64. In most cases the CPU will read the whole buffer. The speed gain should be even bigger here.

AmbientMalice commented 8 years ago

Fantastic work. The only big problem is PJ64's current lack of notification support.