gonetz / GLideN64

A new generation, open-source graphics plugin for N64 emulators.

Regression in Zelda MM after #2537 #2542

Closed gonetz closed 2 years ago

gonetz commented 3 years ago

Before #2537 SD GLideN64_ZELDA_MAJORA'S_MASK_000

HD GLideN64_ZELDA_MAJORA'S_MASK_001

After SD GLideN64_ZELDA_MAJORA'S_MASK_004

HD GLideN64_ZELDA_MAJORA'S_MASK_003

@standard-two-simplex please take a look. I reverted 73b2d0060bd.

ghost commented 3 years ago

I have been checking LOD the last few days. I spotted a bug in sharpen interpolation, which is causing Space Station Silicon Valley to display a less detailed texture when magnifying.

https://github.com/gonetz/GLideN64/blob/c36ffa97f7f6780de395cdfe1ce6735cf590b204/src/Graphics/OpenGLContext/GLSL/glsl_CombinerProgramBuilder.cpp#L2189-L2190

It should be lod_frac - 1.0 going to negative values to correctly extrapolate. Current code causes an lod near 0.0 to fetch from the second most detailed texture, because lod_frac becomes almost 1.0. Remember that the LOD combiner equation is

(TEXEL1 - TEXEL0) * LOD_FRAC + TEXEL0
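To make the effect of the sign concrete, here is a minimal C++ sketch of the combiner equation above applied to one colour channel. The names `lodCombine` and `sharpenFrac` are hypothetical and are not taken from the GLideN64 sources; the sketch only assumes that, while magnifying, the fraction is the lod itself in [0, 1).

```cpp
#include <cassert>

// LOD combiner from the thread: (TEXEL1 - TEXEL0) * LOD_FRAC + TEXEL0,
// evaluated for a single colour channel.
float lodCombine(float texel0, float texel1, float lodFrac) {
    return (texel1 - texel0) * lodFrac + texel0;
}

// Hypothetical helper: fraction used for sharpen mode while magnifying.
// Assumption: lod lies in [0, 1) when magnifying.
float sharpenFrac(float lod) {
    assert(lod >= 0.0f && lod < 1.0f);
    // Negative value, so the combiner extrapolates away from TEXEL1
    // instead of blending towards it.
    return lod - 1.0f;
}
```

With `texel0 = 0.75` and `texel1 = 0.25`, a fraction close to 1.0 returns nearly the less detailed texel, while `sharpenFrac(0.5f)` gives -0.5 and the result moves past `texel0` in the opposite direction.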

I'm not sure about the black texture. It seems it also happens in pre-merge builds, but only at the far end. It is possibly due to LOD being too big, which happens more easily with the new formula.

I rewrote LOD code to try to make it more clear. One of the biggest difficulties was determining lod_frac behaviour when magnifying or distant. Usually it shouldn't matter (unless sharpen or detail is on) because it mixes the most detailed texture with itself (magnifying) or the least detailed texture with itself (distant). However, zelda's blue warp textures behave strangely. LOD is disabled, but lod_frac is used in the combiner. Apparently, it must be 1 in order to display the effect correctly.

https://github.com/standard-two-simplex/GLideN64/tree/lod_refactor This is the picture I got so far. It seems to me that the correct level is selected, but the farthest part is not correctly filtered. I don't have the HD textures to test. zelda_majora's_mask-001

There is a likelihood that it is because of this lod_scale. https://github.com/gonetz/GLideN64/blob/c36ffa97f7f6780de395cdfe1ce6735cf590b204/src/Graphics/OpenGLContext/GLSL/glsl_CombinerProgramBuilder.cpp#L2142-L2146

I wrote that when introducing the texture engine code, in order not to break LOD completely. It computes the texture coordinates for the most detailed texture, then divides by a factor. However, a fractional part is lost in the process.

In order to correctly emulate LOD, it is necessary to apply correct scaling shift in the texture engine, which means loading all tile attributes and selecting the correct one based on lod_tile. I explained the issue in #2320, and I don't think it is an easy task.

ghost commented 3 years ago

Actually, I think the problem is similar to the one in the swamp. #2315

The texture has 4 levels, and the shifts are 0,1,2,2 respectively. So the least detailed texture is not half the previous size. This causes part of the texture to be missing, thus the strange pattern.

ghost commented 3 years ago

I explained the issue in #2320.

I tried to make an initial implementation of this approach. I got some working code, but with some glitches. I think that this scene renders correctly though (ignore missing minimap), with one issue only.

zelda_majora's_mask-004

The problem probably is that part of the texture is missing. Since the shift doesn't increase for the least detailed texture, texture coordinates are the same as the next-to-least detailed texture. However, this provokes an out of bound fetch, because the texture should be bigger.

I don't understand the piece of code that loads textures to OpenGL, so I don't think I can make much progress.

I updated the branch with a simplified implementation, which tries to avoid branching as much as possible for shader performance, and works in a similar way to current master. Majora's Mask is not perfect, but it is the best I can get, without the full textures.

For the case when LOD is off, I defaulted lod_frac to 1.0. I don't know if this is accurate, but AngryLion's reference code does messy computations just for edge cases. I'm also a bit skeptical about them.

I also noticed that when detail mode is enabled, one texture level is missing. I wrote a hotfix but I don't know if that is correct.

@gonetz Please, review the code when you find the time and tell me what you think about it.

gonetz commented 3 years ago

Actually, I think the problem is similar to the one in the swamp. #2315

The texture has 4 levels, and the shifts are 0,1,2,2 respectively. So the least detailed texture is not half the previous size. This causes part of the texture to be missing, thus the strange pattern.

True. Texture sizes are 32x32, 16x16, 8x8, 8x8. The last two tiles are the same, they are loaded from the same address. I don't understand, why do it? Why not just lower mip-map level? Blending tile with itself using LOD should be the same as just fetching from this tile. If this texture had 3 levels, it would have to work the same. Did I miss something?

I tried to check this idea and just reduce the level of such textures, patch: 0001-Check-for-equal-sized-tiles-of-mip-mapped-texture.zip
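As an illustration only (this is not the attached patch), the idea of dropping repeated trailing levels can be sketched in C++ like this. `TileSize` and `usableLevels` are hypothetical names, and the sketch compares sizes only, whereas the real tiles also share a tmem address.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct TileSize {
    int width;
    int height;
};

// Hypothetical helper: number of mip levels left after dropping trailing
// levels that merely repeat the size of the level before them.
std::size_t usableLevels(const std::vector<TileSize>& sizes) {
    std::size_t n = sizes.size();
    while (n > 1 &&
           sizes[n - 1].width == sizes[n - 2].width &&
           sizes[n - 1].height == sizes[n - 2].height) {
        --n;  // the last level adds nothing new
    }
    return n;
}
```

For the sizes quoted above (32x32, 16x16, 8x8, 8x8) this yields 3 levels. A duplicate at the front of the chain is left untouched by this sketch.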

It has no visible effect for the current master. I applied commit 73b2d0060 (Change LOD calculation formula) and got this: GLideN64_ZELDA_MAJORA'S_MASK_009 This is pretty close to the pattern I see with AL plugin: NZSE0000

HD textures also work ok with this patch, no black pixels.

For the case when LOD is off, I defaulted lod_frac to 1.0. I don't know if this is accurate, but AngryLion's reference code does messy computations just for edge cases. I'm also a bit skeptical about them.

As I remember from the AngryLion's reference code, LOD is always calculated, even when it is off. My code does the same: if LOD is off (uEnableLod == 0) it calculates lod_frac but does not change the current tile, that is T0 is the first tile (gSP.texture.tile), T1 is the next one.

I also noticed that when detail mode is enabled, one texture level is missing. I wrote a hotfix but I don't know if that is correct.

I'll check it.

Please, review the code when you find the time and tell me what you think about it.

Sure, I'll check it.

Jj0YzL5nvJ commented 3 years ago

Is this relevant? ata4/angrylion-rdp-plus#121

ghost commented 3 years ago

I don't understand, why do it? Why not just lower mip-map level? Blending tile with itself using LOD should be the same as just fetching from this tile. If this texture had 3 levels, it would have to work the same.

Yes, it would work the same. I don't see a good reason to use four levels and repeat one.

if LOD is off (uEnableLod == 0) it calculates lod_frac but does not change the current tile

Yes, I saw this. However, the blue warp works because of a very odd behaviour. The maximum level is set to 0, so angrylion's code considers everything to be distant, even when magnifying. In this case, it sets lod_frac = 1.0. I say that this behaviour is odd because there is a jump in the lod -> lod_tile + lod_frac function, which is meant to approximate a logarithm in base 2.

For example, if max level is 3 and sharpen and detail are disabled, then lod = 7.9 would result in lod_tile = 2, lod_frac = 0.975, but lod = 8 would result in lod_tile = 3 and lod_frac = 1.0. I.e., lod barely changed but there is a 1 unit jump in lod_tile + lod_frac.
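The jump can be reproduced with a small floating-point sketch in C++. This is a simplified paraphrase of the behaviour described in this thread for the case where sharpen and detail are disabled; it is not AngryLion's fixed-point code, and `LodSplit` and `splitLod` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct LodSplit {
    int tile;    // lod_tile
    float frac;  // lod_frac
};

// Simplified model with sharpen and detail disabled (assumption: float lod).
LodSplit splitLod(float lod, int maxLevel) {
    assert(lod > 0.0f && maxLevel >= 0 && maxLevel <= 7);
    // tile = floor(log2(lod)), but never searched past maxLevel.
    int tile = 0;
    while (tile < maxLevel && lod >= static_cast<float>(1 << (tile + 1))) {
        ++tile;
    }
    if (tile >= maxLevel) {
        return {maxLevel, 1.0f};  // distant: fraction pinned to 1.0
    }
    if (lod < 1.0f) {
        return {0, 0.0f};  // magnifying: most detailed tile only
    }
    // Position of lod between 2^tile and 2^(tile + 1).
    return {tile, lod / static_cast<float>(1 << tile) - 1.0f};
}
```

With `maxLevel = 3`, `splitLod(7.9f, 3)` gives tile 2 with a fraction of about 0.975, while `splitLod(8.0f, 3)` gives tile 3 with fraction 1.0. With `maxLevel = 0` every input is treated as distant, which is the blue warp case.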

For a normal case (i.e., LOD enabled and the combiner set to G_CC_TRILERP) it shouldn't matter, because the distant case means mixing the least detailed texture with itself, so lod_frac has no influence. In a strange situation, however, like the blue warp, lod_frac of the distant case matters.

Is this relevant? ata4/angrylion-rdp-plus#121

No, here we are talking about lod_frac, which is the value calculated on a per pixel basis by the rasterizer. The prim_lod_frac is a register configurable by the game programmer, so it is straightforward to read the correct value.

gonetz commented 3 years ago

@standard-two-simplex I'm still trying to understand your LOD modifications. It is not a simple matter.

First, the modification for the "fake mipmap"

Master

if (uMaxTile == 0) return 1.0;
return uMinLod;

Branch

if (uEnableLod == 0) return 1.0;
return 0.0;

Could you explain it? If LOD is enabled, use Tile1 otherwise use Tile0? Why it is better than previous version?

Then, max texture level.

Master

_pTexture->max_level = static_cast<u8>(gSP.texture.level - 1);

Branch

const u8 max_level = gDP.otherMode.textureDetail == G_TD_DETAIL ? gSP.texture.level : gSP.texture.level - 1;
_pTexture->max_level = static_cast<u8>(max_level);

Are you sure that gSP.texture.level is max level without detailed texture? Could you cite N64 dev manual, where it is explained? Btw, I tried to check how Perfect Dark sets textures in G_TD_DETAIL mode and found a weird case: texture sizes are 32x32 (detailed), 64x64, 64x64, 32x32, 16x16, 8x8. That is, the first and second tiles of the second texture are the same. Another case, which can't be represented with OpenGL mip-mapped texture. gSP.texture.level is 4 here, so yes, 4 is max zero-based level for the second texture.

Next, regarding LOD enabled/disabled. In your branch you calculate lod_frac only if LOD is enabled in othermode. I still doubt that it is correct, especially when we know that angrylion's code works differently. You explained a jump in the lod -> lod_tile + lod_frac function, but as I understand it works like this only if LOD is off and max level > 1. Do you think that situation is possible?

Regarding the main code: you changed it so much that I can barely tell whether it is correct now or not. Especially after you changed uEnableLod related logic.

ghost commented 3 years ago

Could you explain it? If LOD is enabled, use Tile1 otherwise use Tile0? Why it is better than previous version?

It is not necessarily better. I just put the same behaviour as with no faking mip-map. The blue warp requires lod_frac=1.0, so I defaulted to it in both branches. Otherwise, in order to use the most detailed texture, I chose lod_frac = 0.0, so that the combiner chooses texel 0.

Are you sure that gSP.texture.level is max level without detailed texture?

No, without detail texture it is still gSP.texture.level - 1, as before. I believe my modification keeps that case the same. You can check figure 13-5 for detail disabled and figure 13-7 for detail enabled. http://n64devkit.square7.ch/pro-man/pro13/13-07.htm#03

Another case, which can't be represented with OpenGL mip-mapped texture.

Yes. The N64 allowed this sort of mip-mapping, which does not map well to OpenGL's mip-mapping. The only good solution I see is loading up to 8 independent textures uTex0, ..., uTex7 rather than one regular texture uTex0 and one mip-mapped texture uTex1.

You explained a jump in the lod -> lod_tile + lod_frac function, but as I understand it works like this only if LOD is off and max level > 1

AngryLion's code has this jump whether LOD is on or off, because it calculates it anyway. It happens if max_level >= 1. If max_level==0 and sharpen and detail are disabled, it always considers the pixel to be distant and defaults to lod_frac=0xff.

My implementation only differs from AngryLion's in the distant case. Rather than assuming lod_frac=1.0 I gradually increase it as with non distant cases. In normal circumstances this shouldn't matter. If LOD is enabled and the combiner is G_CC_TRILERP, i.e.

(TEXEL1 - TEXEL0)*LOD_FRAC + TEXEL0

texel 0 and 1 will fetch from the least detailed texture. Therefore, they'll contain the same color, the subtraction will be 0 and lod_frac won't matter. I wrote it as it is, because it simplifies the algorithm a lot. Only two ifs are needed: one to adjust tile indices in detailed non-magnifying mode and one to adjust lod_frac for sharpen extrapolation and non-magnifying cases.

If games use LOD in some other way, it is probably a game bug. It may happen that they give a correct picture, as in Zelda, but they are probably edge cases, not very likely to happen in games.

ghost commented 3 years ago

I pushed another version of the algorithm. It is closer to Angrylion's implementation. Only the distant case changes slightly, where I clamp the lod to 2^lod_tile - 1/32. Here, 1/32 is meant to be the smallest possible change in fixed point format (not sure if the fractional part uses 5 bits).

Edit: I slightly modified the implementation.

weinerschnitzel commented 3 years ago

Is there much performance difference? It looks like the new way has 1 more branch than the other. If there is enough performance difference, it is probably best to use the version with fewer branches so low end devices remain performant. If it is negligible, it is probably best to keep the structure close to angrylion.

gonetz commented 3 years ago

Yes. The N64 allowed this sort of mip-mapping, which does not map well to OpenGL's mip-mapping. The only good solution I see is loading up to 8 independent textures uTex0, ..., uTex7 rather than one regular texture uTex0 and one mip-mapped texture uTex1.

Yes, it would be the best, but such solution will be incompatible with some mobile devices, which supports only 8 texture samplers. I also thought to use texture array, but "texture array is a collection of same size/format/flags 2D textures". How to put mip-map tiles to the array? Magnify? How to magnify tiles, whose size is not half of the previous tile size?

gonetz commented 3 years ago

@standard-two-simplex I tested your lod_refactor branch a bit. Something is wrong with tiles transition in detailed mode. Look at the carpet:

master: https://drive.google.com/file/d/1AnYEyM4O01TIoR4PamvCu8co5aUgAMdz/view?usp=sharing

lod_refactor: https://drive.google.com/file/d/1ct1MBwRuy9uAJYJjiMD5-lpsBG_kePM-/view?usp=sharing

save file for pj64 https://drive.google.com/file/d/1rC87D4dU1PiSzcVV6p6D0UfS9-qP0TeH/view?usp=sharing

ghost commented 3 years ago

Is there much performance difference?

I don't notice any performance difference on my old laptop. Maybe on mobile devices.

Yes, it would be the best, but such solution will be incompatible with some mobile devices, which supports only 8 texture samplers.

That is a bummer.

I also thought to use texture array, but "texture array is a collection of same size/format/flags 2D textures". How to put mip-map tiles to the array? Magnify? How to magnify tiles, whose size is not half of the previous tile size?

There's no need to magnify, you can make the textures as big as the biggest N64 texture and fill the unnecessary pixels with zeroes. The biggest problem probably comes with format. You're gonna need one texture array per format, which is a big inconvenience.

There is also the possibility of using a texture atlas. The N64 does this with its TMEM. It should be possible to upload a 1D texture and write custom sampling functions. Something like,

lowp vec4 mTexelFetch(int s, int t, int width, int offset) {
    return texelFetch(uTex1D, offset + t*width + s, 0);
} 

However, the format issue would still exist. You would need one texture atlas per format used.

The only way I see to overcome the different format issue is to load TMEM as raw data, i.e., as a GL_R8UI, GL_R16UI or GL_R32UI texture, perform the texel fetch, then convert it into RGBA32 in the fragment shader for combiner operations. It is probably a very big task, though.
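For reference, a CPU-side C++ sketch of the atlas idea: tiles are appended one after another into a flat buffer, and a fetch uses the same `offset + t*width + s` addressing as the `mTexelFetch` snippet above. The `Atlas` and `AtlasTile` types are hypothetical, and texels are assumed to be already converted to one 32-bit format, which sidesteps the format question raised here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct AtlasTile {
    int width;
    int height;
    std::size_t offset;  // index of the tile's first texel in the atlas
};

struct Atlas {
    std::vector<std::uint32_t> texels;  // all tiles, stored back to back
    std::vector<AtlasTile> tiles;

    // Append a row-major tile and remember where it starts.
    void addTile(int width, int height, const std::vector<std::uint32_t>& data) {
        assert(width > 0 && height > 0);
        assert(data.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
        tiles.push_back({width, height, texels.size()});
        texels.insert(texels.end(), data.begin(), data.end());
    }

    // CPU counterpart of mTexelFetch: offset + t * width + s.
    std::uint32_t fetch(std::size_t tile, int s, int t) const {
        const AtlasTile& info = tiles.at(tile);
        assert(s >= 0 && s < info.width && t >= 0 && t < info.height);
        return texels[info.offset + static_cast<std::size_t>(t) * static_cast<std::size_t>(info.width) +
                      static_cast<std::size_t>(s)];
    }
};
```

Adding a 32x32 tile followed by a 16x16 tile places the second tile at offset 1024, so tile sizes no longer need to follow a halving chain.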

ghost commented 3 years ago

Something is wrong with tiles transition in detailed mode.

I checked this case. Unfortunately, it is not due to being a detail texture, it is the same issue as the original one reported here. Level 0 (primtile + 1) and Level 1 (primtile + 2) have the same sizes, so when the lod becomes big enough, part of the texture is missing. The master branch doesn't show an issue because max(dx.x, dx.y) doesn't become big enough.

Current code "stretches" the available texture to work as the most detailed one. The black pixels are generated when fetching outside of the existing texture without stretching. As you can see, only the top left part of the texture is available, when Level 1 or bigger are used.

Stretched | Not stretched

gonetz commented 3 years ago

There's no need to magnify, you can make the textures as big as the biggest N64 texture and fill the unnecessary pixels with zeroes.

How it will work? Texture coordinate addresses the whole texture. For example, we have tiles 32x32, 16x16. Create texture array with 32x32 textures. Texture coordinate (0.9, 0.9) will point to zeroed part of the second texture. That is, we need to correct the texture coordinate before fetching from the second layer of the array.

The biggest problem probably comes with format. You're gonna need one texture array per format, which is a big inconvenience.

I don't understand, where is the problem here? Mip-map tiles have the same format. One texture array per mip-mapped texture.

There is also the possibility of using a texture atlas. The N64 does this with its TMEM. It should be possible to upload a 1D texture and write custom sampling functions. Something like,

lowp vec4 mTexelFetch(int s, int t, int width, int offset) {
    return texelFetch(uTex1D, offset + t*width + s, 0);
} 

Hmm, it is probably not a bad idea. We have 4096 texels at most, it does not exceed max texture size. All issues with sizes of mip-map tiles will be gone automatically. I like it, worth a try.

However, the format issue would still exist. You would need one texture atlas per format used.

Again, I don't see why format is a problem.

I checked this case. Unfortunately, it is not due to being a detail texture, it is the same issue as the original one reported here. Level 0 (primtile + 1) and Level 1 (primtile + 2) have the same sizes, so when the lod becomes big enough, part of the texture is missing.

Ok, I did not notice that. I started to think about how to avoid that problem with the current mechanism of mip-mapping. All solutions that come to my mind look quite ugly. The more I think about your idea with texture atlas, the more I like it. I'll try to implement it. In case of success your refactored lod calculation will probably shine with it.

ghost commented 3 years ago

Texture coordinate (0.9, 0.9) will point to zeroed part of the second texture. That is, we need to correct the texture coordinate before fetching from the second layer of the array.

Texture coordinates generated will normally not exceed texture sizes after clamp wrap and mirror. So (0.9,0.9) will never be generated.

I don't understand, where is the problem here?

Suppose we have a situation where tile0 is loaded 16 bit RGBA, tile 1 is loaded as 8 bit IA. How do you create the atlas? Do you convert all texels into rgba32 before loading to OpenGL?

gonetz commented 3 years ago

Texture coordinates generated will normally not exceed texture sizes after clamp wrap and mirror. So (0.9,0.9) will never be generated.

Texture coordinates generated for the first tile. I checked the current code, it has texture coordinates correction:

"  lowp vec2 lod_scale = vec2(textureSize(tex,int(lod))) / vec2(textureSize(tex,0));            \\\n"
"  lowp vec4 c00 = texelFetch(tex, ivec2(tcData[0]*lod_scale), int(lod));                       \\\n"

Such correction is also necessary for texture array or texture atlas.
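The correction can be pictured with this C++ sketch, which mirrors the `ivec2(tcData[0]*lod_scale)` step on the CPU. `TexelCoord` and `scaleToLevel` are hypothetical names, and the level sizes are passed in explicitly because `textureSize()` would not be available for an array or atlas.

```cpp
#include <cassert>

struct TexelCoord {
    int s;
    int t;
};

// Rescale a coordinate computed for the most detailed tile to another level.
// The conversion to int truncates, like ivec2() in GLSL.
TexelCoord scaleToLevel(float s, float t,
                        int baseWidth, int baseHeight,
                        int levelWidth, int levelHeight) {
    assert(baseWidth > 0 && baseHeight > 0 && levelWidth > 0 && levelHeight > 0);
    const float scaleS = static_cast<float>(levelWidth) / static_cast<float>(baseWidth);
    const float scaleT = static_cast<float>(levelHeight) / static_cast<float>(baseHeight);
    return {static_cast<int>(s * scaleS), static_cast<int>(t * scaleT)};
}
```

For a 32x32 base tile and a 16x16 level, the coordinate (28, 29) maps to (14, 14): the half texel in t is dropped by the truncation.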

Suppose we have a situation where tile0 is loaded 16 bit RGBA, tile 1 is loaded as 8 bit IA. How do you create the atlas? Do you convert all texels into rgba32 before loading to OpenGL?

This situation is possible only for detail mode. Detail texture (tile0) is loaded as normal texture. The rest is loaded as atlas. I don't remember a situation where mip-map levels have different formats. Converting all texels into rgba32 is also a solution.
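A rough C++ sketch of what converting texels into rgba32 could look like for the two formats mentioned. The bit layouts (RGBA 5/5/5/1 with red in the top bits, and 8-bit IA as 4 bits of intensity followed by 4 bits of alpha) are stated here as assumptions, and the function names are hypothetical rather than GLideN64's conversion routines.

```cpp
#include <cassert>
#include <cstdint>

struct Rgba32 {
    std::uint8_t r, g, b, a;
};

// Widen a 5-bit value to 8 bits by replicating its top bits.
inline std::uint8_t expand5(std::uint32_t v) {
    assert(v < 32);
    return static_cast<std::uint8_t>((v << 3) | (v >> 2));
}

// Widen a 4-bit value to 8 bits (0..15 maps to 0..255).
inline std::uint8_t expand4(std::uint32_t v) {
    assert(v < 16);
    return static_cast<std::uint8_t>(v * 17);
}

// Assumed layout: RRRRRGGG GGBBBBBA.
Rgba32 fromRgba5551(std::uint16_t texel) {
    return {expand5((texel >> 11) & 0x1Fu),
            expand5((texel >> 6) & 0x1Fu),
            expand5((texel >> 1) & 0x1Fu),
            static_cast<std::uint8_t>((texel & 1u) ? 255 : 0)};
}

// Assumed layout: IIIIAAAA.
Rgba32 fromIa44(std::uint8_t texel) {
    const std::uint8_t i = expand4(static_cast<std::uint32_t>(texel >> 4));
    return {i, i, i, expand4(texel & 0x0Fu)};
}
```

Once every tile is expanded this way, tiles with different source formats can share one atlas, at the cost of doing the conversion before upload.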

ghost commented 3 years ago

Such correction is also necessary for texture array or texture atlas.

Yes, you're right. I missed textureSize() will no longer be available. It will be necessary to upload at least tile shift information to shaders.

gonetz commented 3 years ago

@standard-two-simplex I've implemented load of mip-map tiles into 1D texture atlas (actually 2D with height 1). I hardly believe it, but it works! I removed all my hacks, which were necessary to squeeze N64 tiles into OpenGL mip-mapped texture. I put it to mip_map_load_refactor branch. It is based on the current master. I took only max level fix and lod calculation fix from your branch. Please put the rest, as you think it should be. I may update the branch as I think that some bugs may hide somewhere, but the main code should remain the same.

We have 4096 texels at most

I was wrong there. The carpet texture has tiles 64x64, 64x64, 32x32, 16x16 and a detail tile. First two tiles are the same (loaded from the same tmem address), but anyway total number of texels is over 5000. My hardware supports textures of 16k size, so it is not a problem for me, but I guess that mobile hardware is not that capable. @fzurita , could you test your devices for max texture size?

fzurita commented 3 years ago

Yeah, sure. Once I get a chance I'll test it.

gonetz commented 3 years ago

Thanks

fzurita commented 3 years ago

Am I understanding this correctly? According to this, all devices will support at least 2048x2048

https://opengles.gpuinfo.org/displaycapability.php?name=GL_MAX_TEXTURE_SIZE&esversion=2

2048x2048 texture size is 4194304 pixels.

64x64 + 64x64 + 32x32 +16x16 is only 9472 pixels.

So even in the lowest specifications that should be supported.

Am I going wrong somewhere?

fzurita commented 3 years ago

What is the easiest way to verify the change is working correctly? Shaders do seem to compile fine, can I get a save state for mupen64plus?

gonetz commented 3 years ago

Am I going wrong somewhere?

I load all mip-map tiles as a one-dimensional array of data, one by one. For example, a tile of size 32x32 is represented as an array of 1024 texels. The offset from the start to each tile in that array is known, so texel fetch looks like this

lowp vec4 mTexelFetch(int s, int t, int width, int offset) {
    return texelFetch(uTex1D, offset + t*width + s, 0);
} 

However, as I said, in some cases total number of texels in all mip-maps exceeds 4096, and in most cases it exceeds 2048. Thus, such texture can't be loaded as one-dimensional array on mobile devices. Yes, it can be loaded as 2D texture with height greater than 1, but texel fetching will be a bit more complicated and thus slower.
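The arithmetic behind this limit can be checked with a short C++ sketch. `TileDim`, `totalTexels` and `fitsAsSingleRow` are hypothetical helpers, and the limits used in the usage note are example values, not measurements from any device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct TileDim {
    int width;
    int height;
};

// Total number of texels when all tiles are stored back to back.
std::size_t totalTexels(const std::vector<TileDim>& tiles) {
    std::size_t total = 0;
    for (const TileDim& tile : tiles) {
        assert(tile.width > 0 && tile.height > 0);
        total += static_cast<std::size_t>(tile.width) * static_cast<std::size_t>(tile.height);
    }
    return total;
}

// A texture of height 1 can be at most maxTextureSize texels wide.
bool fitsAsSingleRow(std::size_t total, std::size_t maxTextureSize) {
    return total <= maxTextureSize;
}
```

For tiles of 64x64, 64x64, 32x32 and 16x16 the total is 9472 texels, which fits in a single row with a 16384 limit but not with a 4096 or 2048 limit, even though a 2048x2048 texture has far more texels in total.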

What is the easiest way to verify the change is working correctly? Shaders do seem to compile fine, can I get a save state for mupen64plus?

I'm not sure that you will get OpenGL error if requested texture size is greater than maximal one. Most likely texture will just load incompletely. In that case some glitches are possible. Unfortunately, I don't have PD save state for mupen64plus. You may load PD or GE from any place. Both games use mip-mapping almost everywhere, so at least you may check that it works. Note: I did not correct mip-mapping for GLES 2.0 yet, it will not work on such devices.

Information about the 2048x2048 or 4096x4096 texture size limitation may be obsolete. Probably all devices in use have a larger texture limit. Could you check that limit on your weakest devices? There is function Context::getMaxTextureSize(). It is used in TextureFilterHandler::init(). You may log its result.

fzurita commented 3 years ago

Got it, I see. Because all data is stored in 1 dimension, we are limited by a single dimension in the max texture size dimension.

Just by looking here: https://opengles.gpuinfo.org/displaycapability.php?name=GL_MAX_TEXTURE_SIZE&esversion=2 The only modern devices that are going to have issues are PowerVR devices. I do have such a device, which is the Nexus Player. I will try to see what happens there. There are many older Adreno devices that have lower max texture sizes; unfortunately, I don't have any of those.

Worst case scenario, can we load the lowest mipmap level if device can't support loading texels past the maximum size? I could see it causing issues with accuracy but it may be better than showing just black or some other junk.

fzurita commented 3 years ago

Just judging by golden eye, I can tell that all the textures are less detailed. I'm assuming that mipmapping was not working correctly in golden eye before and now it is. People may actually complain due to the lower texture detail due to the improved accuracy, lol.

fzurita commented 3 years ago

This is potentially a really good change, at least for goldeneye. I have not ruled out a setting difference, but I'm seeing about double the performance in goldeneye with this change? I don't understand how it could make such a huge difference.

Edit: About double the performance in Yoshi's story. This would be a really good change for GLES 2.0 devices.

Edit: Ok, forget the performance improvement, it's most likely a configuration difference, I'm trying to track it down.

Edit: Ok, it's actually about 10% slower than the master branch.

ghost commented 3 years ago

There's an issue in BAR. The Volkswagen logo and traffic sign are incorrect. beetle_adventure_rac-001 mupen64plus savestate: state.zip

gonetz commented 3 years ago

Just judging by golden eye, I can tell that all the textures are less detailed. I'm assuming that mipmapping was not working correctly in golden eye before and now it is.

Yes, it is possible. New mipmapping loading should not noticeably change the work of mip-mapping, but I also applied standard-two-simplex's corrections for LOD calculation and for max mip-map level. These fixes can change the picture.

I'm seeing about double the performance in goldeneye with this change?

Actually, it should be slower, because the mipmap shader needs to do more calculations to fetch a texel and two additional fetches to get tile size and offset.

gonetz commented 3 years ago

There's an issue in BAR. The Volkswagen logo and traffic sign are incorrect.

GLideN64_Beetle_Adventure_Rac_004

Looks like this with PJ64

ghost commented 3 years ago

It's weird. I see it correctly with PJ64 32 bit, and incorrectly with mupen64plus 64 bit. I don't know yet about mupen64plus 32 bit.

fzurita commented 3 years ago

@gonetz It is slower, about 10% slower. I'm trying to figure out why I have double the performance in my debug builds compared to my release builds...

Edit: Ok, my new Samsung phone recognizes when my app is running and it's throttling CPU performance on my playstore builds. That explains the performance difference I saw. Either way, it's about 10% slower when comparing apples to apples FPS wise.

ghost commented 3 years ago

There's an issue in BAR. The Volkswagen logo and traffic sign are incorrect.

On my end, it is correct with PJ64 and mupen64plus 32bit builds, but incorrect with mupen64plus 64bit builds. It would be good if someone could confirm.

I rebased the refactored tile selection and lod fraction calculation code. https://github.com/standard-two-simplex/GLideN64/tree/mip_map_load_refactor I forked before some changes to uMaxTile. I can explain the algorithm if it helps.

ghost commented 3 years ago

The grass court in Mario Tennis is not working. I recall the grass texture was smaller than expected, which was fixed by wrapping around when running out of texels. It might not have been the correct solution though.

bslenul commented 3 years ago

It would be good if someone could confirm.

Yup (back of the car + the "arch" or whatever it's called, the fences on the sides have a much lighter color as well):

32bit | 64bit (images)

ghost commented 3 years ago

@gonetz One question about LOD off mode. Is it supposed to be an enhancement or a speed hack? I realize it is possible to force lod to be 1.0 and emulate tile selection and lod_frac generation. It will select the most detailed texture and emulate edge cases like the blue warp in zelda, at the cost of performing most lod computations.

See https://github.com/standard-two-simplex/GLideN64/commit/cafba9936e06225de99dde48ffd7a0d7607e0e60 for proof of concept.

gonetz commented 3 years ago

I rebased the refactored tile selection and lod fraction calculation code.

Cool! I'll try to check it asap.

The grass court in Mario Tennis is not working. I recall the grass texture was smaller than expected, which was fixed by wrapping around when running out of texels. It might not have been the correct solution though.

I don't remember this at all :( Ok, another issue to investigate.

One question about LOD off mode. Is it supposed to be an enhancement or a speed hack?

It is rather an enhancement. Some users dislike how games look with low-res mip-mapped tiles. Also, mip-mapping does not work with HD textures, so Nerrel advises to switch mip-mapping on when his texture pack is used.

It will select the most detailed texture and emulate edge cases like the blue warp in zelda, at the cost of performing most lod computations.

Sounds like a good idea. I'll check it, thanks!

gonetz commented 3 years ago

Yup (back of the car + the "arch" or whatever it's called, the fences on the sides have a much lighter color as well):

Confirmed. Weird regression on 64bit version.

gonetz commented 3 years ago

rebased the refactored tile selection and lod fraction calculation code. https://github.com/standard-two-simplex/GLideN64/tree/mip_map_load_refactor I forked before some changes to uMaxTile. I can explain the algorithm if it helps.

I cherry-picked your commit to the updated mip_map_load_refactor branch. I did not find new issues. I haven't checked the grass court in Mario Tennis issue yet. An explanation of the algorithm would be really great. Could you add it to the Wiki page?

gonetz commented 3 years ago

Btw, the issue with BAR and 64bit version is fixed. It was a bug in texture conversion to rgba32. A 16bit format was used before the lod fixes, so nobody noticed that bug.

gonetz commented 3 years ago

See standard-two-simplex@cafba99 for proof of concept.

I checked that commit. It looks good to me. Please clean-up and make a PR, or add to your mip_map_load_refactor branch

ghost commented 3 years ago

Please clean-up and make a PR, or add to your mip_map_load_refactor branch

Here, 06446b52777c934f441306c16ec022ca24ac7152

gonetz commented 3 years ago

Here, 06446b5

Thanks!

gonetz commented 3 years ago

Court grass in Mario Tennis intro is another crazy invention of Nintendo programmers.

LOD: on, mipmap_lvl: 7, mode: Sharpen

The combiner does not use lod_frac:

(TEXEL0 - 0) * TEXEL1 + PRIMITIVE

Thus, LOD affects only tiles to fetch texels from.

tile 0 is 32x32
tile 1 is 128x16
Tiles 2, 4, 6 are copies of tile 0: the same size and tmem address
Tiles 3, 5, 7 are copies of tile 1: the same size and tmem address

Why do that? Why not just use t0 and t1? And how should it be emulated? It works in master only by chance: the size of mipmap layers is calculated by dividing the size of tile 1 by 2. So, tile 1 is loaded correctly, tile 2 is loaded as 64x8 instead of 32x32, tile 3 is loaded as 32x4 instead of 128x16 and so on. Nevertheless it somehow works and looks correctly.
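The halving described for master can be illustrated with a few lines of C++. `LayerSize` and `halvedLayers` are hypothetical names, and clamping each dimension at 1 is an assumption of this sketch rather than something stated in the thread.

```cpp
#include <cassert>
#include <vector>

struct LayerSize {
    int width;
    int height;
};

// Derive mip layer sizes by repeatedly halving the first mip tile,
// regardless of what sizes the game actually set for the later tiles.
std::vector<LayerSize> halvedLayers(LayerSize first, int count) {
    assert(first.width > 0 && first.height > 0 && count >= 0);
    std::vector<LayerSize> layers;
    LayerSize current = first;
    for (int i = 0; i < count; ++i) {
        layers.push_back(current);
        // Assumption: a dimension never drops below one texel.
        current = {current.width > 1 ? current.width / 2 : 1,
                   current.height > 1 ? current.height / 2 : 1};
    }
    return layers;
}
```

Starting from the 128x16 tile 1, the derived sizes are 128x16, 64x8 and 32x4, which is why tiles 2 and 3 do not match the 32x32 and 128x16 sizes the game set up.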

It won't work with new mipmap loading. To make it work we need to do clamp-wrap-mirror for each mip-map level. It is very expensive. As a workaround I just disabled mip-mapping if the current combiner does not use lod_frac.

gonetz commented 3 years ago

I updated mipmap texture loading: instead of loading all levels as a texture atlas of size [TotalTexels, 1], I load the atlas as a 2D texture of size [SomeWidth, TotalTexels / SomeWidth]. SomeWidth is a number, which is for sure less than max tile size, 256 in my implementation. Now it should work on any device. However, it requires additional calculations, so it is even slower than before. @fzurita, please test if it is bearable for mobile devices.
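The extra calculation amounts to turning a linear texel index into a 2D coordinate. A minimal C++ sketch, with hypothetical names and with the atlas height rounded up (an assumption, since the comment above only gives TotalTexels / SomeWidth):

```cpp
#include <cassert>
#include <cstddef>

struct AtlasCoord {
    std::size_t x;
    std::size_t y;
};

// Map a linear texel index (offset + t * width + s) into a 2D atlas
// that is atlasWidth texels wide.
AtlasCoord linearTo2D(std::size_t index, std::size_t atlasWidth) {
    assert(atlasWidth > 0);
    return {index % atlasWidth, index / atlasWidth};
}

// Number of rows needed to hold totalTexels, rounding up.
std::size_t atlasHeight(std::size_t totalTexels, std::size_t atlasWidth) {
    assert(atlasWidth > 0);
    return (totalTexels + atlasWidth - 1) / atlasWidth;
}
```

With a width of 256, the 9472 texels of the carpet example need 37 rows, and linear index 1029 lands at column 5 of row 4. The division and modulo per fetch are the additional cost.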

gonetz commented 3 years ago

To do: adapt GLES 2.0 mipmap shader.

fzurita commented 3 years ago

I'm sure it will be bearable, but the requirements to run the plugin are creeping higher and higher. I will probably end up creating a legacy GLideN64 for mobile devices based on the older version before this pull request: https://github.com/gonetz/GLideN64/pull/2202

I will still include the newer version for devices that are quick enough to run it.

ghost commented 3 years ago

Nevertheless it somehow works and looks correctly.

bUseLod in buildCombinerProgram() only checks for the usage of lod_frac in the combiner, so the texels are being fetched from tiles 0 and 1.

Why do that? Why not just use t0 and t1?

I dumped all the tile attributes. Tiles 1,3,5,7 are identical as you said. Tiles 0,2,4,6 are slightly different. They are 4 bit color indexed textures, and their palette indices are 0,1,2,3 respectively. The shift parameter also changes, which may make the farther textures look a little more shrunk.

Still, using t0 and t1 probably gives nearly the same picture as accurately rendering the scene.

To make it work we need to do clamp-wrap-mirror for each mip-map level.

Sort of. It is enough to perform clamp-wrap-mirror for the two desired indices only, but these indices will change per pixel. I had started a branch where I tried to do things accurately, but it is incomplete and lots of work needs to be done yet. The idea is to pass correct tile indices to textureEngine(). Still, I publish it for proof of concept. 056d33d9d0cd6dac74b7195acda5d9dea26a1423

As a workaround I just disabled mip-mapping if the current combiner does not use lod_frac.

I agree, it is probably for the best.

gonetz commented 3 years ago

I dumped all the tile attributes. Tiles 1,3,5,7 are identical as you said. Tiles 0,2,4,6 are slightly different. They are 4 bit color indexed textures, and their palette indices are 0,1,2,3 respectively. The shift parameter also changes, which may make the farther textures look a little more shrunk.

Cool! I did not notice that. Thanks for the explanation. So, this really is an invention.

gonetz commented 3 years ago

To do: adapt GLES 2.0 mipmap shader.

Done: 6274baaae

@fzurita I made these changes blindly. I have no devices to check it. Please test it and correct if necessary.