Arisotura / blargSnes

SNES emulator for the 3DS.

[veryhard] Messed up sound & scaling with mode 7 filter #30

Open profi200 opened 8 years ago

profi200 commented 8 years ago

System: New 3DS XL on 10.4.0-29E. Build: CIA build from latest source (veryhard) with latest ctrulib from Github. devkitARM r45.

Problem(s): The mode 7 filter makes the bg look very messed up like it got scaled 50 times or something. Here is a screenshot of it: https://i.imgur.com/SsyWy0D.png

And the bigger problem is if hw rendering is enabled the sound misses notes. With sw rendering the sound is fine. @fincs can confirm the same problem.

DiscostewSM commented 8 years ago

Regarding the filtering, are you talking about the resulting black outlines? This is an unfortunate side effect of using 15-bit textures with 1-bit alpha that contain "holes". Even in spots that are empty, there is still a color associated with each texel, and the linear interpolation takes that color into account, but it can't interpolate the alpha because it is 1-bit. The filter was just added for experimental reasons and may be removed in later builds. It seems fine for games that don't have such "cut-out" designs.
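To make the outline effect concrete, here is a minimal sketch in plain C. It is not blargSnes or GPU code: the `Texel` layout, the `sample_linear` helper, and the choice to threshold alpha back to a single bit are all assumptions made for illustration, and the thread does not confirm how the hardware actually handles alpha.

```c
#include <assert.h>
#include <stdint.h>

/* One texel unpacked from a 15-bit color + 1-bit alpha format.
   The field layout is an assumption made for illustration. */
typedef struct {
    uint8_t r, g, b;   /* 5-bit channels, 0..31 */
    uint8_t a;         /* 1-bit alpha, 0 or 1 */
} Texel;

/* Linear blend of one channel; t is the weight of y in 1/256 steps. */
static uint8_t lerp_channel(uint8_t x, uint8_t y, int t)
{
    assert(t >= 0 && t <= 256);
    return (uint8_t)((x * (256 - t) + y * t) / 256);
}

/* Hypothetical sampler: color channels are blended linearly, while
   alpha is thresholded back to one bit, so a sample is either fully
   visible or fully hidden. */
static Texel sample_linear(Texel p, Texel q, int t)
{
    Texel out;
    out.r = lerp_channel(p.r, q.r, t);
    out.g = lerp_channel(p.g, q.g, t);
    out.b = lerp_channel(p.b, q.b, t);
    out.a = (uint8_t)((p.a * (256 - t) + q.a * t) >= 128);
    return out;
}
```

Sampling between an opaque white texel `{31, 31, 31, 1}` and an empty texel stored as `{0, 0, 0, 0}` with `t = 96` gives a sample that is still opaque but darkened to 19 per channel. Under these assumptions, that darkened but visible edge is the black outline.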

I cannot test the audio problem because I don't own an n3DS. It seems fine on an o3DS, so just for testing purposes, please comment out the line in the main function in main.c that enables the speedup to 804 MHz, and report back if any change happens regarding audio. Thank you.

profi200 commented 8 years ago

I can confirm that disabling the L2 cache and setting the clock back to 268 MHz fixes the sound problem when hw rendering is enabled. But the problem is in BlargSnes or ctrulib. The speedup is really useful and I would not entirely disable it.

And btw I made an updated .rsf for BlargSnes: https://dl.dropboxusercontent.com/s/4cl8xaka336bp41/cia.rsf

On new 3DS you can access an entire second ARM11 core. I would use that instead of the system core.

edit: Now I get the missing-notes problem even with sw rendering. Weird. But it happens a lot less often than with the hw renderer.

DiscostewSM commented 8 years ago

Let's assume it's because of the dsp mixer processing on the core used by the system/OS on the n3DS. In main.c, there is a function called StartROM, and in it a line that shows the following...

spcthread = threadCreate(SPCThread, 0x0, SPC_THREAD_STACK_SIZE, 0x18, 1, true);

The second-to-last parameter (which shows "1") indicates the core used to run the dsp mixer. On o3DS, only 0 and 1 are valid, apart from the special values -1 and -2 (the former means any core, the latter means the default CPU, read from the Exheader). The n3DS also has cores 2 and 3. See if the problem persists with the speedup enabled and the thread assigned to core 2 or 3.

Arisotura commented 8 years ago

maybe the sound issue is because the CPU/SPC loop is running too fast, causing it to swap DSP buffers before the DSP is done mixing or something like that? could try adding more safety there, but can't exactly test, no new3DS...
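One way the "more safety" idea could look, as a single-threaded model in plain C. Every name here (`MixState`, `mix_try_swap`, `mix_done`) is hypothetical and not taken from blargSnes. Real code running on two threads would need an event or atomic operations rather than a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical double-buffer state; none of these names come from blargSnes. */
typedef struct {
    int  write_index;    /* buffer the SPC side is filling: 0 or 1 */
    bool mixer_busy;     /* true while the mixer still reads the other buffer */
    int  refused_swaps;  /* swaps refused because the mixer was not done */
} MixState;

static void mix_init(MixState *s)
{
    s->write_index = 0;
    s->mixer_busy = false;
    s->refused_swaps = 0;
}

/* Called by the SPC side when its buffer is full. The swap is refused
   while the mixer is still working, so the caller has to wait and retry
   instead of overwriting a buffer that is being mixed. */
static bool mix_try_swap(MixState *s)
{
    assert(s->write_index == 0 || s->write_index == 1);
    if (s->mixer_busy) {
        s->refused_swaps++;
        return false;
    }
    s->write_index ^= 1;   /* start filling the other buffer */
    s->mixer_busy = true;  /* the mixer now owns the buffer just filled */
    return true;
}

/* Called by the mixer side once it has finished with its buffer. */
static void mix_done(MixState *s)
{
    s->mixer_busy = false;
}
```

In this model a second `mix_try_swap` issued before `mix_done` is rejected and counted, which would also give a cheap way to check whether the loop really does outrun the mixer on the faster clock.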

"Even in spots that are empty, there is still a color associated with each texel, and the linear interpolation takes that into account, but can't interpolate the alpha because it is 1-bit" it probably can, textures are surely converted to a 32bit representation before the GPU works with them

I have already seen that kind of issue though, iirc the fences and shit in SM64; the interpolation algorithm doesn't really know what to do with the alpha apparently

fincs commented 8 years ago

New 3DS cores 2 and 3 are not accessible to homebrew by normal means.

profi200 commented 8 years ago

Yeah, ok. The core 2 idea is not good then.

DiscostewSM commented 8 years ago

Well, how about for science, try the dsp mixer thread on core 0. That one single core has more processing availability than o3DS has combined.

But I think StapleButter may be right. The SPC core batches DSP writes into blocks of 512 samples, then signals an event to tell the dsp mixer thread to begin processing. If I'm correct, this swap happens around 62.5 times per second to reach the 32000 Hz playback rate (32000 / 512 = 62.5). That means that in around 2.5 out of every 60 frames, the SPC core swaps the buffers twice within a single frame. The number of samples played back from the NDSP thread is also a factor that has to be taken into account.
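The arithmetic above can be checked with a few lines of C. The constants are the figures quoted in this comment, plus an assumed nominal rate of 60 frames per second. The function names are made up for this sketch.

```c
#include <assert.h>

#define SAMPLE_RATE_HZ   32000  /* playback rate quoted above */
#define SAMPLES_PER_SWAP 512    /* samples per buffer, as quoted above */
#define FRAMES_PER_SEC   60     /* assumed nominal frame rate */

/* Number of buffer swaps needed per second to sustain the sample rate. */
static double swaps_per_second(int rate_hz, int samples_per_swap)
{
    assert(samples_per_swap > 0);
    return (double)rate_hz / samples_per_swap;
}

/* Swaps beyond one per frame in each second; every extra swap means
   one frame in which the buffers are swapped twice. */
static double double_swap_frames_per_second(int rate_hz, int samples_per_swap,
                                            int frames_per_sec)
{
    return swaps_per_second(rate_hz, samples_per_swap) - frames_per_sec;
}
```

With the constants above, `swaps_per_second` gives 62.5 and `double_swap_frames_per_second` gives 2.5, matching the estimate in the comment.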