pdrome closed this issue 6 years ago
VI_V_CURRENT_LINE is updated by the emulator. What do you mean by "roll over"? And why do you believe the expected behaviour is as you wrote?
By roll-over, I mean when switching interlaced fields. So reaching the last half-line of the current field and starting the first half-line of the next field.
I probably shouldn't have called it "expected behavior" and I apologize.
I've modified the TIMINGNTSC ROM to report things like the VI_V_CURRENT_LINE value and the value of the count register. When I run that code, the "expected behavior" is what is reported by parallel n64. I should have mentioned that earlier. However, it might not be right either because those numbers imply 526 half-lines instead of NTSC's 525. Furthermore, cen64 has an entirely different behavior again and reports the same numbers even when switching interlaced fields.
So I'm not confident that is the expected behavior, it's just a guess. I was actually hoping someone might know the true expected behavior because I don't have the ability to run the modified code on RHW at the moment.
However, I do suspect the current behavior in mupen might be wrong (though in what way I can't really confirm without hardware documentation or the ability to run my code on RHW). The timing values for the TIMINGNTSC code are derived from real hardware and they show consistency for comparable MIPS operations. The way the timing code is written, if VI_V_CURRENT_LINE returned the values observed from mupen64p we would not see that consistent timing because it would be possible to measure the number of aggregate operations over 2 VI instead of 1 at times (which is what happens in mupen).
I'm willing to look into this, sorry if I came off abruptly, I'm typing on my phone. Is there any emulator that passes the timing test? I can try to fix this up in m64p if it's wrong, but there is probably nothing to do in the GFX plugin
No worries. I was overly confident until challenged on it. :-)
Once I sat down and tried to defend the assertion I realized how weak my evidence was. I think I need to dig a bit further into this and if possible find a way to test on RHW. I'm still learning the N64 emulation ecosystem and which resources are available.
Seems like I should close this issue here though since it's not related to angrylion. Then maybe reopen it over at mupen64 once I've investigated further and if it proves to be necessary.
There is no emulator that passes the timing test. cen64 definitely does the best because it trends similarly. However, the actual values are still off.
Yes, getting m64p or any emulator to correctly report timing is sort of the white whale of N64 emulation, you might want to start on some easier to solve problem, but I wish you luck on your journey! I'm happy to answer any questions you have about the ecosystem or whatever you may be wondering about
Haha, I know so little about emulation that it seemed like a relatively straight-forward and simple task at first. Perfect for a beginner!
Thanks for the offer. I know so little that I hardly know which questions to ask. However, I am trying to determine how mupen keeps absolute time and runs games at what appears to be the correct speed. I assume it is not emulating any component cycle-by-cycle, but it must be regulating the rate at which it executes code somehow.
Understanding that is the first step to even investigating timing and I have no clue yet. :-)
The N64 keeps timing using a register called CP0_COUNT_REG. It increments at half the clock rate. The clock rate of the N64 is 93.75MHz, so each second, CP0_COUNT_REG is supposed to increment by 46,875,000.
We don't know (no one has checked) how many cycles each instruction takes to execute. So both Project64 and mupen64plus take an average of "2". (PJ64 calls it "Counter Factor" and m64p calls it "CountPerOp"). So for each instruction, we add 2 to CP0_COUNT_REG. That is generally close enough, but it's not accurate, and that is what leads to so many timing issues. The ADD instruction might actually take 4 cycles, the DMULT instruction might take 7, who knows... Also, CPU cache misses and other things might actually make the ADD instruction take a different amount of time depending on the circumstances, it's quite complex.
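To make the flat "2 per instruction" guess concrete, here is a minimal sketch. It is not mupen64plus or Project64 code: `CountModel` and its `execute` method are hypothetical names made up for illustration, and the only facts taken from the description above are the 93.75MHz clock, the half-rate counter and the default of 2.

```python
CPU_CLOCK_HZ = 93_750_000       # N64 CPU clock quoted above
COUNT_HZ = CPU_CLOCK_HZ // 2    # CP0_COUNT_REG ticks at half the clock rate
COUNT_PER_OP = 2                # the flat per-instruction guess ("CountPerOp")


class CountModel:
    """Toy model (hypothetical): every instruction costs the same number of ticks."""

    def __init__(self, count_per_op=COUNT_PER_OP):
        self.count_per_op = count_per_op
        self.count = 0  # stands in for the 32-bit CP0_COUNT_REG

    def execute(self, instructions):
        # Real hardware presumably charges different costs per opcode and for
        # cache misses; this model deliberately ignores all of that.
        for _ in instructions:
            self.count = (self.count + self.count_per_op) & 0xFFFFFFFF


model = CountModel()
model.execute(["ADD", "DMULT", "LW"])
print(model.count)  # 6, regardless of which instructions ran
print(COUNT_HZ)     # 46875000
```

The point of the sketch is what is missing: the opcode names are never looked at, so ADD and DMULT advance the counter by the same amount.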
Fortunately there are a few things we know. We know that for NTSC, it should cycle through all the scan lines 60 times a second. The N64 generates a Vertical Interrupt (VI) 60 times a second for NTSC. So to keep "real time", each time a VI happens, we check if 1/60th of a second has passed. If it hasn't, we pause the emulator (speed limiter). That is why the games don't play in "fast forward".
So basically, we try to keep CP0_COUNT_REG close enough to keep the games happy, and we limit the speed of the game to 60 FPS (for NTSC, 50 for PAL), to keep the games running at the right frame rate.
So that is basically it, to keep track of the in-game counter, we guess, to keep the game running at the right speed, we just limit it to 50/60 FPS
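The speed limiter described above can be sketched roughly like this. It is only an illustration of the idea (wait out the rest of the VI period if emulation ran ahead), not the actual mupen64plus limiter; `limit_speed` and its parameters are hypothetical names, and the "don't try to catch up when behind" policy is an assumption.

```python
import time


def limit_speed(last_vi_time, vi_rate_hz, now=time.monotonic, sleep=time.sleep):
    """Sleep until one VI period has passed since last_vi_time.

    Returns the timestamp to use as last_vi_time for the next VI.
    The now/sleep parameters exist only so the clock can be faked.
    """
    period = 1.0 / vi_rate_hz          # 1/60 s for NTSC, 1/50 s for PAL
    deadline = last_vi_time + period
    remaining = deadline - now()
    if remaining > 0:
        sleep(remaining)               # emulation ran ahead of real time: wait
        return deadline
    return now()                       # emulation is behind: assumed policy is not to catch up


last = time.monotonic()
for _ in range(3):                     # pretend three VIs fire back to back
    last = limit_speed(last, 60)
```

Returning the deadline rather than the wake-up time keeps small sleep overshoots from accumulating into drift, which is one reasonable design choice among several.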
If someone could create a test ROM(s) that checked how much CP0_COUNT_REG incremented for each instruction and ran it on RHW it would go a long way in improving the timing of the emulators. There are other things at play like cache misses, but just having the timing for each instruction would probably make the timing many times more accurate
Thanks for the detailed description!
The test ROM you are talking about is actually what I am trying to create using PeterLemon's code as a base. :-)
I got the idea from your comments here: mupen64plus/mupen64plus-core#543
It seemed like a relatively simple project at first.
However, I'm just learning MIPS assembly and it's been a rather slow process. My current attempt doesn't appear to be quite right yet because it outputs garbage for some ops and I'm not sure the results for other ops make sense. However, I'm running entirely within emulators and the result might not be garbage on a real N64.
I'm planning to order a 64drive in the near future to test it out.
@loganmc10
One further question based on your previous timing description regarding when a VI happens. So we know that the N64 will output 60 VI/s and we know the CPU count should increment CP0_COUNT_REG 46,875,000 times per second.
Does this mean mupen is essentially deciding it needs a VI every 46,875,000 / 60 = 781,250 increments of CP0_COUNT_REG? Thanks!
Yes exactly. Project64 has a setting called "ViRefresh" and mupen64plus used to have one called "CountPerScanline", and they default to 1500. So 1500 * 525 = 787,500, which is generally close enough.
For PAL you want 46,875,000 / 50 = 937,500. 1500 * 625 = 937,500, so in this case it is spot on.
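As a quick check of the arithmetic above, this sketch compares the "ideal" number of CP0_COUNT_REG increments per VI with what a fixed 1500 counts per scanline gives. The function names are made up for illustration; the numbers all come from the posts above.

```python
COUNT_HZ = 46_875_000  # CP0_COUNT_REG increments per second (93.75 MHz / 2)


def ideal_count_per_vi(refresh_hz):
    """Increments between VIs if Count ran at exactly COUNT_HZ."""
    return COUNT_HZ / refresh_hz


def fixed_count_per_vi(count_per_scanline, scanlines):
    """Increments between VIs with a fixed per-scanline budget."""
    return count_per_scanline * scanlines


for name, hz, lines in (("NTSC", 60, 525), ("PAL", 50, 625)):
    ideal = ideal_count_per_vi(hz)
    fixed = fixed_count_per_vi(1500, lines)
    print(name, ideal, fixed, f"{(fixed - ideal) / ideal:+.2%}")
# NTSC 781250.0 787500 +0.80%
# PAL 937500.0 937500 +0.00%
```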
Unfortunately it is very finicky, and just getting those numbers right doesn't seem to work right in each case. If you try the PAL version of Pokemon Puzzle League in Project64 you'll notice some audio stuttering. It's very hard to get the audio/video to work properly together with such guess-work timing.
In mupen64plus, we no longer use "CountPerScanline", we do this: https://github.com/mupen64plus/mupen64plus-core/blob/master/src/device/rcp/vi/vi_controller.c#L132-L133
so "CountPerScanline" is calculated as "((VI Clockrate) / (VI refresh rate)) / (Number of scanlines)"
For NTSC that would be ((48681812 / 60) / 525) = ~1545. So this actually works out to CP0_COUNT_REG incrementing by around 811,363 per VI. I agree that it seems like it should be 781,250, but using that formula gives us better audio/video sync and seems to cause fewer bugs. Perhaps if we emulated CP0_COUNT_REG more accurately we wouldn't need to do it that way.
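The formula above, worked through for NTSC. This is only the arithmetic from the post, not the code at the linked lines; whether the real implementation uses integer or floating-point division is not something this sketch assumes, and `count_per_scanline` is a hypothetical helper name.

```python
NTSC_VI_CLOCK_HZ = 48_681_812  # VI clock rate quoted in the post


def count_per_scanline(vi_clock_hz, refresh_hz, scanlines):
    """((VI clock rate) / (VI refresh rate)) / (number of scanlines)."""
    return (vi_clock_hz / refresh_hz) / scanlines


per_line = count_per_scanline(NTSC_VI_CLOCK_HZ, 60, 525)
per_vi = NTSC_VI_CLOCK_HZ / 60

print(round(per_line, 2))               # 1545.45
print(int(per_vi))                      # 811363
print(f"{per_vi / 781_250 - 1:+.2%}")   # +3.85% relative to 46,875,000 / 60
```

So the VI-clock-derived value spaces VIs roughly 3.85% further apart (in Count increments) than the CPU-clock-derived 781,250 would.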
You can see the defined VI clock rates here: https://github.com/libcpu/libcpu/blob/master/test/mips/interpreter/rcp.h#L570-L575
You can tell we still don't have this exactly right if you try "Hey You Pikachu", the audio stutters a lot.
Hopefully this helps you understand how VI_V_CURRENT_LINE is calculated by the emulator.
If the game/ROM requests the value of VI_V_CURRENT_LINE, we look at how much CP0_COUNT_REG has incremented since the last VI. Let's say it has incremented by 155433, if we have defined "CountPerScanline" as 1545, we just divide: 155433 / 1545 = ~100, so we'll set VI_V_CURRENT_LINE to 100 (or maybe it's 525-100, I can't remember). That is how we emulate "which scanline is currently being worked on".
For interlacing, we just alternate the last bit, so for VI # 1, we make sure we return an even number, for VI # 2, we make sure we return an odd number, etc.
Hopefully that makes sense
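Put together, the calculation described above might look something like the following. This is a sketch of the description only, not the actual emulator code: `current_half_line` is a hypothetical name, the wrap with `% total_lines` is an assumption, and it uses the "counting up" direction (the post leaves open whether it is really 525 minus this value).

```python
def current_half_line(count_since_vi, count_per_scanline, field, total_lines=525):
    """Estimate VI_V_CURRENT_LINE from Count increments since the last VI.

    field is 0 or 1 and alternates on each VI; it is forced into bit 0
    so one VI reports only even values and the next only odd values.
    """
    # Assumption: wrap at total_lines in case the count overshoots one VI period.
    line = (count_since_vi // count_per_scanline) % total_lines
    return (line & ~1) | (field & 1)


print(current_half_line(155_433, 1545, field=0))  # 100
print(current_half_line(155_433, 1545, field=1))  # 101
```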
@loganmc10
Thanks for the comprehensive write up. It all makes perfect sense. I understand mupen a little better now. :-)
It's interesting that the 'CountPerScanline' derived from the VI Clockrate rather than the R4300 Clockrate generally functions better even though it is still using the CP0_COUNT_REG for timing. Of course, it could be a lucky coincidence since mupen specifies a constant value for CountPerOp in the first place.
Are you aware if anyone has ever done any profiling of the MIPS operations most commonly used by N64 games? I'm sure it varies game-to-game, but knowing which operations to prioritize would be valuable.
I'm not aware of anyone that has done any profiling or trying to find out how many cycles each operation takes. I'm also not aware if any documentation exists
I was having an issue getting consistent timing as expected from mupen64plus (m64p - March 26, 2018 Release) with Angrylion when running with PeterLemon's NTSC timing test ROM located here:
https://github.com/PeterLemon/N64/tree/master/CPUTest/CPU/TIMINGNTSC
Related discussion:
mupen64plus/mupen64plus-core#543
I believe I've tracked this down to the reported value of VI_V_CURRENT_LINE (which I believe comes from angrylion, but correct me if I'm wrong) when rolling over from even-to-odd and odd-to-even interlaced fields. Rolling over from an even-to-odd field my half-line numbers reported by VI_V_CURRENT_LINE are:
208, 20a, 20c, 0, 1, 3, 5
Going from an odd-to-even field my half-line numbers are reported as:
209, 20b, 20d, 1, 0, 2, 4
I believe that the expected behavior would be:
even-to-odd: 208, 20a, 20c, 1, 3, 5
odd-to-even: 209, 20b, 20d, 0, 2, 4
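To show what differs between the observed and expected sequences above, here is a small checker. It encodes only my guess that the low bit should flip exactly at the field switch; that guess is not confirmed against real hardware. `parity_mismatches` and the `wrap_drop` threshold are made up for illustration.

```python
def parity_mismatches(samples, wrap_drop=0x100):
    """Return (index, value) pairs whose low bit disagrees with their field.

    The first sample's low bit defines the starting field. A drop larger
    than wrap_drop between consecutive samples is treated as a field
    switch, after which the opposite low bit is expected.
    """
    expected = samples[0] & 1
    mismatches = []
    for i in range(1, len(samples)):
        if samples[i - 1] - samples[i] > wrap_drop:
            expected ^= 1  # new field: the low bit should flip
        if (samples[i] & 1) != expected:
            mismatches.append((i, samples[i]))
    return mismatches


observed_even_to_odd = [0x208, 0x20A, 0x20C, 0, 1, 3, 5]
observed_odd_to_even = [0x209, 0x20B, 0x20D, 1, 0, 2, 4]
print(parity_mismatches(observed_even_to_odd))  # [(3, 0)]
print(parity_mismatches(observed_odd_to_even))  # [(3, 1)]
```

In both observed sequences the single mismatch is the first value after the switch, which still carries the previous field's low bit; the guessed sequences produce no mismatches.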