AkBKukU / HydroThunder-TimeTool

Tool for exporting and importing high score times from Hydro Thunder Arcade
MIT License
15 stars 3 forks source link

High score times: architecture specific float interpretation and sensitivity to twice rounding bugs #5

Open WizardTim opened 1 year ago

WizardTim commented 1 year ago

Hydro Thunder handles track times and their high scores as seconds stored in a single precision float, this number is then converted to the a "mm:ss.SS" string when displayed in the UI. However those times exhibit inconsistent rounding errors on the centiseconds position, this cases differences between times displayed by this Python tool and the game. This is likely due to either hardware floating point implementation differences or an error in the implementation of the xmath_ConvertAFloatSecs2NumOfMinSecCSec function in Hydro Thunder. Below is an example of this function being used to obtain the time string (uncompiled source leaked from Dreamcast).

xmath_ConvertAFloatSecs2NumOfMinSecCSec( pHiScore->fTime, &nMin, &nSecs, &nCSecs );
XMATH_CLAMPMAX( nMin, 99 );
XMATH_CLAMPMAX( nSecs, 59 );
XMATH_CLAMPMAX( nCSecs, 99 );
text_PrintF( fX, fY, "%02d:%02d.%02d\n", nMin, nSecs, nCSecs );

TODO

AkBKukU commented 1 year ago

I will try to get some samples dedicated to this problem. This is the last "real" problem for allowing this tool to import data, fully process it, and output identical results. I will not be surprised if this isn't solvable though, the rounding I observed when I first noticed this did not see to be based on the actual values of the floats. But a pattern may be discoverable with enough samples.

WizardTim commented 1 year ago

The Python script is confirmed interpreting the single precision floats correctly, it's Hydro Thunder that's drunk rounding things wrong due to the use of pre-rounded times, seconds to centiseconds multiplication and the FISTP instruction. If trunc(totalseconds * 100)/100 doesn't give the same behavior I wouldn't bother pursuing it any further.

Below is the reverse engineered pseudocode for the three functions called in the code snippet above.

void _xmath_ConvertAFloatSecs2NumOfMinSecCSec(float fTime,int *nMin,int *nSecs,int *nCSecs) {
  int fTime;

  // Convert time from a single precision float of seconds to a longlong signed int64 as total centiseconds using the FISTP instruction
  // The FPU appears to be set to round down during this instruction if the inexact exception is raised, 
  // because the floats stored in the high score table are pre-rounded this makes the instruction sensitive to round incorrectly,
  // and with the seconds to centiseconds conversion the rounding is again different.
  fTime= __ftol();

  // Get number of full minutes
  *nMin = fTime/ 6000;

  // Remove full number of minutes from remaining time
  if (fTime/ 6000 != 0) {
    fTime= fTime % 6000;
  }

  // Get number of full seconds
  *nSecs = fTime/ 100;

  // Remove full number of seconds from remaining time
  if (fTime/ 100 != 0) {
    fTime= fTime % 100;
  }

  // Remaining number is centiseconds
  *nCSecs = fTime;

  return;
}
void _wpr_hiscore_CreateTimeString(float nMin,char *timeString) {
  char cVar1;
  uint nSecs;
  uint nCSecs;

  if (nMin != 0.0) {
    _xmath_ConvertAFloatSecs2NumOfMinSecCSec(nMin,(int *)&nMin,(int *)&nSecs,(int *)&nCSecs);
    if (99 < (uint)nMin) {
      nMin = 99;
    }
    if (0x3b < nSecs) {
      nSecs = 59;
    }
    if (99 < nCSecs) {
      nCSecs = 99;
    }
    cVar1 = (char)((uint)nMin / 10);
    *timeString = cVar1 + '0';
    timeString[1] = SUB41(nMin,0) + cVar1 * -10 + '0';
    cVar1 = (char)(nSecs / 10);
    timeString[3] = cVar1 + '0';
    timeString[4] = (char)nSecs + cVar1 * -10 + '0';
    cVar1 = (char)(nCSecs / 10);
    timeString[6] = cVar1 + '0';
    timeString[2] = ':';
    timeString[5] = '.';
    timeString[7] = (char)nCSecs + cVar1 * -10 + '0';
    timeString[8] = '\0';
    return;
  }
  _xclib_strcpy(timeString,&s_DID_NOT_FINISH);
  return;
}
WizardTim commented 1 year ago

Previously this bug was only considered a quality-of-life issue however it is now apparent the poor handling of float rounding by Hydro Thunder is very likely causing issues when this tool modifies high score times with differently interpreted and rounded float times in the raw data blocks as discussed in the new issue found in PR #6.

WizardTim commented 1 year ago

TL;DR (mature language)

Summary

The P6 architecture of the old Intel Celeron 333 translates floating point bits to real decimal numbers slightly differently to modern architectures that are generally more accurate. Because of this mismatch of floating point interpretations, the game has different rounding behaviour (by 1 LSB), in addition the game twice rounds its floats (something you should never do) which further results in differences in what the game thinks is a number correctly rounded to the centiseconds position and what the true float value is. This is a major issue because the game appears to read the raw data from the raw data blocks, parse it, including rounding it (nearest), then convert it back to bytes, if it rounds a number differently but thinks it made no change it appears to expect the checksum to remain the same, which it will not because it rounded numbers differently, this causes the game to think the high score data is corrupt so it discards it all and resets to defaults.

The root causes

The game stores high score track times as seconds in a single precision float internally referred to as fTime. Although single precision floats are reliable up to about 7 decimal digits the game rounds the floats to 2 decimal places, this choice was likely made due to a combination of the GUI only displaying centiseconds and not milliseconds but also likely a limitation of the game tick which is 1/60th of a second (≈16 ms) making millisecond precision less important.

Twice rounding errors

While rounding your floats will not cause problems it itself, parts of the game further round the floats a second time (round up/down rather than nearest is the issue), because floats are just approximations of a number further rounding pre-rounded floats that are just under the carry over of your rounding decimal places will result in incorrect rounding behaviour, for example 98.834846 s would be first rounded to nearest, 98.83, however the nearest float is 98.8299942016602, this would then be rounded a second time using the round down of FISTP to 98.82. It is unclear if the developers knew they were rounding their floats twice or if the 2nd rounding occurred inside other functions that were compiled to rely on rounding behaviour (particular the FISTP instruction). In a perfect world the developers of Hydro Thunder would have found this issue and implemented high score times as a uint32 of centiseconds, this way the high score table would not be reliant on float interpretations and the behaviour would have been perfectly deterministic and very well defined. This issue is what is causing centiseconds to be displayed incorrectly on the leaderboards, because of this I suspect it may be impossible to display certain times as they will always be rounded to an incorrect number in the GUI, however all world record times should be possible to display as they have gone through this twice rounded behaviour, although their real time may actually be 10 ms slower (my condolences to the speedrunners).

Architecture specific floating point interpretation differences

The new issue that has been found is that the game will reset when given certain numbers in the high score table, this is most likely because those numbers are impossible for the game to write to the raw data blocks because the game rounds them differently because it is using floating point units on an old P6 architecture Intel Celeron 333 that doesn’t have the same accuracy as a modern CPU to know which exact LSB corresponds to the exactly closest rounded floating point number. For example, you put 01:31.49 in the CSV which you want to write to the high score table. The Python code running on a modern architecture reads this and converts it to the nearest single precision float, which isE1 FA B6 42 and this is correct, but although Python may display this sometimes as 91.49, in reality it is 91.4899978637695. However when the old P6 architecture PC reads this it’s less precise FPU converts this to say 91.4898… and thus round up 1 LSB to what it thinks is closer to 91.49. This also works in reverse, a number that the P6 architecture systems saves to the raw data blocks are sometimes rounded by 1 LSB when near carry over on a modern architecture FPU. Below is a table compiling this behavior using numbers from SN33-Real.img and screen capture from the Twitch streams.

|          | Saved float    | Hydro Thunder's | Float value according  | Nearest decimal to satisfy | Closeness of HYDRO  | Python nearest float to | Diff            |
|      | in table       | mm:ss.SS        | to IEEE-754-2008       | `FISTP` round-down         | to IEEE-754-2008    | HYDRO interpreted float | (Over-Roudned)  |
|----------|----------------|-----------------|------------------------|----------------------------|---------------------|-------------------------|-----------------|
| TSH      | F5 A8 C5 42    | 01:38.82        | 98.8299942016602       | 98.829...                  |     NEAR CARRY      | F6 A8 C5 42             | 01 00 00 00     |
| TSH      | 85 EB C5 42    | 01:38.96        | 98.9599990844727       | 98.960...                  |      MISMATCH       | 85 EB C5 42             | 00 00 00 00     |
| TSH      | 29 DC C6 42    | 01:39.43        | 99.4300003051758       | 99.430...                  |    CLOSE EXACT      | 29 DC C6 42             | 00 00 00 00     |
| UX6      | 1E 85 C7 42    | 01:39.75        | 99.7599945068359       | 99.759...                  |     NEAR CARRY      | 1F 85 C7 42             | 01 00 00 00     |
| TSH      | F5 A8 C7 42    | 01:39.82        | 99.8299942016602       | 99.829...                  |     NEAR CARRY      | F6 A8 C7 42             | 01 00 00 00     |
| TSH      | B8 1E C8 42    | 01:40.06        | 100.059997558594       | 100.060...                 |      MISMATCH       | B8 1E C8 42             | 00 00 00 00     |
| TSH      | 1E 85 C8 42    | 01:40.25        | 100.259994506836       | 100.259...                 |     NEAR CARRY      | 1F 85 C8 42             | 01 00 00 00     |
| Y6R      | 29 DC C8 42    | 01:40.43        | 100.430000305176       | 100.430...                 |    CLOSE EXACT      | 29 DC C8 42             | 00 00 00 00     |
| TX3      | B8 1E C9 42    | 01:40.56        | 100.559997558594       | 100.560...                 |      MISMATCH       | B8 1E C9 42             | 00 00 00 00     |
| TSH      | 7B 94 C9 42    | 01:40.79        | 100.790000915527       | 100.790...                 |    CLOSE EXACT      | 7B 94 C9 42             | 00 00 00 00     |
|Known Bad |                |                 | 91.4899978637695       |                            |  LIKELEY MISMATCH   | E1 FA B6 42             |                 |

Why does adding/subtracting LSBs to the checksum or ASCII initials cause things to work?

If the game rounds the float in RAM a couple things can happen to influence the checksum. This also explains the +2 to the an ASCII char in the initials, by adding two you're effectively biasing the checksum so that when it rounds the float and recalculates the checksum, the extra bits from the ASCII char make the checksum valid.

Is this an intentional security feature to prevent manipulating high scores?

Possibly but very unlikely, I can see it checking numbers are exact bit perfect rounded numbers being a very basic security feature however it is much more likely just a symptom of them rounding their floats for other reasons. Also, the float interpretation mismatch with future architectures is such an obscure thing to rely on I highly doubt that aspect was intention.

Recommended troubleshooting tests to confirm this hypothesis

Should always work

Those tests confirm the checksum is 100% correct and we haven’t missed anything

Possible solutions

AkBKukU commented 1 year ago

This is a lot of great research!

Some initial thoughts

One of my takeaways from this is that people who may have replaced the 333MHz Celeron with a faster Pentium may have altered this behavior which muddies things for speed runners even more. Being a slot one system it may be possible to radically change the architecture of the original machine by putting in a Pentium 3.

Another takeaway is that my plan to attempt building a 3rd viable computer for the game with outside parts may be beneficial for brute force testing as long as I can use a similar Celeron. I'll elaborate more on this as I make progress, but I may have everything needed to fully remote control the whole computer(inputs, video, storage, etc). The bigger problems will come from the Diego interface board. It may be time to start capturing some logs of the serial interface to begin understanding and...sigh...reversing it a bit to potentially.

A Potential Issue

The thing that bothers me, and I need to get some raw block images for this, was what I mentioned here: https://github.com/AkBKukU/HydroThunder-TimeTool/pull/6#issuecomment-1398799631 I got the file to be "stable" with some world record times until I changed an initial which shouldn't involve floating point issues. I'm not ruling out the floats causing the issue, I'm just trying to lot let that point get lost.

Next steps for data collection

I think one quick low effort thing I can do that will be helpful is boot up all used hard drive images I can get and video capture the high scores being shown so we can begin to map how as many values as possible are interpreted. It's a shame I cannot get specific times in the game by playing well, centiseconds are too tight.

I should be able to run those three tests you have on the game to see how it responds. I'm also going to add a forth test with artificial times to see how it reacts

Continuing work

I'm going to keep at getting data and testing on my end while looking into setting up a testing solution to more rapidly test changes. One potential option I'm thinking to avoid some of the pain of this testing, will be to attempt booting the game from a SCSI2SD which allows access to the SD card used as the storage device both over the SCSI interface as well as over USB to a host computer. So it may be possible to more rapidly change data and reset the machine to test. I'm not sure if it is possible to change data live, I've never tested that but will look into it. If it can at least be read live it may be possible to log a lot of sample data quickly by racing and taking snapshots in real time.

I have a feeling this one isn't going to be cracked easily which is why I planning for longer term testing methods. Hopefully I'm being pessimistic.

WizardTim commented 1 year ago

Oh people are allowed to swap the CPU in an arcade machine for speedrunning? Was not expecting that. However the double rounding issue would still affect them, maybe if there's some high end Pentium with newer FPUs there would be slightly fewer times that get rounded down wrong but most of them still would, those instructions would still cause rounding errors on a 2023 era CPU.

If the arcade version of the game is ever to be emulated the Diego PCB and the EEPROM's content, MCU's code, DAC, ADC, button IO, switches, lights and what looks like an NI GAL have to be reverse engineered. I can see that being a big, time consuming and expensive project by at least an order of magnitude compared to those high scores.

Sorry I should have been more clear, I've updated the section about that to Why does adding/subtracting LSBs to the checksum or ASCII initials cause things to work? and added a sentence addressing the changing of chars making it valid. Unless you mean you had a working image then changed a char and updated the checksum and it failed, in which case the checksum is broken, but only if that's a confirmed repeatable issue would I be concerned because everything else says that shouldn't happen.

I'd say hold off on screen capping the high scores yet (side note I think the in race timer has the same rounding bug anyway), but just test to see if those tests fail or work and then we know if this is actually the problem. I suspect the LUT approach will be easy and reliable enough to implement in which case we won't need any screen captures, they won't give us the raw bytes anyway.

That SCSI2SD sounds cool, hopefully SCSI plays nice and the speeds some of the testing up. Real time access to the raw data region would be very helpful but I doubt that adapter would allow it, screams race condition just waiting to destroy data.

AkBKukU commented 1 year ago

Unless you mean you had a working image then changed a char and updated the checksum and it failed, in which case the checksum is broken, but only if that's a confirmed repeatable issue would I be concerned because everything else says that shouldn't happen.

That is exactly what I mean. I put in all of the of the high scores and initials from Twin Galaxies up to Thunder Park, then changed the initials one by one for Thunder Park, I found out just changing the R in SER to B to match what I was typing in before (Like I showed on stream) would cause the checksum to fail. I'll try to get a demonstration of this, which I just checked and I did not show it on stream I think, when I'm in the office next so you can see what I mean.

AkBKukU commented 1 year ago

I've recorded a demonstration and collected data samples of the checksum being invalid with just a change of an initial: https://youtu.be/j61NUPbL9ho

I'm attaching a zip with these files test-case.zip:

I don't have time to work on this today, but a future test may be to try to find the minimum number of changes needed to cause the +2 LSB offset to be required to have a valid checksum. This was just a single test where I found the issue for the first time.

WizardTim commented 1 year ago

I was just about to post this comment and you posted yours, so interpret it without the context of knowing there's now a very clear case of it failing on a single char change

I’m just going to plead ignorance and ignore that failure case, after all when you did it on stream it worked and accepted the name change, it was only times that caused resets. I’m sure if I believe hard enough you just goofed and like overwrote the next byte of the time or something somehow this won’t become a real issue.

Race timer partially reverse engineered

I’ve been looking into how the race time is calculated, turns out they’re just incrementing a number by a constant fraction of a second for the 60 fps frame rate with _Player_fElapsedSecs = _Gameloop_fTargetFrameTime + _Player_fElapsedSecs;. All the system tick timer stuff I thought was keeping track of it before seems to be for the audit data race time and other in game animations. In hindsight this is obvious when looking at world record leaderboards, most end in 0, 3, 6 or 9 plus looking at screen capture from your streams and from other speedrunners there is a finite incrementing of the timer which is the 0.03333… second increments. However, because they are incrementing a non-exact float with another non-exact float our good friend floating point error comes into ruin everything. Why they didn’t just implement this as a centiseconds integer or number of frames I have no clue.

LUT program

HYDROLUT.zip HYDROLUT Results.zip

This afternoon I wrote that LUT program in C (also simulates the race timer behaviour), it really should be implemented in assembly with the exact opcodes copy-pasted from HYDRO.EXE but I really don’t want to do that and I don’t really know how, but C should be somewhat close enough and it looks like I got the compiler to use FISTP anyway for the float to integer conversion for the GUI mm:ss.SS. I’ve compiled those as several executables for Windows NT platform with different optimisation flags to test on the Celeron 333 with Windows NT/9x. You can run it with the usual HYDROLUT_FPC.EXE > HYDROLUT_FPC.CSV to write it to a file. I was able to test it first on my old NEC Versa 2760 MT and I found it incorrectly rounded a couple times, but was kinda expecting a lot more to just be off by 1 bit, it is an older P5 architecture CPU so should be somewhat equivalent. And in the event the program actually works and solves this whole problem I recorded testing it on the laptop just in case you need some alternate B roll whilst you explain this hot mess (I’ve got other angles and the 100 mbps files).

Actual Race Frame Time C Rounding (all others) C Rounding (-fpc) Python 3.10 (amd64)
Float Value 166.739990234375000 166.729995727539063 166.740005493164063 166.7400054931640625
Float Hex 0x4326bd70 0x4326bae1 0x4326bd71 0x4326bd71

However, this program says 0x42c8851f is a valid time, it does not generate 0x42c8851e which has been found in the high scores table, so I don’t know what else the game is doing to get it off by one, it may very well be something unrelated to this float issue that affects the GUI. But for now I think I’ll wait until after you work on this in the next Hydro Thunder stream to do more work on this as this hypothesis isn’t working out so well, but I’m hoping that Celeron just has a complete trash FPU for some reason and those programs can find it but I think the odds are very slim. That brute force with the test bench and SD2SCSI + Raspberry Pi and some sort of screen reader might be the only viable option but to brute force the validity of just 80 s to 150 s would probably take one or several weeks in restart times, either way I’m not sure how much more I can reverse engineer without either running out of easily readable decompiled code or time, so hopefully it can be solved without any of that.

Edit: had a look at the new .img files, have no clue what it's doing to cause this.

WizardTim commented 1 year ago

Some finding from emulating the game patched for custom Voodoo and Diego PCB check with PCem with Intel Celeron 333 and Voodoo 2 8 MB (on AMD 3950X & RTX3070)