High score times: architecture specific float interpretation and sensitivity to twice rounding bugs

Hydro Thunder handles track times and their high scores as seconds stored in a single precision float, this number is then converted to the a "mm:ss.SS" string when displayed in the UI. However those times exhibit inconsistent rounding errors on the centiseconds position, this cases differences between times displayed by this Python tool and the game. This is likely due to either hardware floating point implementation differences or an error in the implementation of the xmath_ConvertAFloatSecs2NumOfMinSecCSec function in Hydro Thunder. Below is an example of this function being used to obtain the time string (uncompiled source leaked from Dreamcast).

xmath_ConvertAFloatSecs2NumOfMinSecCSec( pHiScore->fTime, &nMin, &nSecs, &nCSecs );
XMATH_CLAMPMAX( nMin, 99 );
XMATH_CLAMPMAX( nSecs, 59 );
XMATH_CLAMPMAX( nCSecs, 99 );
text_PrintF( fX, fY, "%02d:%02d.%02d\n", nMin, nSecs, nCSecs );

TODO

[ ] Obtain uncomplied source of xmath_ConvertAFloatSecs2NumOfMinSecCSec or
[X] Reverse engineer decompiled xmath_ConvertAFloatSecs2NumOfMinSecCSec from HYRO.EXE (1.00d) at 001919D0h or
[ ] Guess implementation error, eg. rounding at microseconds and discarding least significant figure to obtain centiseconds

I will try to get some samples dedicated to this problem. This is the last "real" problem for allowing this tool to import data, fully process it, and output identical results. I will not be surprised if this isn't solvable though, the rounding I observed when I first noticed this did not see to be based on the actual values of the floats. But a pattern may be discoverable with enough samples.

The Python script is confirmed interpreting the single precision floats correctly, it's Hydro Thunder that's drunk rounding things wrong due to the use of pre-rounded times, seconds to centiseconds multiplication and the FISTP instruction. If trunc(totalseconds * 100)/100 doesn't give the same behavior I wouldn't bother pursuing it any further.

Below is the reverse engineered pseudocode for the three functions called in the code snippet above.

void _xmath_ConvertAFloatSecs2NumOfMinSecCSec(float fTime,int *nMin,int *nSecs,int *nCSecs) {
  int fTime;

  // Convert time from a single precision float of seconds to a longlong signed int64 as total centiseconds using the FISTP instruction
  // The FPU appears to be set to round down during this instruction if the inexact exception is raised, 
  // because the floats stored in the high score table are pre-rounded this makes the instruction sensitive to round incorrectly,
  // and with the seconds to centiseconds conversion the rounding is again different.
  fTime= __ftol();

  // Get number of full minutes
  *nMin = fTime/ 6000;

  // Remove full number of minutes from remaining time
  if (fTime/ 6000 != 0) {
    fTime= fTime % 6000;
  }

  // Get number of full seconds
  *nSecs = fTime/ 100;

  // Remove full number of seconds from remaining time
  if (fTime/ 100 != 0) {
    fTime= fTime % 100;
  }

  // Remaining number is centiseconds
  *nCSecs = fTime;

  return;
}

void _wpr_hiscore_CreateTimeString(float nMin,char *timeString) {
  char cVar1;
  uint nSecs;
  uint nCSecs;

  if (nMin != 0.0) {
    _xmath_ConvertAFloatSecs2NumOfMinSecCSec(nMin,(int *)&nMin,(int *)&nSecs,(int *)&nCSecs);
    if (99 < (uint)nMin) {
      nMin = 99;
    }
    if (0x3b < nSecs) {
      nSecs = 59;
    }
    if (99 < nCSecs) {
      nCSecs = 99;
    }
    cVar1 = (char)((uint)nMin / 10);
    *timeString = cVar1 + '0';
    timeString[1] = SUB41(nMin,0) + cVar1 * -10 + '0';
    cVar1 = (char)(nSecs / 10);
    timeString[3] = cVar1 + '0';
    timeString[4] = (char)nSecs + cVar1 * -10 + '0';
    cVar1 = (char)(nCSecs / 10);
    timeString[6] = cVar1 + '0';
    timeString[2] = ':';
    timeString[5] = '.';
    timeString[7] = (char)nCSecs + cVar1 * -10 + '0';
    timeString[8] = '\0';
    return;
  }
  _xclib_strcpy(timeString,&s_DID_NOT_FINISH);
  return;
}

Previously this bug was only considered a quality-of-life issue however it is now apparent the poor handling of float rounding by Hydro Thunder is very likely causing issues when this tool modifies high score times with differently interpreted and rounded float times in the raw data blocks as discussed in the new issue found in PR #6.

TL;DR (mature language)

Summary

The P6 architecture of the old Intel Celeron 333 translates floating point bits to real decimal numbers slightly differently to modern architectures that are generally more accurate. Because of this mismatch of floating point interpretations, the game has different rounding behaviour (by 1 LSB), in addition the game twice rounds its floats (something you should never do) which further results in differences in what the game thinks is a number correctly rounded to the centiseconds position and what the true float value is. This is a major issue because the game appears to read the raw data from the raw data blocks, parse it, including rounding it (nearest), then convert it back to bytes, if it rounds a number differently but thinks it made no change it appears to expect the checksum to remain the same, which it will not because it rounded numbers differently, this causes the game to think the high score data is corrupt so it discards it all and resets to defaults.

The root causes

The game stores high score track times as seconds in a single precision float internally referred to as fTime. Although single precision floats are reliable up to about 7 decimal digits the game rounds the floats to 2 decimal places, this choice was likely made due to a combination of the GUI only displaying centiseconds and not milliseconds but also likely a limitation of the game tick which is 1/60th of a second (≈16 ms) making millisecond precision less important.

Twice rounding errors

While rounding your floats will not cause problems it itself, parts of the game further round the floats a second time (round up/down rather than nearest is the issue), because floats are just approximations of a number further rounding pre-rounded floats that are just under the carry over of your rounding decimal places will result in incorrect rounding behaviour, for example 98.834846 s would be first rounded to nearest, 98.83, however the nearest float is 98.8299942016602, this would then be rounded a second time using the round down of FISTP to 98.82. It is unclear if the developers knew they were rounding their floats twice or if the 2nd rounding occurred inside other functions that were compiled to rely on rounding behaviour (particular the FISTP instruction). In a perfect world the developers of Hydro Thunder would have found this issue and implemented high score times as a uint32 of centiseconds, this way the high score table would not be reliant on float interpretations and the behaviour would have been perfectly deterministic and very well defined. This issue is what is causing centiseconds to be displayed incorrectly on the leaderboards, because of this I suspect it may be impossible to display certain times as they will always be rounded to an incorrect number in the GUI, however all world record times should be possible to display as they have gone through this twice rounded behaviour, although their real time may actually be 10 ms slower (my condolences to the speedrunners).

Architecture specific floating point interpretation differences

The new issue that has been found is that the game will reset when given certain numbers in the high score table, this is most likely because those numbers are impossible for the game to write to the raw data blocks because the game rounds them differently because it is using floating point units on an old P6 architecture Intel Celeron 333 that doesn’t have the same accuracy as a modern CPU to know which exact LSB corresponds to the exactly closest rounded floating point number. For example, you put 01:31.49 in the CSV which you want to write to the high score table. The Python code running on a modern architecture reads this and converts it to the nearest single precision float, which isE1 FA B6 42 and this is correct, but although Python may display this sometimes as 91.49, in reality it is 91.4899978637695. However when the old P6 architecture PC reads this it’s less precise FPU converts this to say 91.4898… and thus round up 1 LSB to what it thinks is closer to 91.49. This also works in reverse, a number that the P6 architecture systems saves to the raw data blocks are sometimes rounded by 1 LSB when near carry over on a modern architecture FPU. Below is a table compiling this behavior using numbers from SN33-Real.img and screen capture from the Twitch streams.

|          | Saved float    | Hydro Thunder's | Float value according  | Nearest decimal to satisfy | Closeness of HYDRO  | Python nearest float to | Diff            |
|      | in table       | mm:ss.SS        | to IEEE-754-2008       | `FISTP` round-down         | to IEEE-754-2008    | HYDRO interpreted float | (Over-Roudned)  |
|----------|----------------|-----------------|------------------------|----------------------------|---------------------|-------------------------|-----------------|
| TSH      | F5 A8 C5 42    | 01:38.82        | 98.8299942016602       | 98.829...                  |     NEAR CARRY      | F6 A8 C5 42             | 01 00 00 00     |
| TSH      | 85 EB C5 42    | 01:38.96        | 98.9599990844727       | 98.960...                  |      MISMATCH       | 85 EB C5 42             | 00 00 00 00     |
| TSH      | 29 DC C6 42    | 01:39.43        | 99.4300003051758       | 99.430...                  |    CLOSE EXACT      | 29 DC C6 42             | 00 00 00 00     |
| UX6      | 1E 85 C7 42    | 01:39.75        | 99.7599945068359       | 99.759...                  |     NEAR CARRY      | 1F 85 C7 42             | 01 00 00 00     |
| TSH      | F5 A8 C7 42    | 01:39.82        | 99.8299942016602       | 99.829...                  |     NEAR CARRY      | F6 A8 C7 42             | 01 00 00 00     |
| TSH      | B8 1E C8 42    | 01:40.06        | 100.059997558594       | 100.060...                 |      MISMATCH       | B8 1E C8 42             | 00 00 00 00     |
| TSH      | 1E 85 C8 42    | 01:40.25        | 100.259994506836       | 100.259...                 |     NEAR CARRY      | 1F 85 C8 42             | 01 00 00 00     |
| Y6R      | 29 DC C8 42    | 01:40.43        | 100.430000305176       | 100.430...                 |    CLOSE EXACT      | 29 DC C8 42             | 00 00 00 00     |
| TX3      | B8 1E C9 42    | 01:40.56        | 100.559997558594       | 100.560...                 |      MISMATCH       | B8 1E C9 42             | 00 00 00 00     |
| TSH      | 7B 94 C9 42    | 01:40.79        | 100.790000915527       | 100.790...                 |    CLOSE EXACT      | 7B 94 C9 42             | 00 00 00 00     |
|Known Bad |                |                 | 91.4899978637695       |                            |  LIKELEY MISMATCH   | E1 FA B6 42             |                 |

Why does adding/subtracting LSBs to the checksum or ASCII initials cause things to work?

If the game rounds the float in RAM a couple things can happen to influence the checksum. This also explains the +2 to the an ASCII char in the initials, by adding two you're effectively biasing the checksum so that when it rounds the float and recalculates the checksum, the extra bits from the ASCII char make the checksum valid.

SUM32 result decrements or increments by LSB but without overflowing but it is replaced by the parity bit so it is hidden only the parity bit will be visible, in this case the checksum will be biased by either -1 or +1.
SUM32 result increments or decrements causing a carry over on the 2nd LSB, this is visible in the checksum, in addition the parity bit may be set or unset so the checksum will be biased by -3, -2, -1 or +1, +2 or +3. This and the fact testing numbers and valid checksums must be done one by one and involves booting the game a brute force solution to this is infeasible.

Is this an intentional security feature to prevent manipulating high scores?

Possibly but very unlikely, I can see it checking numbers are exact bit perfect rounded numbers being a very basic security feature however it is much more likely just a symptom of them rounding their floats for other reasons. Also, the float interpretation mismatch with future architectures is such an obscure thing to rely on I highly doubt that aspect was intention.

Recommended troubleshooting tests to confirm this hypothesis

Should always work

Those tests confirm the checksum is 100% correct and we haven’t missed anything

Swap default and game generated high score floats around in the table
Replace some high scores with other known good times duplicating them which will change the SUM32 and odd/even parity
Should work most of the time but sometimes fail

This is already tested; we know it fails when we get to Thunder Park WR time
Replace high scores with pre-rounded floats from world records
Replacing one high score time with a rounded time from a modern computer may cause it to fail and reset but not always
Should never work and always reset the game to defaults
Add or subtract 1 LSB (which is the LSB of the mantissa) from known good times
Replace one high score time with a not even slightly rounded time >This is the best test to do first<

Possible solutions

Bias floats generated on a modern computer that are “NEAR CARRY” with either +1 or -1 LSB, the exact region of decimal digits that are “NEAR CARRY” must be experimentally determined for every architecture the script is run on, this however will never work 100% of the time with edge cases and the log nature of floats as seconds increase but it would still be a good proof of concept before pursuing the LUT solution.
LUT approach: Generate a table of nearest rounded floats on a P6 architecture CPU (preferable an Intel Celeron 333), for times up to 300 s this will be a LUT with 30,000 1:1 entries (15 kB / 30 kB), you’d then map in python the generated uint or float to the nearest P6 rounded float, this approach should work 100% of the time including on older/future CPU architectures.
I’ve experienced this float interpretation mismatch issue once before with a MATLAB script I wrote and then ran on an Intel Sandy Bridge which worked and the later an AMD Bulldozer which failed, however in that case I just modified my MATLAB script, which is something we could do with Hydro Thunder patching HYDRO.EXE to either not round floats to start with or to recalculate the checksum when floats are rounded, however it is difficult to know what unintended consequences that would have and would likely be the most difficult to implement solution requiring at least level 99 wizardry for a decent success rate.

This is a lot of great research!

Some initial thoughts

One of my takeaways from this is that people who may have replaced the 333MHz Celeron with a faster Pentium may have altered this behavior which muddies things for speed runners even more. Being a slot one system it may be possible to radically change the architecture of the original machine by putting in a Pentium 3.

Another takeaway is that my plan to attempt building a 3rd viable computer for the game with outside parts may be beneficial for brute force testing as long as I can use a similar Celeron. I'll elaborate more on this as I make progress, but I may have everything needed to fully remote control the whole computer(inputs, video, storage, etc). The bigger problems will come from the Diego interface board. It may be time to start capturing some logs of the serial interface to begin understanding and...sigh...reversing it a bit to potentially.

A Potential Issue

The thing that bothers me, and I need to get some raw block images for this, was what I mentioned here: https://github.com/AkBKukU/HydroThunder-TimeTool/pull/6#issuecomment-1398799631 I got the file to be "stable" with some world record times until I changed an initial which shouldn't involve floating point issues. I'm not ruling out the floats causing the issue, I'm just trying to lot let that point get lost.

Next steps for data collection

I think one quick low effort thing I can do that will be helpful is boot up all used hard drive images I can get and video capture the high scores being shown so we can begin to map how as many values as possible are interpreted. It's a shame I cannot get specific times in the game by playing well, centiseconds are too tight.

I should be able to run those three tests you have on the game to see how it responds. I'm also going to add a forth test with artificial times to see how it reacts

Round failing tests to deciseconds to make the game work with a consistent centisecond value
Put different times into the game, run a race so it must load and work with the floats and then save. Then see how they have been modified.

Continuing work

I'm going to keep at getting data and testing on my end while looking into setting up a testing solution to more rapidly test changes. One potential option I'm thinking to avoid some of the pain of this testing, will be to attempt booting the game from a SCSI2SD which allows access to the SD card used as the storage device both over the SCSI interface as well as over USB to a host computer. So it may be possible to more rapidly change data and reset the machine to test. I'm not sure if it is possible to change data live, I've never tested that but will look into it. If it can at least be read live it may be possible to log a lot of sample data quickly by racing and taking snapshots in real time.

I have a feeling this one isn't going to be cracked easily which is why I planning for longer term testing methods. Hopefully I'm being pessimistic.

Oh people are allowed to swap the CPU in an arcade machine for speedrunning? Was not expecting that. However the double rounding issue would still affect them, maybe if there's some high end Pentium with newer FPUs there would be slightly fewer times that get rounded down wrong but most of them still would, those instructions would still cause rounding errors on a 2023 era CPU.

If the arcade version of the game is ever to be emulated the Diego PCB and the EEPROM's content, MCU's code, DAC, ADC, button IO, switches, lights and what looks like an NI GAL have to be reverse engineered. I can see that being a big, time consuming and expensive project by at least an order of magnitude compared to those high scores.

Sorry I should have been more clear, I've updated the section about that to Why does adding/subtracting LSBs to the checksum or ASCII initials cause things to work? and added a sentence addressing the changing of chars making it valid. Unless you mean you had a working image then changed a char and updated the checksum and it failed, in which case the checksum is broken, but only if that's a confirmed repeatable issue would I be concerned because everything else says that shouldn't happen.

I'd say hold off on screen capping the high scores yet (side note I think the in race timer has the same rounding bug anyway), but just test to see if those tests fail or work and then we know if this is actually the problem. I suspect the LUT approach will be easy and reliable enough to implement in which case we won't need any screen captures, they won't give us the raw bytes anyway.

That SCSI2SD sounds cool, hopefully SCSI plays nice and the speeds some of the testing up. Real time access to the raw data region would be very helpful but I doubt that adapter would allow it, screams race condition just waiting to destroy data.

Unless you mean you had a working image then changed a char and updated the checksum and it failed, in which case the checksum is broken, but only if that's a confirmed repeatable issue would I be concerned because everything else says that shouldn't happen.

That is exactly what I mean. I put in all of the of the high scores and initials from Twin Galaxies up to Thunder Park, then changed the initials one by one for Thunder Park, I found out just changing the R in SER to B to match what I was typing in before (Like I showed on stream) would cause the checksum to fail. I'll try to get a demonstration of this, which I just checked and I did not show it on stream I think, when I'm in the office next so you can see what I mean.

I've recorded a demonstration and collected data samples of the checksum being invalid with just a change of an initial: https://youtu.be/j61NUPbL9ho

I'm attaching a zip with these files test-case.zip:

free-0.img (Checksum Valid) : Base image that is the stock times with free play enabled to easily test checksum validity
times-good.csv : All times an initials from Twin Galaxies leaderboard added before Thunder Park. (I just noticed Ship Graveyard does not have the time set but this is the data I was working with still)
times-bad.csv : Same as times-good.csv but the top score for Thunder Park has had the initials changed from SER to SEB
free-0-good.img (Checksum Valid) : Raw data of free-0.img modified by adding times-good.csv with the following command: python3 ht-time.py -r free-0.img --times times-good.csv --write_raw free-0-good.img
free-0-bad.img (Checksum Invalid) : Raw data of free-0.img modified by adding times-bad.csv with the following command: python3 ht-time.py -r free-0.img --times times-bad.csv --write_raw free-0-bad.img
free-0-bad-2lsb.img (Checksum Valid) : Raw data of free-0.img modified by adding times-bad.csv and increasing the value of the checksum LSB by two with the following command: python3 ht-time.py -r free-0.img --times times-bad.csv --write_raw free-0-bad.img --lsb_offset 2
loaded-floats-0.img & loaded-floats-1.img (Checksums Valid) : These were read back from the card after booting the game with free-0-bad-2lsb.img. These still have a checksum LSB offset of 2 after they were read back but were valid for the game. They are only different from each other because of how the game writes audit data. The floats for the modified times did not seem to change like I thought they would.

I don't have time to work on this today, but a future test may be to try to find the minimum number of changes needed to cause the +2 LSB offset to be required to have a valid checksum. This was just a single test where I found the issue for the first time.

I was just about to post this comment and you posted yours, so interpret it without the context of knowing there's now a very clear case of it failing on a single char change

I’m just going to plead ignorance and ignore that failure case, after all when you did it on stream it worked and accepted the name change, it was only times that caused resets. I’m sure if I believe hard enough you just goofed and like overwrote the next byte of the time or something somehow this won’t become a real issue.

Race timer partially reverse engineered

I’ve been looking into how the race time is calculated, turns out they’re just incrementing a number by a constant fraction of a second for the 60 fps frame rate with _Player_fElapsedSecs = _Gameloop_fTargetFrameTime + _Player_fElapsedSecs;. All the system tick timer stuff I thought was keeping track of it before seems to be for the audit data race time and other in game animations. In hindsight this is obvious when looking at world record leaderboards, most end in 0, 3, 6 or 9 plus looking at screen capture from your streams and from other speedrunners there is a finite incrementing of the timer which is the 0.03333… second increments. However, because they are incrementing a non-exact float with another non-exact float our good friend floating point error comes into ruin everything. Why they didn’t just implement this as a centiseconds integer or number of frames I have no clue.

LUT program

HYDROLUT.zip HYDROLUT Results.zip

This afternoon I wrote that LUT program in C (also simulates the race timer behaviour), it really should be implemented in assembly with the exact opcodes copy-pasted from HYDRO.EXE but I really don’t want to do that and I don’t really know how, but C should be somewhat close enough and it looks like I got the compiler to use FISTP anyway for the float to integer conversion for the GUI mm:ss.SS. I’ve compiled those as several executables for Windows NT platform with different optimisation flags to test on the Celeron 333 with Windows NT/9x. You can run it with the usual HYDROLUT_FPC.EXE > HYDROLUT_FPC.CSV to write it to a file. I was able to test it first on my old NEC Versa 2760 MT and I found it incorrectly rounded a couple times, but was kinda expecting a lot more to just be off by 1 bit, it is an older P5 architecture CPU so should be somewhat equivalent. And in the event the program actually works and solves this whole problem I recorded testing it on the laptop just in case you need some alternate B roll whilst you explain this hot mess (I’ve got other angles and the 100 mbps files).

	Actual Race Frame Time	C Rounding (all others)	C Rounding (-fpc)	Python 3.10 (amd64)
Float Value	166.739990234375000	166.729995727539063	166.740005493164063	166.7400054931640625
Float Hex	0x4326bd70	0x4326bae1	0x4326bd71	0x4326bd71

However, this program says 0x42c8851f is a valid time, it does not generate 0x42c8851e which has been found in the high scores table, so I don’t know what else the game is doing to get it off by one, it may very well be something unrelated to this float issue that affects the GUI. But for now I think I’ll wait until after you work on this in the next Hydro Thunder stream to do more work on this as this hypothesis isn’t working out so well, but I’m hoping that Celeron just has a complete trash FPU for some reason and those programs can find it but I think the odds are very slim. That brute force with the test bench and SD2SCSI + Raspberry Pi and some sort of screen reader might be the only viable option but to brute force the validity of just 80 s to 150 s would probably take one or several weeks in restart times, either way I’m not sure how much more I can reverse engineer without either running out of easily readable decompiled code or time, so hopefully it can be solved without any of that.

Edit: had a look at the new .img files, have no clue what it's doing to cause this.

Some finding from emulating the game patched for custom Voodoo and Diego PCB check with PCem with Intel Celeron 333 and Voodoo 2 8 MB (on AMD 3950X & RTX3070)

Confirmed centiseconds of high score times round differently in the UI compared to actual arcade machine hardware
Modifying checksum of CMOS in RAM as game was running did not cause it to crash or error out, nor did modifying the high score times or names, validity of checksum likely only done when CMOS is read from HDD rather than whenever high score table is shown
High score table is a grid of pre-rendered sprites for each character that are manipulated for the wavey scaling effect, those are generated once before displaying the high scores while running the attract screen and then kept in RAM for subsequent loops
High scores can be modified in RAM before the high score table is displayed
The physical hardware buttons of the arcade machine that are read through the Diego PCB are unfortunately not just mapped to keyboard inputs, to start a game we need to either capture the serial data for a button press or inject a bit into a switch case / JNZ to force it to start a race, serial capture will probably be much easier.
Text strings for pressing boost to start/free play as well as the text in the developer credits (not the 3D stuff, 2D stuff at the end of credits) are the easiest thing to mess with to confirm editing works and for comedic fun
Game is somewhat resistant to screwing up the RAM content, deleting parts of the map data causes the attract screen to stop and skip to the next map, messing with the camera location and angles doesn't crash it but there is plenty of stuff that will soft lock and hard lock the game (no clue what those variables do).
Graphics have broken textures, most likely because PCem only has 4 MB and 8 MB Voodoo 2 support, manually changing the texture memory to 6 MB in the .CFG does nothing, C code for it is on GitHub but gave up pretty quick.
Some images and text on the audit screens is flipped vertically, no clue why, other images are unaffected, maybe a special Voodoo image draw call that isn't emulated right?

AkBKukU / HydroThunder-TimeTool