Open gendlin opened 2 weeks ago
I don't think these changes are necessary as this is not performance sensitive code. Again, we have very little calculations in these functions, it's nothing compared to the system calls in SDL_GetPerformanceCounter
and Win API functions. I'm sure this improvement is not measurable. In cases like this, I prefer code clarity - it's not obvious why we need to store microseconds in double
.
Doom was originally written for 386/486 processors and didn't use float point at all. I recommend looking into m_fixed.h
and try using float point calculations in the renderer, you can probably get good measurable results here. Note that we can't use float point in the game simulation (p_*
files) due to demo compatibility.
I don't know where exactly you've got these numbers from. But I doubt that any recent system with a native 64-bit pointer size performs so much worse when dividing integers. And yet again, we are not "optimizing" this port for 2000-era computers.
I don't know where exactly you've got these numbers from.
It's probably from these tables: https://www.agner.org/optimize/instruction_tables.pdf Also https://stackoverflow.com/a/55833042
However, I think that optimizing a few instructions in the timer code doesn't improve performance, it's an unnecessary complication.
I doubt that any recent system with a native 64-bit pointer size performs so much worse when dividing integers
Unfortunately not true, i64 div is very expensive all the way up through Coffee Lake (where it is essentially the same cost as on Nehalem). Coffee Lake was made 2017-2021.
It's probably from these tables: https://www.agner.org/optimize/instruction_tables.pdf
Yes, thanks for the link.
Some measurements on my machine based on test code here. I_GetTimeUS
on master has a 50% overhead, and introduces a big pipeline bubble wherever it is called.
it's an unnecessary complication
Check the diff, the resulting code is smaller and fits better with callers that are already doing double arithmetic on time values anyway.
Some measurements on my machine based on test code here.
I_GetTimeUS
on master has a 50% overhead, and introduces a big pipeline bubble wherever it is called.
What I meant was that this improvement would not be noticeable if you were measuring game performance. Again - this code is not performance sensitive for us and we do not micro optimise all functions, code clarity is more important. Although it's interesting that QueryPerformanceCounter
is so fast, usually system calls are much slower.
Doom was originally written for 386/486 processors and didn't use float point at all.
Did some Doom spelunking, there is one instance of 64-bit division here - it's commented out and replaced with doubles.
I have seen that you removed this comment, but I feel the need to reply anyway.
First, it's called FixedDiv()
(and not FloatDiv()
) for a reason. The code you are referring to is the source released under the linuxdoom
moniker. It does not necessasrily match the actual DOS Doom source code, because some functions were translated from ASM to C approximations - among then FixedDiv()
.
In fact, one of the early things that got fixed in Chocolate Doom was the reversal of the double
division back to fixed point division to preserve demo compatibility:
https://github.com/chocolate-doom/chocolate-doom/commit/9815f0f2325ec973c899db766faed87cac49e64d
In the actual reverse-engineered DOS Doom source code the FixedDiv()
function is indeed implemented in ASM and calls cdq
(convert doubleword to quadword) first and then idiv
, so that is clearly a 64-bit integer division:
so that is clearly a 64-bit integer division
It's not - ebx is a 32-bit register. There is no i64 division on 386.
preserve demo compatibility
Just to be clear there is no impact on demo compatibility with these changes. Double arithmetic is already being done on these time values elsewhere, I'm not doing anything groundbreaking here.
These are pretty bad from a performance standpoint, see table below (from Agner Fog's instruction latency tables for Nehalem). Just use doubles for high-precision timekeeping instead, and use multiplication instead of division where we can.
compare to:
and: