Remove 64-bit integer divides from performance sensitive code

gendlin commented 2 weeks ago

These are pretty bad from a performance standpoint, see table below (from Agner Fog's instruction latency tables for Nehalem). Just use doubles for high-precision timekeeping instead, and use multiplication instead of division where we can.

operation	instruction	cycles	uops
uint64 divide	div	28-90	40
int64 divide	idiv	37-100	60

compare to:

operation	instruction	cycles	uops
fp64 divide	divsd	7-22	1
int32 divide	idiv	17-28	7

and:

operation	instruction	cycles	uops
int64 to double	cvtsi2sd	6	2
double to int64	cvtsd2si	5	1
int32 multiply	imul	3	1
fp64 multiply	mulsd	5	1

rfomin commented 2 weeks ago

I don't think these changes are necessary as this is not performance sensitive code. Again, we have very little calculations in these functions, it's nothing compared to the system calls in SDL_GetPerformanceCounter and Win API functions. I'm sure this improvement is not measurable. In cases like this, I prefer code clarity - it's not obvious why we need to store microseconds in double.

Doom was originally written for 386/486 processors and didn't use float point at all. I recommend looking into m_fixed.h and try using float point calculations in the renderer, you can probably get good measurable results here. Note that we can't use float point in the game simulation (p_* files) due to demo compatibility.

fabiangreffrath commented 1 week ago

I don't know where exactly you've got these numbers from. But I doubt that any recent system with a native 64-bit pointer size performs so much worse when dividing integers. And yet again, we are not "optimizing" this port for 2000-era computers.

rfomin commented 1 week ago

I don't know where exactly you've got these numbers from.

It's probably from these tables: https://www.agner.org/optimize/instruction_tables.pdf Also https://stackoverflow.com/a/55833042

However, I think that optimizing a few instructions in the timer code doesn't improve performance, it's an unnecessary complication.

gendlin commented 1 week ago

I doubt that any recent system with a native 64-bit pointer size performs so much worse when dividing integers

Unfortunately not true, i64 div is very expensive all the way up through Coffee Lake (where it is essentially the same cost as on Nehalem). Coffee Lake was made 2017-2021.

It's probably from these tables: https://www.agner.org/optimize/instruction_tables.pdf

Yes, thanks for the link.

gendlin commented 1 week ago

Some measurements on my machine based on test code here. I_GetTimeUS on master has a 50% overhead, and introduces a big pipeline bubble wherever it is called.

time-test

gendlin commented 1 week ago

it's an unnecessary complication

Check the diff, the resulting code is smaller and fits better with callers that are already doing double arithmetic on time values anyway.

rfomin commented 1 week ago

Some measurements on my machine based on test code here. I_GetTimeUS on master has a 50% overhead, and introduces a big pipeline bubble wherever it is called.

What I meant was that this improvement would not be noticeable if you were measuring game performance. Again - this code is not performance sensitive for us and we do not micro optimise all functions, code clarity is more important. Although it's interesting that QueryPerformanceCounter is so fast, usually system calls are much slower.

fabiangreffrath commented 1 week ago

Doom was originally written for 386/486 processors and didn't use float point at all.

Did some Doom spelunking, there is one instance of 64-bit division here - it's commented out and replaced with doubles.

I have seen that you removed this comment, but I feel the need to reply anyway.

First, it's called FixedDiv() (and not FloatDiv()) for a reason. The code you are referring to is the source released under the linuxdoom moniker. It does not necessasrily match the actual DOS Doom source code, because some functions were translated from ASM to C approximations - among then FixedDiv().

In fact, one of the early things that got fixed in Chocolate Doom was the reversal of the double division back to fixed point division to preserve demo compatibility:

https://github.com/chocolate-doom/chocolate-doom/commit/9815f0f2325ec973c899db766faed87cac49e64d

In the actual reverse-engineered DOS Doom source code the FixedDiv() function is indeed implemented in ASM and calls cdq (convert doubleword to quadword) first and then idiv, so that is clearly a 64-bit integer division:

https://bitbucket.org/gamesrc-ver-recreation/doom/src/bd6beabfd561613b671fe5b8842e7d373731d2c5/doomdef.h#lines-789

gendlin commented 1 week ago

so that is clearly a 64-bit integer division

It's not - ebx is a 32-bit register. There is no i64 division on 386.

preserve demo compatibility

Just to be clear there is no impact on demo compatibility with these changes. Double arithmetic is already being done on these time values elsewhere, I'm not doing anything groundbreaking here.

fabiangreffrath / woof

Remove 64-bit integer divides from performance sensitive code #1769