fuzun commented 8 years ago

This project already contains a fast inverse square root function named rsqrt, but it is used in only a few places in the engine.

I have seen some implementations for embedded systems (<1 watt ARM CPUs are not much different :) ) that compute an approximation much faster. Here is one of them:


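#include <stdint.h>

#define SQRT_MAGIC_F 0x5f3759df // assumed value; the original post did not define it

// Approximates sqrt(x) as x * (1/sqrt(x)); intended for positive, normal x.
float sqrt2(const float x)
{
    const float xhalf = 0.5f * x;
    union // reinterpret the float's bits as an integer
    {
        float x;
        int32_t i;
    } u;
    u.x = x;
    u.i = SQRT_MAGIC_F - (u.i >> 1); // initial guess y0 for 1/sqrt(x)
    return x * u.x * (1.5f - xhalf * u.x * u.x); // one Newton step; repeating it increases accuracy
}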
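
To illustrate the remark that repeating the Newton step increases accuracy, here is a minimal sketch with a configurable number of steps. The function name `fast_sqrt_steps` and the constant 0x5f3759df are assumptions for illustration, not code from the engine; `memcpy` is used instead of a union to read the bit pattern.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of xash3d: approximates sqrt(x) for
 * positive, normal x using `steps` Newton iterations on 1/sqrt(x). */
float fast_sqrt_steps(float x, int steps)
{
    const float xhalf = 0.5f * x;
    uint32_t bits;
    float y;
    int k;

    memcpy(&bits, &x, sizeof bits);      /* read the float's bit pattern */
    bits = 0x5f3759dfu - (bits >> 1);    /* assumed magic constant: initial guess */
    memcpy(&y, &bits, sizeof y);

    for (k = 0; k < steps; k++)
        y = y * (1.5f - xhalf * y * y);  /* Newton step for 1/sqrt(x) */

    return x * y;                        /* sqrt(x) = x * (1/sqrt(x)) */
}
```

With one step, `fast_sqrt_steps(256.0f, 1)` should land slightly below 16; a second step should bring the result noticeably closer to 16 at the cost of a few more multiplications.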

I tested this function by computing the square root of 256. Here are the results (time differences are measured in microseconds):

15.972915 (1 iteration) Time difference [Fast sqrt.]= 67
16.000000 Time difference [math sqrt.]= 959
15.972915 (1 iteration) Time difference [Fast sqrt.]= 15
16.000000 Time difference [math sqrt.]= 380
15.972915 (1 iteration) Time difference [Fast sqrt.]= 14
16.000000 Time difference [math sqrt.]= 387
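
Timings of a single call tend to be noisy (the first run above is several times slower than the later ones), so averaging over many calls may give a steadier comparison. The following is only a sketch of such a harness: `fast_sqrt_once` and `avg_call_time_us` are hypothetical names, the magic constant is assumed, and `clock()` measures CPU time rather than wall-clock time.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Compact one-step approximation, included so the sketch is
 * self-contained. The magic constant is an assumption. */
float fast_sqrt_once(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return x * y * (1.5f - 0.5f * x * y * y);
}

/* Hypothetical harness: average CPU time per call, in microseconds,
 * over `calls` invocations of `fn` (calls must be > 0). */
double avg_call_time_us(float (*fn)(float), float x, long calls)
{
    volatile float sink = 0.0f;  /* keeps the calls from being optimized away */
    clock_t start, end;
    long k;

    start = clock();
    for (k = 0; k < calls; k++)
        sink = fn(x);
    end = clock();
    (void)sink;

    return (double)(end - start) * 1e6 / CLOCKS_PER_SEC / (double)calls;
}
```

To compare against the C library, the same harness can be called with `sqrtf` from `<math.h>` as `fn`. Results will depend heavily on the compiler, optimization flags, and CPU.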

What do you think? Can this be beneficial?

a1batross commented 8 years ago

It's the same as rsqrt.

I am planning to move mathlib to NEON intrinsics. The engine uses vectors almost everywhere, so computing them with SIMD should be 3-4 times faster.

cia48621793 commented 8 years ago

The Carmack hack is almost obsolete now with hardware instructions.
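
As a rough sketch of what "hardware instructions" can look like here: NEON provides a reciprocal square root estimate (`vrsqrteq_f32`) and a matching Newton step (`vrsqrtsq_f32`) that operate on four floats at once. The helper name `rsqrt4` is hypothetical and not taken from xash3d; on targets without NEON the sketch falls back to the scalar bit trick with an assumed magic constant.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper, not taken from xash3d: writes an approximate
 * 1/sqrt(in[k]) to out[k] for four positive floats. */
void rsqrt4(const float in[4], float out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x4_t x = vld1q_f32(in);
    float32x4_t y = vrsqrteq_f32(x);                    /* hardware estimate */
    y = vmulq_f32(y, vrsqrtsq_f32(vmulq_f32(x, y), y)); /* one Newton step */
    vst1q_f32(out, y);
#else
    /* Portable fallback: the scalar bit trick, one lane at a time. */
    int k;
    for (k = 0; k < 4; k++) {
        uint32_t bits;
        float y;

        memcpy(&bits, &in[k], sizeof bits);
        bits = 0x5f3759dfu - (bits >> 1);               /* assumed magic constant */
        memcpy(&y, &bits, sizeof y);
        out[k] = y * (1.5f - 0.5f * in[k] * y * y);     /* one Newton step */
    }
#endif
}
```

Both paths return approximations; more Newton steps can be added if higher accuracy is needed.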

fuzun commented 8 years ago

@cia48621793 The Intel vectorization report still shows plenty of vectorization failures. Unfortunately, Uncle Mike's code is not optimized for hardware. But if a1batross manages to implement NEON well, it will be far better, at least for Android.

cia48621793 commented 8 years ago

@fuzun Not to mention that NEON does not implement full IEEE-754 floating point, too.

mittorn commented 8 years ago

Vectorization fails almost everywhere because of the 12-byte vec3_t.

fuzun commented 8 years ago

I do not think all of them are related to vec3_t. But why does xash3d use vec3_t instead of a C++ vector? Maybe because it is written in C, or to provide compatibility?

https://vgy.me/Vi7XPQ.png https://vgy.me/EoSh58.png

Also, I have seen that SDL uses a lot of CPU. Getting rid of SDL may be beneficial for phones etc.

a1batross commented 8 years ago

SDL is not used on Android since 0.18.

The C++ Vector is the same as vec3_t (if you mean Vector from the server and client DLLs).


fuzun commented 8 years ago

In xash3d, aren't vectors float[3]? I mean in the engine.

a1batross commented 8 years ago

Yes. Vectors are float[3]. Meanwhile, the Vector class is defined as class Vector{ float x, y, z; }. So the same vectors can be safely shared between the engine and the DLLs.
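
The layout claim can be checked at compile time. This is a minimal sketch in C, where `Vector3` is a hypothetical stand-in for the C++ `class Vector` mentioned in this thread; the assertions hold on common ABIs but are not guaranteed by the language for every target.

```c
#include <stddef.h>
#include <string.h>

typedef float vec3_t[3];                    /* engine-side layout, per this thread */
typedef struct { float x, y, z; } Vector3;  /* hypothetical C stand-in for class Vector */

/* Compile-time checks that the two layouts coincide on this target. */
_Static_assert(sizeof(Vector3) == sizeof(vec3_t), "Vector3 has padding");
_Static_assert(offsetof(Vector3, y) == 1 * sizeof(float), "unexpected y offset");
_Static_assert(offsetof(Vector3, z) == 2 * sizeof(float), "unexpected z offset");

/* Copies a struct-style vector into an array-style one. */
void vector_to_vec3(const Vector3 *in, vec3_t out)
{
    memcpy(out, in, sizeof(vec3_t));
}
```

If any of the static assertions failed on some platform, passing vectors between the two representations would need an explicit conversion instead of a plain copy.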

fuzun commented 8 years ago

Yes, but compilers do not optimize well if you use your own vector definition instead of a standard one. I think mittorn meant this. Consider the Intel compiler. You are probably not interested in proprietary stuff, but grab a student or trial license to test and see how much difference it makes.

a1batross commented 8 years ago

Show me a standard vector. :D

fuzun commented 8 years ago

#include "vector" maybe?

How does the compiler know that you are working with vectors when you use a float array?

I still do not understand how using vectors from a local vector definition relates to vectorization.

https://software.intel.com/sites/default/files/m/4/8/8/2/a/31848-CompilerAutovectorizationGuide.pdf

mittorn commented 8 years ago

I meant that vec3_t cannot be loaded by a single SIMD instruction, as its size (12 bytes) is not a multiple of the size of vec4_t (16 bytes).
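
The size mismatch can be made concrete with a few compile-time checks. In this sketch, `vec3_padded_t` and `vec3_pad` are hypothetical names used only to show one common workaround (storing a 3-component vector in a 16-byte, 16-byte-aligned slot); they are not part of the engine.

```c
#include <stddef.h>

typedef float vec3_t[3];   /* 12 bytes */
typedef float vec4_t[4];   /* 16 bytes, the width of one 128-bit SIMD register */

/* Hypothetical padded variant: a vec3 stored in an aligned 16-byte slot. */
typedef struct {
    _Alignas(16) float v[4];   /* v[3] is unused padding */
} vec3_padded_t;

_Static_assert(sizeof(vec3_t) == 12, "vec3_t is expected to be 12 bytes");
_Static_assert(sizeof(vec3_t) % sizeof(vec4_t) != 0, "vec3_t does not fill a SIMD register");
_Static_assert(sizeof(vec3_padded_t) == 16, "padded type should be 16 bytes");

/* Copies a vec3 into the padded form, zeroing the extra lane. */
void vec3_pad(const vec3_t in, vec3_padded_t *out)
{
    out->v[0] = in[0];
    out->v[1] = in[1];
    out->v[2] = in[2];
    out->v[3] = 0.0f;
}
```

The trade-off is memory: every padded vector costs four extra bytes, and any structure shared with the game DLLs would still have to keep the 12-byte layout.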

fuzun commented 8 years ago

The reports show that the main problem is with loops: the Intel compiler cannot vectorize them. So maybe it can't because the loops contain vec3_t vectors?
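
For context, auto-vectorizers generally work on loops over contiguous arrays rather than on any particular "vector" type. Below is a sketch of a loop shape that compilers can often vectorize: a batch of dot products with the x, y and z components stored in separate arrays. The function name `dot3_soa` and this data layout are assumptions for illustration, not how the engine stores its data, and whether a given compiler actually vectorizes the loop depends on its version and flags.

```c
#include <stddef.h>

/* Hypothetical batch dot product over n vector pairs stored as separate
 * component arrays. `restrict` promises the compiler that `out` does not
 * overlap the inputs, which removes one common obstacle to vectorization. */
void dot3_soa(const float *restrict ax, const float *restrict ay,
              const float *restrict az,
              const float *restrict bx, const float *restrict by,
              const float *restrict bz,
              float *restrict out, size_t n)
{
    size_t k;

    for (k = 0; k < n; k++)
        out[k] = ax[k] * bx[k] + ay[k] * by[k] + az[k] * bz[k];
}
```

Each iteration reads the same offsets from every array, so four iterations can be mapped onto one 128-bit operation without the 12-byte stride that a vec3_t array would impose.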

a1batross commented 8 years ago

If you mean SIMD vectors, I'm working on it (ARM only).

Otherwise I can't understand you.

fuzun commented 8 years ago

@a1batross Did you look at this: https://software.intel.com/sites/default/files/m/4/8/8/2/a/31848-CompilerAutovectorizationGuide.pdf ? I mean that implementing it only for Android is bad, since xash3d is meant to be cross-platform.

https://software.intel.com/en-us/articles/vectorization-with-the-intel-compilers-part-i https://software.intel.com/en-us/articles/a-guide-to-auto-vectorization-with-intel-c-compilers

Maybe you do not work with Intel compilers, but they make a huge difference in performance. Also, while you work on the math functions, you can add Intel compiler macros in a way that won't hurt anybody. https://software.intel.com/sites/default/files/ed/39/VecSamples.zip

Also, even if you do not add Intel-specific things, fixing the warnings the Intel tool generated may also make GCC's auto-vectorization much more applicable. I posted screenshots above.

a1batross commented 8 years ago

I don't see performance problems on x86. GCC optimizes the code well.

BTW thanks for links, I will read these. :)


cia48621793 commented 8 years ago

IIRC VFP and NEON both use 64-bit registers. Converting float to double could be slow. On x87, the 80-bit format makes it even worse.

a1batross commented 7 years ago

NEON can use both 32-bit and 64-bit floats, and even 16-bit floats (if half floats are ever used on ARM someday...).

FWGS / xash3d

Override [sqrt] for [fast sqrt]? #172
