3Dickulus / FragM

Derived from https://github.com/Syntopia/Fragmentarium/
GNU General Public License v3.0

quad-double support for Camera2D navigation #147

Closed claudeha closed 4 years ago

claudeha commented 4 years ago

Is your feature request related to a problem? Please describe.

The double-precision zoom limit for 2D frags is frustrating. Using double-double, triple-double, quad-double compensated arithmetic techniques using unevaluated sums can work to increase precision (though in GLSL tricks need to be performed to avoid unsafe math optimisations that break everything; it can be made to work). However, this is not much use if one can't navigate with the 2D camera controls (mouse zooming, panning, etc).
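As a rough illustration of the "unevaluated sum" idea, here is a minimal C++ sketch of double-double addition built on Knuth's TwoSum error-free transformation. The names DoubleDouble, two_sum and dd_add are made up for this example; it is not code from FragM or libqd, and a GLSL port would need its own guards against the unsafe optimisations mentioned above.

```cpp
// A double-double value: the unevaluated sum hi + lo, with |lo| far smaller than |hi|.
struct DoubleDouble {
    double hi;
    double lo;
};

// Knuth's TwoSum: returns s = fl(a + b) together with the rounding error e,
// so that a + b == s + e exactly (barring overflow).
// volatile is a blunt way to discourage the compiler from simplifying
// (a + b) - a into b; this must not be built with -ffast-math.
inline DoubleDouble two_sum(double a, double b) {
    volatile double s = a + b;
    volatile double bb = s - a;
    volatile double err = (a - (s - bb)) + (b - bb);
    return {s, err};
}

// Simple ("sloppy") double-double addition: add the high parts exactly,
// fold in the low parts, then renormalize with another TwoSum.
inline DoubleDouble dd_add(DoubleDouble x, DoubleDouble y) {
    DoubleDouble s = two_sum(x.hi, y.hi);
    double lo = s.lo + (x.lo + y.lo);
    return two_sum(s.hi, lo);
}
```

For example, dd_add({1.0, 0.0}, {1e-20, 0.0}) keeps the 1e-20 in the low part, where a plain double addition would round it away. Triple-double and quad-double extend the same pattern to longer chains of components.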

Describe the solution you'd like

Support for quad-double precision for the Camera2D Center variable, gracefully downgrading to triple-double, double-double, double, or float, depending on which uniforms are defined. Something like uniform dvec2 Center[4]; where the array size can be from 1 to 4, or omitted for size 1, or vec2 for float (without array).

No intention of supporting double-float, triple-float, quad-float as double-float is less precise and the same speed or slower than double on most GPUs.

Widget in user interface would be multiple sliders (eg 8 double-sliders for quad-double 2D, arranged in order like x0 x1 x2 x3 y0 y1 y2 y3). Saved in presets like that too (unevaluated sum, not necessarily canonical form). Mouse navigation would ensure canonical form, but shader should be expected to canonicalize input.

Describe alternatives you've considered

Using only multiple sliders for choosing location is far too awkward in practice.

Additional context

type / approximate zoom limit
float / 1e6
double / 1e15
double-double / 1e30
triple-double / 1e45
quad-double / 1e60

see also: #146

3Dickulus commented 4 years ago

uniform dvec2 Center[4];

could that be better served as uniform dvec4 CenterX and dvec4 CenterY ?

it's a brilliant idea, a very ambitious undertaking, also a very specialized application

Support for quad-double precision for the Camera2D Center variable, gracefully downgrading to triple-double, double-double, double, or float, depending on which uniforms are defined.

somehow I think "gracefully" will be difficult at best, impossible at worst.

is this being done with bits of code from libqd transposed to GLSL ? or from gqd ?

ref: http://homepages.math.uic.edu/~jan/mcs572/quad_double_cuda.pdf
ref: https://github.com/lumianph/gpuprec

claudeha commented 4 years ago

could that be better served as uniform dvec4 CenterX and dvec4 CenterY ?

Probably the shaders would repack to that, yes, but the UI may be easier to do the graceful downgrade with arrays instead of having to handle all of double, dvec2, dvec3, dvec4.

it's a brilliant idea, a very ambitious undertaking, also a very specialized application

Yes, ambitious and specialized, but would be great to try it at least. Maybe it will turn out too slow to be practical, double-double is typically 10x slower than double on CPU iirc...

somehow I think "gracefully" will be difficult at best, impossible at worst.

If the quad-double is normalized, a normalized triple-double/double-double/double is just the first 3/2/1 values, but having it internally on the CPU in just quad-double at all times would simplify things.
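A minimal C++ sketch of that downgrade, assuming the CPU side always holds a normalized quad-double stored as four doubles in decreasing magnitude. QuadDouble, leading and widen are hypothetical names for illustration, not FragM or libqd API.

```cpp
#include <array>
#include <cstddef>

// Hypothetical CPU-side representation: four components, largest first.
using QuadDouble = std::array<double, 4>;

// Downgrade: keep the N leading components of a normalized quad-double
// (N = 3 triple-double, N = 2 double-double, N = 1 plain double).
template <std::size_t N>
std::array<double, N> leading(const QuadDouble& q) {
    static_assert(N >= 1 && N <= 4, "N must be between 1 and 4");
    std::array<double, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = q[i];
    }
    return out;
}

// Upgrade: pad a shorter expansion with zeros to get back to quad-double.
template <std::size_t N>
QuadDouble widen(const std::array<double, N>& v) {
    static_assert(N >= 1 && N <= 4, "N must be between 1 and 4");
    QuadDouble out{};  // value-initialized, so the tail is zero
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = v[i];
    }
    return out;
}
```

The prefix is only a good approximation when the input really is normalized; an arbitrary unevaluated sum (for example one typed into sliders) would need renormalizing first.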

is this being done with bits of code from libqd transposed to GLSL ? or from gqd ?

I was thinking copy/pasting the small parts of libqd that would be needed, with attribution, into a small header file for the C part, so that no extra deps are needed. Then think about porting more of qd to GLSL for the frag uses (starting with arithmetic and square root, maybe the transcendental stuff later).

Thanks for the links.

3Dickulus commented 4 years ago

considering this from ref[1] ?

The implementation with an interval memory layout is reported to be three times faster over the sequential memory layout.

...or does that apply only to CUDA, and is it not available under GLSL semantics?

claudeha commented 4 years ago

On 06/06/2020 21:40, 3Dickulus wrote:

interval memory layout is reported to be three times faster over the sequential memory layout

I think that's only relevant if you have big arrays of quad-doubles in memory.

3Dickulus commented 4 years ago

hmm... like in a texture buffer? fanciful speculation perhaps. probably best to try some simple stuff from libqd to get a feel for the path to take with this. by simple stuff I mean code based on qd using vec4 as real, and then moving to complex types of dvec4[2] <- there's your 8 sliders

I don't think I can make a hiprec single slider :(

3Dickulus commented 4 years ago

some random thoughts...

...internally, dvec4 varName values are addressed varName1, varName2... e.g. when applying an easing curve to vec4 varName.y in the gui, the resulting preset will use the label varName2 (edit: and vice versa)

can add setParameter (name, [T]vec[n]) functions... currently these functions only take form f(name, val,val,val,val)

already have glm::vec[n] getParameter[n]f ( QString name ) functions... could be templatized ? [T] getParameter[n][T]( QString name ) ?

claudeha commented 4 years ago

On 06/06/2020 22:41, 3Dickulus wrote:

easing curve

These (and camera keyframes if they are added for 2D) will probably break when applied to quad-double individual components (not really a meaningful operation); the entire quad-double needs to be handled as a whole.

3Dickulus commented 4 years ago

yes, my line of thought was about internal manipulations, simply reiterating what's already in place and speculating about finishing some of that stuff.

claudeha commented 4 years ago

I started implementing something, got a proof of concept working but I can't zoom beyond about 2e18 without the position getting very quantized, which makes it useless:

Zoom = 2.88871568071353344e+18 Logarithmic
CenterX = -1.25841004516429256,5.40000000000000023e-17,0,0
CenterY = 0.382432698177375352,2.99999999999999983e-18,0,0

My current guess is that the Float slider Number box uses %f when %g would be better for tiny values to avoid precision loss.
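To illustrate that guess, here is a small C++ sketch (plain snprintf/strtod, not FragM's widget code) comparing a fixed number of decimals with a fixed number of significant digits. The helper names are invented for this example.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Format with a fixed number of digits after the decimal point ("%.*f").
std::string format_fixed(double x, int decimals) {
    char buf[512];
    std::snprintf(buf, sizeof buf, "%.*f", decimals, x);
    return buf;
}

// Format with a fixed number of significant digits ("%.*g").
std::string format_general(double x, int digits) {
    char buf[512];
    std::snprintf(buf, sizeof buf, "%.*g", digits, x);
    return buf;
}

// Parse the text back into a double, as loading a preset would.
double parse(const std::string& s) {
    return std::strtod(s.c_str(), nullptr);
}
```

With 18 decimals, a low-order component such as 5.4321e-17 is written as 0.000000000000000054 and reads back as 5.4e-17, which looks like the quantization in the values above. With 17 significant digits, the same value round-trips exactly.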

(It was easier to implement CenterX/CenterY as separate dvec4 widgets rather than tackle any array stuff at this time.)

claudeha commented 4 years ago

In a QDoubleSpinBox with decimals set to 2, calling setValue(2.555) will cause value() to return 2.56.

The displayed value of the QDoubleSpinBox is limited to 18 characters

I guess it's not designed at all for very small values. I think I can work around it, by rescaling the values before/after passing to/from display/presets/etc
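One possible shape for such a rescaling, as a hedged C++ sketch: multiply each small component by a power of two before it goes into the widget and divide it back out afterwards, which is exact in binary floating point. The shift of 53 bits and the names to_widget, from_widget and round_to_decimals are assumptions for illustration; round_to_decimals only imitates a spin box that keeps 18 decimals and is not Qt code.

```cpp
#include <cmath>
#include <cstdio>
#include <cstdlib>

// Stand-in for a widget that keeps only `decimals` digits after the point.
double round_to_decimals(double x, int decimals) {
    char buf[512];
    std::snprintf(buf, sizeof buf, "%.*f", decimals, x);
    return std::strtod(buf, nullptr);
}

// Hypothetical exponent shift applied to a low-order component.
constexpr int kShiftBits = 53;

// Scale up by 2^kShiftBits before handing the value to the widget.
// Scaling by a power of two changes only the exponent, so no bits are lost
// (assuming no overflow or underflow).
double to_widget(double component) {
    return std::ldexp(component, kShiftBits);
}

// Undo the scaling when reading the value back from the widget.
double from_widget(double shown) {
    return std::ldexp(shown, -kShiftBits);
}
```

For a component like 5.4321e-17, from_widget(round_to_decimals(to_widget(x), 18)) returns x unchanged, whereas round_to_decimals(x, 18) alone does not. A real implementation would probably need a different shift per component and would still have to respect the widget's range and display width.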

3Dickulus commented 4 years ago

everywhere a double is used for read, output is determined by widget type

QString FloatWidget::toString()
{
    double f = comboSlider1->getValue();
    return QString::number(f,'g',(isDouble() ? DDEC : FDEC));
}

    double getValue()
    {
        return spinner->value(); // returns double
    }

with precision determined by type: DDEC and FDEC ([Double|Float]DECimals) are set to 18 and 9 respectively

claudeha commented 4 years ago

work in progress at https://github.com/claudeha/FragM/tree/feature-camera2d-quad-double

example shader at
https://code.mathr.co.uk/de/blob/de71631f4b8edde46af060818946429e7b3e89e4:/glsl/include/Camera2D.frag
https://code.mathr.co.uk/de/blob/de71631f4b8edde46af060818946429e7b3e89e4:/glsl/examples/mET.frag

It works, but as expected it is quite slow (double-double is ~10x slower than double, which is ~4x slower than float on my GPU, plus the deeper zooms where double-double is necessary typically need more iterations). I timed-out my GPU (forced quit of X session) a couple of times...

3Dickulus commented 4 years ago

As the next big step in our efforts to accelerate high performance computing, the NVIDIA Ampere architecture defines third-generation Tensor Cores that accelerate FP64 math by 2.5x compared to last-generation GPUs.

...from https://blogs.nvidia.com/blog/2020/05/14/double-precision-tensor-cores/

3Dickulus commented 4 years ago

double-double is ~10x slower than double, which is ~4x slower than float