aradi / fypp

Python powered Fortran preprocessor
http://fypp.readthedocs.io
BSD 2-Clause "Simplified" License
180 stars 30 forks source link

Horner's algorithm with fypp #8

Closed ivan-pi closed 2 years ago

ivan-pi commented 4 years ago

Hello,

I am trying to create a macro to efficiently inline polynomial expressions. This is related to issue https://github.com/fortran-lang/stdlib/issues/179. The authors of Julia reckon, that meta-programming is one of the reasons they can beat Fortran:

The reason Julia can beat the Fortran code is that metaprogramming makes it easy to apply performance optimizations that are awkward in Fortran. We have metaprogramming macros (@evalpoly) that can easily inline polynomial evaluations, whereas the Fortran code makes function calls that loop over look-up tables of polynomial coefficients. Even greater speedups are possible for evaluating polynomials of complex arguments, where there is a fancy recurrence from Knuth that is almost impossible to use effectively without code generation. In principle, the Fortran authors could have done the same inlining and gotten similar performance, but the code would have been much more painful to write by hand. (They could even have written a program to generate Fortran code, but that is even more painful.)

In principle, this is exactly what we have fypp for, and it should not be that painful at all.

The canonical way to implement Horner's algorithm in Fortran that loops over a table of coefficients is:

function evalpoly(x,p) result(res)
    real, intent(in) :: x
    real, intent(in) :: p(:)
    real :: res
    integer :: i, n

    n = size(p)
    res = p(n)
    do i = n-1,1,-1
        res = p(i) + res*x
    end do 
end function

We can write a very similar function to generate the symbolic expression in Python:

def horner(x,coeffs):
    n = len(coeffs)
    cfs = [str(coeff) for coeff in coeffs]
    res = cfs[n-1]
    for i in range(n-2,-1,-1):
        res = "({} + {}*{})".format(cfs[i],x,res)
    return res

Calling this function as horner('x',(1,2,3,4) will return the string "(1 + x*(2 + x*(3 + x*4)))". Assuming the function is in the file "horner.py" I can import this function using fypp -m horner <source.fypp> <output.f90>. To call the function I have to use now

program main
implicit none
real :: x, res
real :: p(4) = [1,2,3,4]
real, external :: evalpoly

x = 0.5

#:set coeffs = (1,2,3,4) 
res = ${horner.horner('x',coeffs)}$

print *, res, evalpoly(x,p)
end program

Is it possible to have Horner's algorithm live exclusively inside of a ".fypp" file? I have the feeling that with some clever preprocessing constructs, this should be possible (and preferable to the module import syntax).

Thanks for creating this tool!

ivan-pi commented 4 years ago

In the meantime I've figured out that positional and keyword arguments also work, giving me the following macros:

#:def muladd(x,y,z)
(${x}$*${y}$ + (${z}$))
#:enddef

#:def horner2(num,*pos,**kwargs)
#:if "prec" in kwargs
  #:set n = len(pos)
  #:set res = str(pos[n-1]) + "_" + kwargs["prec"]
  #:for i in range(n-2,-1,-1)
    #:set res = muladd(res,num,str(pos[i])+ "_" + kwargs["prec"])
  #:endfor
${res}$
#:else
  #:set n = len(pos)
  #:set res = pos[n-1]
  #:for i in range(n-2,-1,-1)
    #:set res = muladd(res,num,pos[i])
  #:endfor
${res}$
#:endif
#:enddef

This is meant to be used in a direct call directive, such as:

res = @horner2(x,1.,2.,3.,prec=sp)

It is a bit inconvenient that multi-line directives cannot be inlined. For high order polynomials with long coefficients it seems like I need a macro which allows the user to assign to a left-hand-side. With eval directives on the other hand, the floating point coefficients need to be passed as strings, so that Python does not interfere with the coefficients.

aradi commented 4 years ago

Yes, your conclusion is unfortunately true. Currently, Fypp does not allow multiline inline directives, as that would make parsing and assembling the processesd line much more complicated. But I see your point, and I agree, that having

res = @{horner(x, 1., &
  & 2., 3., prec=sp)}@

would convient. I'll have a look again, what consequences allowing for line breaks in inline directives would have.

aradi commented 4 years ago

By the way, your macro can be written shorter, if do not replicate the algorithm within the #:if:

#:def horner(num, *pos, **kwargs)
  #:set prec = kwargs.get("prec", "")
  #:if prec
    #:set pos = [p + "_" + prec for p in pos]
  #:endif
  #:set n = len(pos)
  #:set res = pos[n-1]
  #:for i in range(n - 2, -1, -1)
    #:set res = muladd(res, num, pos[i])
  #:endfor
  $:res
#:enddef
ivan-pi commented 4 years ago

Thanks! I didn't know the .get() method of Python dictionaries had an optional value. Very convenient.

aradi commented 4 years ago

I have started to play around with the possibility of allowing for continuation lines in inline directives, as I see that this could make the meta-code more readable. Unfortunatly, due to an implementation detail, it requires some rewriting of the internal tree builder, so it may take a while...

ivan-pi commented 2 years ago

What if instead of a horner evaluation macro, it was a built-in fypp eval directive? Or is this outside the scope of fypp (since it can be achieved as a macro anyways)?

aradi commented 2 years ago

Unfortunately, I did not have the time to change Fypps core yet, so that multiline built-ins are still not possible. I don't think, that adding something so special to the built-in eval is a good idea, especially, because this can be implemented as a macro. Currently, my advice would be to put the assignment into the macro as well:

#:def assign_horner(lhs, num, *pos, **kwargs)
  #:set prec = kwargs.get("prec", "")
  #:if prec
    #:set pos = [p + "_" + prec for p in pos]
  #:endif
  #:set n = len(pos)
  #:set res = pos[n-1]
  #:for i in range(n - 2, -1, -1)
    #:set res = muladd(res, num, pos[i])
  #:endfor
  ${lhs}$ = ${res}$
#:enddef

That way, you could use the non-inline version of the macro with line breaks:

@:assign_horner(res, x, 1., &
  & 2., 3., prec=sp)
ivan-pi commented 2 years ago

Using the recursion trick (#21) and the intrinsic function ieee_fma from the ieee_arithmetic module, here's a second take on Horner's algorithm:

#:def fma(a,b,c)
ieee_fma(${a}$,${b}$,${c}$)
#:enddef

#:def assign_polyval(lhs,num,a,b,*pos)
  #:set res = fma(a,num,b)
  #:if len(pos) > 0
    #:set res = assign_polyval(lhs,num,res,pos[0],*pos[1:])
    $:res
  #:else
    ${lhs}$ = ${res}$
  #:endif
#:enddef

@:assign_polyval(p,x,1.0,2.0,3.0)

which prints

    p = ieee_fma(ieee_fma(1.0,x,2.0),x,3.0)
ivan-pi commented 2 years ago

Reflecting upon this problem, I came to realize it might be more desirable to instantiate whole functions directly, and rely on effective compiler inlining.

#:def fma(a,b,c)
ieee_fma(${a}$,${b}$,${c}$)
#:enddef

#:def polyval(num,a,b,*pos)
  #:set res = fma(a,num,b)
  #:if len(pos) > 0
    #:set res = polyval(num,res,pos[0],*pos[1:])
  #:endif
  $:res
#:enddef

#:def polyfunc(name,*pos,**kwargs)
  #:assert len(pos) >= 2
  #:set prec = kwargs.get("prec", "")
  #:set type = "real"
  #:if prec
    #:set pos = ["{}_{}".format(p,prec) for p in pos]
    #:set type = type + "({})".format(prec)
  #:endif
  #:set res = polyval("x",pos[0],pos[1],*pos[2:])
  pure function ${name}$(x) result(y)
    use, intrinsic :: ieee_arithmetic, only: ieee_fma
    ${type}$, intent(in) :: x
    ${type}$ :: y
    y = ${res}$
  end function
#:enddef

This can be called in the contains section,

@:polyfunc(poly3,&
  1.0,&
  2.0,&
  3.0,&
  5.0,&
  prec = dp)

giving an entire function as result

  pure function poly3(x) result(y)
    use, intrinsic :: ieee_arithmetic, only: ieee_fma
    real(dp), intent(in) :: x
    real(dp) :: y
    y = ieee_fma(ieee_fma(ieee_fma(1.0_dp,x,2.0_dp),x,3.0_dp),x,5.0_dp)
  end function

With Intel Fortran the function can even be made elemental :rocket:.