ecmwf-ifs / loki

Freely programmable source-to-source translation for Fortran
https://sites.ecmwf.int/docs/loki/
Apache License 2.0
22 stars 11 forks source link

Definitions not attached if multiple scopes defined in the same `Sourcefile` #161

Closed rommelDB closed 1 month ago

rommelDB commented 9 months ago

Regarding the execution of the following code:

from loki import Sourcefile, Subroutine, FindVariables
from pprint import pprint

fcode = """
module foo
  implicit none
  integer, parameter :: j = 16

  contains
    integer function SUM(v)
      implicit none
      integer, intent(in) :: v
      SUM = v + 1
    end function SUM
end module foo

module test
  use foo
  implicit none
  integer, parameter :: rk = selected_real_kind(12)
  integer, parameter :: ik = selected_int_kind(9)

  contains
    subroutine calc (n, res)
      integer, intent(in) :: n
      real(kind=rk), intent(inout) :: res
      real(kind=rk), dimension(n) :: vec

      integer(kind=ik) :: i

      do i = 1, n
         vec(i) = 1.0_rk/(real(n - i + 1, kind=rk))
      end do

      do i = 1, n
        res = res + SQRT(real(i, kind=rk)) + SUM(j)
      end do
    end subroutine calc

end module test
"""

source = Sourcefile.from_source(fcode)
calc_subroutine = source['calc']
vars = FindVariables().visit(calc_subroutine.body)
pprint(vars)

Actual output:

{Array('vec', None, None, None, (Scalar('i', None, None, None),)),
 DeferredTypeSymbol('REAL', None, None, <SymbolAttributes BasicType.DEFERRED>),
 DeferredTypeSymbol('SQRT', None, None, <SymbolAttributes BasicType.DEFERRED>),
 DeferredTypeSymbol('SUM', None, None, <SymbolAttributes BasicType.DEFERRED>),
 DeferredTypeSymbol('j', None, None, <SymbolAttributes BasicType.DEFERRED>), 
 Scalar('i', None, None, None),
 Scalar('rk', None, None, None),
 Scalar('n', None, None, None),
 Scalar('res', None, None, None)}

I wanted to print variables only, however, I didn't expect to REAL, SQRT, and SUM be in the list.

I'm aware that it is the current definition, though: https://github.com/ecmwf-ifs/loki/blob/19fd1c439250e951744cfe941efea71df85b5a7a/loki/expression/expr_visitors.py#L169

So, in order to obtain the true type of deferred symbol. Would a new pass or something like that be required? I would appreciate any feedback.

reuterbal commented 9 months ago

Hi Rommel, very good question and let me try to provide a few pointers why we don't resolve these symbols currently:

  1. Let me start by explaining why DeferredTypeSymbol is included in the list of variables: This is because we represent that way any symbols for which we cannot conclude what type they have. That includes, among others, symbols that are imported from other scopes, e.g., module variables. Importantly, when definitions are available they are automatically attached, but your example points out an interesting limitation where this doesn't happen because multiple "top-level" scopes are defined within the same source file. Putting the modules into separate Sourcefile objects (which is what we typically find in our code base) would be a workaround how SUM (which overwrites the definition of a Fortran intrinsic) will be correctly recognized as a ProcedureSymbol, which we don't include in the variables lookup:
from loki import Sourcefile, Subroutine, FindVariables
from pprint import pprint

fcode_foo = """
module foo
  implicit none
  integer, parameter :: j = 16

  contains
    integer function SUM(v)
      implicit none
      integer, intent(in) :: v
      SUM = v + 1
    end function SUM
end module foo
"""

fcode_test = """
module test
  use foo
  implicit none
  integer, parameter :: rk = selected_real_kind(12)
  integer, parameter :: ik = selected_int_kind(9)

  contains
    subroutine calc (n, res)
      integer, intent(in) :: n
      real(kind=rk), intent(inout) :: res
      real(kind=rk), dimension(n) :: vec

      integer(kind=ik) :: i

      do i = 1, n
         vec(i) = 1.0_rk/(real(n - i + 1, kind=rk))
      end do

      do i = 1, n
        res = res + SQRT(real(i, kind=rk)) + SUM(j)
      end do
    end subroutine calc

end module test
"""

source_foo = Sourcefile.from_source(fcode_foo)
source_test = Sourcefile.from_source(fcode_test, definitions=source_foo.definitions)
calc_subroutine = source_test['calc']
vars = FindVariables().visit(calc_subroutine.body)
pprint(vars)
{Array('vec', None, None, None, (Scalar('i', None, None, None),)),
 DeferredTypeSymbol('REAL', None, None, <SymbolAttributes BasicType.DEFERRED>),
 DeferredTypeSymbol('SQRT', None, None, <SymbolAttributes BasicType.DEFERRED>),
 Scalar('res', None, None, None),
 Scalar('n', None, None, None),
 Scalar('i', None, None, None),
 Scalar('rk', None, None, None),
 Scalar('j', None, None, None)}

I consider this a bug but one that should be fairly easy to fix, if you wanted to try that. I suspect adding a module to definitions when it is instantiated might be one way of achieving that, e.g., here for fparser: https://github.com/ecmwf-ifs/loki/blob/19fd1c439250e951744cfe941efea71df85b5a7a/loki/frontend/fparser.py#L2075

  1. Achieving the same for the other intrinsic functions REAL and SQRT is a little trickier, as it requires baking in the information about Fortran intrinsics, which the frontends have to provide to us. Moreover, it would likely require introducing dedicated scopes for each of the intrinsic modules. In the scope attacher we are already using that information from Fparser: https://github.com/ecmwf-ifs/loki/blob/19fd1c439250e951744cfe941efea71df85b5a7a/loki/expression/mappers.py#L21-L22 to gracefully step over intrinsic symbols: https://github.com/ecmwf-ifs/loki/blob/19fd1c439250e951744cfe941efea71df85b5a7a/loki/expression/mappers.py#L783-L784 Unfortunately, this does not include information about the types of arguments and, more importantly, the return type of these functions: https://github.com/stfc/fparser/blob/f43a18fdb9b03a70fa0f5b9a04e3e01b12533360/src/fparser/two/Fortran2003.py#L12027 Therefore, even then we wouldn't be able to deduce the full type of these symbols. Ideally, this information should therefore be added to fparser first, and then picked up by Loki to fully resolve the type of these symbols. A short-term workaround for your use-case would of course be to just filter the list of variables by name using the intrinsic names property, which by itself might be a useful utility routine already.

  2. Lastly, once the full type information was available we would even be capable to implement checks for correct type in expressions, which would be a very useful Linter rule, but is dependent on the above points to be resolved first, of course.

I hope that helps and feel free to come back with follow-up questions of course!