camfort / fortran-src

Fortran parsing and static analysis infrastructure
https://hackage.haskell.org/package/fortran-src

Distinction between assumed-size and assumed-shape arrays #254

Closed RaoulHC closed 1 year ago

RaoulHC commented 1 year ago

Currently Dimension in SemanticTypes allows for assumed-size arrays, which use * to denote dynamic dimensions, but not assumed-shape ones, which use : to denote dynamic dimensions. Both of these must be dummy arguments.

The former only allows the final dimension to be dynamic, so the following are allowed:

integer arr1(*)
integer arr2(1,2,*)

But the following aren't:

integer arr1(*, 2)
integer arr2(5,*,1)
integer arr3(*, *)

However, you can specify other dimensions via additional arguments, like so:

subroutine foo(n, m, arr)
integer n, m
integer arr(5, 2:n, m:*)
end subroutine foo

The dimensions type currently can't encode this.

Assumed-shape arrays allow all dimensions to be dynamic, but not just some of them, so the following are allowed:

integer arr1(:, :)
integer arr2(:, :, :)
integer arr3(2:, m:, :)

but not these:

integer arr1(5, :)
integer arr2(:2)
integer arr3(n:m)

Assumed-shape also allows a number of intrinsics that you can't use with assumed-size arrays:

integer arr(:)
print *, shape(arr)
print *, size(arr)
print *, lbound(arr)
etc...

Another difference is that without interfaces assumed-shape arrays behave poorly: giving errors, failing to link, or in the worst case segfaulting at run time.

As a result, as well as being able to encode the type of arr(:, :), Dimensions should also distinguish between arr(:) and arr(*).

This page gives a good summary of the different ways of representing arrays: https://fortran-lang.org/en/learn/best_practices/arrays/

raehik commented 1 year ago

#261 hopes to resolve this nicely by leaving most details to the user (abridged for simplicity):

data Dims t a
  = DimsExplicitShape
      (t (Dim a)) -- ^ list of all dimensions

  | DimsAssumedSize
      (Maybe (t (Dim a))) -- ^ list of all dimensions except last
      a          -- ^ lower bound of last dimension

  -- | Assumed-shape array dimensions. Here, we only have the lower bound for
  --   each dimension, and the rank (via length).
  | DimsAssumedShape
      (t a) -- ^ list of lower bounds

-- seems same as tuple, written like this for instances and other reasons
data Dim a = Dim a {- ^ lower bound -} a {- ^ upper bound -}

-- simple concrete usage (close to fortran-vars)
type Dimensions = Dims NonEmpty Int

This is simpler to use and lets you swap out the list-like container (I'm conscious that lazy lists [] aren't ideal for this). It retains the same soundness guarantees if you use a non-empty list-like. The pretty-printed forms are fairly loud because lower bounds are always recorded; that is probably fine to leave to the user, unless we wish to parametrize the fortran-src pretty printer. We already do that by passing a FortranVersion around, and I would like to improve it, but it's currently inflexible.
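As a rough illustration of how declarations might map onto this type, here is a minimal sketch. The Dim and Dims definitions are copied from the comment above; the example values, the standalone Eq/Show instances, and the dimsRank helper are assumptions added for illustration and are not taken from fortran-src.

```haskell
{-# LANGUAGE FlexibleContexts     #-}
{-# LANGUAGE StandaloneDeriving   #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.List.NonEmpty (NonEmpty (..))

-- Lower bound, then upper bound (as in the comment above).
data Dim a = Dim a a
  deriving (Eq, Show)

data Dims t a
  = DimsExplicitShape (t (Dim a))
  | DimsAssumedSize (Maybe (t (Dim a))) a
  | DimsAssumedShape (t a)

-- Instances written by hand here only so the examples can be compared/printed.
deriving instance (Eq a, Eq (t a), Eq (t (Dim a))) => Eq (Dims t a)
deriving instance (Show a, Show (t a), Show (t (Dim a))) => Show (Dims t a)

type Dimensions = Dims NonEmpty Int

-- integer a(5, 2:10)
explicitEx :: Dimensions
explicitEx = DimsExplicitShape (Dim 1 5 :| [Dim 2 10])

-- integer b(1, 2, *)
assumedSizeEx :: Dimensions
assumedSizeEx = DimsAssumedSize (Just (Dim 1 1 :| [Dim 1 2])) 1

-- integer c(*)
assumedSize1Ex :: Dimensions
assumedSize1Ex = DimsAssumedSize Nothing 1

-- integer d(2:, :)
assumedShapeEx :: Dimensions
assumedShapeEx = DimsAssumedShape (2 :| [1])

-- Hypothetical helper: the rank is recoverable for every kind of array.
dimsRank :: Foldable t => Dims t a -> Int
dimsRank (DimsExplicitShape ds)   = length ds
dimsRank (DimsAssumedSize mds _)  = maybe 0 length mds + 1
dimsRank (DimsAssumedShape lbs)   = length lbs
```

Under this encoding, c(*) and a one-dimensional c(:) produce different constructors even though both have rank 1, which is the distinction the issue asks for.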

raehik commented 1 year ago

This doesn't support using dummy arguments in arrays like in your third example. That would require a bit more work to ensure variables are properly scoped to reference in array shapes. You also couldn't do much with them unless you know the values of the dummy arguments, and that's something fortran-src (and I think fortran-vars?) don't handle.

We could still support representing them to some degree.

RaoulHC commented 1 year ago

Left some little comments but I think in general that makes sense.

Yeah, dummy arguments as lower bounds are fine to ignore for now. It would be good to take them into account at some point, but they're not common in our codebase.

raehik commented 1 year ago

I see that fortran-vars both supports and tests dummy arguments in upper bounds:

https://github.com/camfort/fortran-vars/blob/1f2a6a0e09335ba286180a1becf7c487bbead9d9/test/symbol_table/dynamic_variables.f

    it "Dummy not dynamic" $ do
      contents <- flexReadFile path
      let st = getSymTable path contents "f6"
      typeOf "arr" st `shouldBe` TArray (TCharacter CharLenStar 1) Nothing -- <- this line
      isDynamic "arr" st `shouldBe` False
      isDummy "arr" st `shouldBe` True

      subroutine f6(arr, n, m)
        integer n, m
        character*(n) arr(m)
      end

All non-explicit-shape arrays were previously placed into Nothing, so it would only be known that the variable is an array and has the given scalar type. These changes were intended to remove the Maybe wrapper in TArray SemType (Maybe Dimensions) by enabling comfortable encoding of assumed-size and assumed-shape arrays. As is, we'd have to add back the Maybe to support dummy args without large changes. So we lose our simplicity! :(

Having said that, I'm fairly sure that the above design works if we instantiate a with a different index type:

data DimIdx a = DimIdxConstant a | DimIdxDummy Text

I don't love it currently, as I don't want to complicate fortran-vars' top level representations, and I don't see an immediate way to keep this hidden. But it would appear to be a solution.
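To make the idea concrete, here is a small sketch of how arr(m) from the f6 test might be encoded with such an index type. It assumes the Dims type from earlier in the thread; String is used in place of Text only to keep the sketch dependency-free, and dimsDummies is a hypothetical helper.

```haskell
{-# LANGUAGE DeriveTraversable    #-}
{-# LANGUAGE FlexibleContexts     #-}
{-# LANGUAGE StandaloneDeriving   #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.List.NonEmpty (NonEmpty (..))

data Dim a = Dim a a
  deriving (Eq, Show, Foldable)

data Dims t a
  = DimsExplicitShape (t (Dim a))
  | DimsAssumedSize (Maybe (t (Dim a))) a
  | DimsAssumedShape (t a)
  deriving (Foldable)

deriving instance (Eq a, Eq (t a), Eq (t (Dim a))) => Eq (Dims t a)
deriving instance (Show a, Show (t a), Show (t (Dim a))) => Show (Dims t a)

-- Index type from the comment above (String stands in for Text here).
data DimIdx a = DimIdxConstant a | DimIdxDummy String
  deriving (Eq, Show)

-- character*(n) arr(m): one dimension with bounds 1:m
f6Arr :: Dims NonEmpty (DimIdx Int)
f6Arr = DimsExplicitShape (Dim (DimIdxConstant 1) (DimIdxDummy "m") :| [])

-- Hypothetical helper: which dummy arguments do the bounds mention?
dimsDummies :: Foldable t => Dims t (DimIdx a) -> [String]
dimsDummies = foldMap names
  where
    names (DimIdxDummy v)    = [v]
    names (DimIdxConstant _) = []
```

Here dimsDummies f6Arr evaluates to ["m"], so a consumer can tell which bound is unresolved without losing the constant lower bound.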

RaoulHC commented 1 year ago

Hmmm, I'll have a ponder to see if I can think of a better solution, but I think adding that DimIdxDummy would be preferable to keeping the Maybe. For most transformations we want to do that's probably sufficient information. I think the alternative would be to have some constant expression representation, but that's a lot of complexity for little gain.

raehik commented 1 year ago

Oh right, my case only covers plain dummy vars, but F2018 (and earlier standards) allow a "restricted scalar integer expression". It wouldn't be wrong to define another "AST" at a more fine grained level which could represent these more accurately. Nor would it be hard to plug into the evaluator, since all the logic exists for handling regular AST types as if they were constant exprs.

I think we could begin some constant expression representation -- call it CExpr -- and provide a function SymbolTable -> Dims t CExpr -> Either String (Dims t Int). Or maybe we can write a SymbolTable -> Dims t CExpr -> Dims t (Maybe Int), so a single non-trivial dimension won't mean dropping all other info. We can use this as early as preferred in fortran-vars to avoid exporting a busier representation.
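The two signatures above could be sketched roughly as follows. Everything except Dim and Dims is hypothetical: CExpr is a toy expression type, and SymbolTable is a plain association list of known constant values standing in for the real fortran-vars symbol table.

```haskell
{-# LANGUAGE DeriveTraversable    #-}
{-# LANGUAGE FlexibleContexts     #-}
{-# LANGUAGE StandaloneDeriving   #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.List.NonEmpty (NonEmpty (..))

data Dim a = Dim a a
  deriving (Eq, Show, Functor, Foldable, Traversable)

data Dims t a
  = DimsExplicitShape (t (Dim a))
  | DimsAssumedSize (Maybe (t (Dim a))) a
  | DimsAssumedShape (t a)
  deriving (Functor, Foldable, Traversable)

deriving instance (Eq a, Eq (t a), Eq (t (Dim a))) => Eq (Dims t a)
deriving instance (Show a, Show (t a), Show (t (Dim a))) => Show (Dims t a)

-- Toy constant-expression type (an assumption, not a fortran-src type).
data CExpr = CLit Int | CVar String | CAdd CExpr CExpr
  deriving (Eq, Show)

-- Stand-in symbol table: names with known constant values.
type SymbolTable = [(String, Int)]

evalCExpr :: SymbolTable -> CExpr -> Either String Int
evalCExpr _  (CLit n)   = Right n
evalCExpr st (CVar v)   = maybe (Left ("no known value for " ++ v)) Right (lookup v st)
evalCExpr st (CAdd l r) = (+) <$> evalCExpr st l <*> evalCExpr st r

-- All-or-nothing: one unresolved bound fails the whole conversion.
evalDimsStrict :: Traversable t => SymbolTable -> Dims t CExpr -> Either String (Dims t Int)
evalDimsStrict st = traverse (evalCExpr st)

-- Per-bound: unresolved bounds become Nothing, the rest is kept.
evalDimsPartial :: Functor t => SymbolTable -> Dims t CExpr -> Dims t (Maybe Int)
evalDimsPartial st = fmap (either (const Nothing) Just . evalCExpr st)

-- integer arr(3, n), where n is a dummy argument with no known value
exampleDims :: Dims NonEmpty CExpr
exampleDims = DimsExplicitShape (Dim (CLit 1) (CLit 3) :| [Dim (CLit 1) (CVar "n")])
```

With an empty table, evalDimsStrict returns a Left for exampleDims, while evalDimsPartial keeps the first dimension's bounds and marks only the unknown upper bound as Nothing.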

These possible changes and the current merged ones will impact code that depends on fortran-vars. Exactly how depends on what you do with TArrays. I don't want to create pointless churn, so I'll go over some options when we meet tomorrow (but I'll try to keep the interesting discussion on GitHub for reference).

RaoulHC commented 1 year ago

Yeah, that could potentially work. I think converting to Dims t Int when we can would be better. I'm not sure how useful Dims t (Maybe Int) would be to analysis: I think as soon as we can't tell the size of an array, certain analyses aren't possible, and there are certain checks that compilers seem to stop attempting in these cases.

raehik commented 1 year ago

Maybe Int means we can still record some info for dynamic dimensions: in particular we always get the kind of array (explicit-shape, assumed-size, etc.) and its rank. It can always be transformed to Maybe (Dims t Int) to assert "staticness" (?). Consider this test from fortran-vars:

https://github.com/camfort/fortran-vars/blob/1f2a6a0e09335ba286180a1becf7c487bbead9d9/test/symbol_table/dummy_argument_symbol.f
https://github.com/camfort/fortran-vars/blob/1f2a6a0e09335ba286180a1becf7c487bbead9d9/test/SymbolTableSpec.hs#L334-L353

      subroutine sub(stscalar1, starr1, dynscalar1, dynarr1, dynarr2,
     + dynarr3, dynarr4, *)
      integer stscalar1
      integer starr1(5)

      character*(*) dynscalar1
      integer dynarr1(*)
      integer dynarr2(stscalar1)
      integer dynarr3(3,*)
      integer dynarr4(3,stscalar1)

      integer dynarr5(stscalar1 + 1)

All dynarr* variables are recorded as TArray _scalar Nothing. With this replacement Dims t (Maybe Int) type, we could record dynarr4 as TArray _scalar (DimsExplicitShape (Dim (Just 1) (Just 3) :| [Dim (Just 1) Nothing])).

It may mean lots of transforming mid-analysis so as to continue using the current style (which handled either static explicit-shape, or gave Nothing). Such a transformation is pure, just shifting data around.
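One way such a pure transformation back to the older style might look is sketched below. The function name staticExplicitShape and its result type are assumptions for illustration; the previous Dimensions representation is not shown in this thread.

```haskell
{-# LANGUAGE DeriveTraversable #-}

import Data.List.NonEmpty (NonEmpty (..))

data Dim a = Dim a a
  deriving (Eq, Show, Functor, Foldable, Traversable)

data Dims t a
  = DimsExplicitShape (t (Dim a))
  | DimsAssumedSize (Maybe (t (Dim a))) a
  | DimsAssumedShape (t a)

-- Hypothetical adapter: Just only for explicit-shape arrays whose bounds are
-- all known, Nothing for everything else (mirroring the old behaviour).
staticExplicitShape :: Traversable t => Dims t (Maybe Int) -> Maybe (t (Dim Int))
staticExplicitShape (DimsExplicitShape ds) = traverse sequenceA ds
staticExplicitShape _                      = Nothing

-- integer starr1(5)
starr1 :: Dims NonEmpty (Maybe Int)
starr1 = DimsExplicitShape (Dim (Just 1) (Just 5) :| [])

-- integer dynarr4(3, stscalar1)
dynarr4 :: Dims NonEmpty (Maybe Int)
dynarr4 = DimsExplicitShape (Dim (Just 1) (Just 3) :| [Dim (Just 1) Nothing])

-- integer dynarr1(*)
dynarr1 :: Dims NonEmpty (Maybe Int)
dynarr1 = DimsAssumedSize Nothing (Just 1)
```

Only starr1 yields a Just here; dynarr4 and dynarr1 both collapse to Nothing, as they would have under the Maybe-wrapped representation.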

That's my case for Maybe Int -- but fortran-vars should use whatever type is most appropriate. @RaoulHC may I ask which you would suggest out of

and we'll stick with it for now (it'll be easier than this to adapt in future!).

RaoulHC commented 1 year ago

Yeah maybe going for Dims NonEmpty (Maybe Int) with a way of getting a static version is the way of doing it then, there probably are cases where knowing some of the bounds is useful. Perhaps tools built on top should just get the expressions from the AST for dynamic bounds, which would be a bit more involved, but allow us to avoid having the slightly nasty DimIdxDummy.

I think that by implementing a fairly trivial Traversable instance for Dims t, we get the static conversion for free via sequence.

raehik commented 1 year ago

OK, I'll use that. We have Traversable t => Traversable (Dims t) along with the similar & requisite Functor and Foldable (GHC can derive them all for us).
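A minimal sketch of the static conversion, assuming the Dims type from earlier in the thread with derived instances. The name toStatic and the example values are illustrative only.

```haskell
{-# LANGUAGE DeriveTraversable    #-}
{-# LANGUAGE FlexibleContexts     #-}
{-# LANGUAGE StandaloneDeriving   #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.List.NonEmpty (NonEmpty (..))

data Dim a = Dim a a
  deriving (Eq, Show, Functor, Foldable, Traversable)

data Dims t a
  = DimsExplicitShape (t (Dim a))
  | DimsAssumedSize (Maybe (t (Dim a))) a
  | DimsAssumedShape (t a)
  deriving (Functor, Foldable, Traversable)

deriving instance (Eq a, Eq (t a), Eq (t (Dim a))) => Eq (Dims t a)
deriving instance (Show a, Show (t a), Show (t (Dim a))) => Show (Dims t a)

-- The static conversion is just sequenceA at the Maybe applicative.
toStatic :: Traversable t => Dims t (Maybe Int) -> Maybe (Dims t Int)
toStatic = sequenceA

-- integer dynarr3(3,*): every recorded bound is known
dynarr3 :: Dims NonEmpty (Maybe Int)
dynarr3 = DimsAssumedSize (Just (Dim (Just 1) (Just 3) :| [])) (Just 1)

-- integer dynarr4(3,stscalar1): the last upper bound is unknown
dynarr4 :: Dims NonEmpty (Maybe Int)
dynarr4 = DimsExplicitShape (Dim (Just 1) (Just 3) :| [Dim (Just 1) Nothing])
```

Note that the Maybe inside DimsAssumedSize is structural (it records whether there are dimensions before the last one), so sequenceA leaves it in place and only collapses the per-bound Maybe values.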

My DimIdxDummy and CExpr ideas were short-sighted: constant expressions aren't a syntactic subset of expressions. Tools building on fortran-src that want to consider arrays with non-constant bounds expressions (mainly dummy vars) should do so using Dims t (Expression ()). Then the strategy for evaluating bounds expressions is left to the user.

I think fortran-vars' concrete type representation wants to be post-evaluation, so as before we can't represent everything. Constant expressions we can evaluate and store as Just bound; for expressions that use dummy vars we have to say Nothing.

raehik commented 1 year ago

Implemented in #261.