Better represent Fortran array dimension bounds

raehik commented 1 year ago

There are many different array types in (later standards of) Fortran, and many of them are dynamic in one way or another. Historically, fortran-src and fortran-vars didn't handle most dynamic array types and couldn't represent them in the types used during code analysis.

-- fortran-src: ad-hoc type used to annotate certain AST nodes
data ConstructType =
-- ...
  | CTArray [(Maybe Int, Maybe Int)] -- unsure how we represent dynamic/static

-- fortran-vars: Fortran type enumeration, especially for F77e
data SemType =
-- ...
  | TArray SemType (Maybe Dimensions) -- `Nothing` is used for every array whose dimensions are not all constant (i.e. dynamic)

-- constant explicit-shape Fortran array dimensions. each dimension is (lower bound, upper bound)
type Dimensions = [(Int, Int)]

On @RaoulHC's recommendation, we amended the Dimensions type so that it can represent more array types (setting aside whether or not we do any analysis on them). Now we have

https://github.com/camfort/fortran-src/blob/4a890faa610c92a3e483c1cb32568de15a43fef6/src/Language/Fortran/Common/Array.hs#L52-L63

which lets us be more accurate, and provides good flexibility. But we still can't represent dynamic arrays which have their size set at call time. From a fortran-vars test (symbol_table/dynamic_variables.f):

      subroutine f6(arr, n, m)
        integer n, m
        character*(n) arr(m)
      end

This is an explicit-shape array, but it is not "static": its size is not known at compile time. (The test asserts that it parses and finds a type of TArray SomeDynamicChar Nothing.) To represent the dynamic size, we must represent dimension bound expressions in whatever Dimensions type we use. The F2018 standard calls these specification-expr and defines them as restricted scalar integer expressions.

Dims t a lets us conveniently swap out the dimension bound type, so we can retain simplicity where we only want to support "static" arrays by using traverse with something like a -> Maybe Int. We have a constant expression evaluator which is well suited to handling any Expr-like type with little work. The question becomes: how should we represent specification expressions?
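To make the traverse idea concrete, here is a minimal sketch. The Dim and Dims definitions below are simplified stand-ins written for this comment, not the definitions at the link above, and Bound and constBound are hypothetical toys standing in for a real expression type and the constant evaluator.

```haskell
{-# LANGUAGE DeriveTraversable #-}

-- A single dimension with a lower and an upper bound of type a.
data Dim a = Dim { dimLower :: a, dimUpper :: a }
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- Simplified stand-in for a Dims-like type (assumption: the real type
-- has more cases). t is the container of dimensions, a is the bound type.
data Dims t a
  = DimsExplicitShape (t (Dim a)) -- both bounds given for every dimension
  | DimsAssumedShape (t a)        -- only lower bounds given
  deriving (Functor, Foldable, Traversable)

-- Hypothetical toy bound expression: an integer literal or a variable.
data Bound = Lit Int | Var String
  deriving (Show, Eq)

-- Succeeds only for bounds that are known without any context.
constBound :: Bound -> Maybe Int
constBound (Lit n) = Just n
constBound (Var _) = Nothing

-- A single non-constant bound makes the whole result Nothing.
staticDims :: Traversable t => Dims t Bound -> Maybe (Dims t Int)
staticDims = traverse constBound

-- integer arr(3, 0:4): every bound is a constant.
staticExample :: Dims [] Bound
staticExample = DimsExplicitShape [Dim (Lit 1) (Lit 3), Dim (Lit 0) (Lit 4)]

-- arr(m) from the f6 test above: the upper bound is a dummy argument.
dynamicExample :: Dims [] Bound
dynamicExample = DimsExplicitShape [Dim (Lit 1) (Var "m")]
```

Under these assumptions, staticDims staticExample yields a Dims [] Int, while staticDims dynamicExample is Nothing, which mirrors the existing fortran-vars behaviour.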

We could use an Expr for a, but I wouldn't: Expr has plenty of cases that are inappropriate in a specification expression.
We could define a SpecExpr, which is like Expr but limited. This may require further custom types, e.g. for Values. Ideally we would parse directly into such a SpecExpr, but given the parser complexity I'm inclined to go Expr -> Maybe SpecExpr only when needed. That is inefficient, but keeps things simple.
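A rough sketch of the Expr -> Maybe SpecExpr option, using toy types rather than the real fortran-src AST. All constructor names here are hypothetical, and the toy rejects every function call for brevity, which is stricter than what the standard permits in a specification expression.

```haskell
-- Toy general expression type (hypothetical, much smaller than a real AST).
data Expr
  = ELitInt Int
  | ELitStr String
  | EVar String
  | EAdd Expr Expr
  | ECall String [Expr]
  deriving (Show, Eq)

-- Toy restricted expression type: integer literals, variables, addition.
data SpecExpr
  = SLit Int
  | SVar String
  | SAdd SpecExpr SpecExpr
  deriving (Show, Eq)

-- Convert after parsing, failing on anything the restricted type
-- cannot express.
toSpecExpr :: Expr -> Maybe SpecExpr
toSpecExpr (ELitInt n) = Just (SLit n)
toSpecExpr (EVar v)    = Just (SVar v)
toSpecExpr (EAdd l r)  = SAdd <$> toSpecExpr l <*> toSpecExpr r
toSpecExpr (ELitStr _) = Nothing
toSpecExpr (ECall _ _) = Nothing
```

Even at this size, every constructor needs a line in both types and in the conversion, and this is the boilerplate that would grow with the real Expr.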

Provided we stick with Dims t a, downstream users have a lot of choice for exact representation, e.g. Maybe a where Nothing is a dynamic bound (a more accurate version of the fortran-vars approach) or Maybe (Dims t a) which corresponds almost directly to the fortran-vars approach. It would deserve some usage notes!
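As a starting point for such usage notes, here is a sketch comparing the two downstream choices. It uses a list-specialised, explicit-shape-only stand-in for Dims (an assumption made to keep the example short), and the names PerBound, WholeArray, collapse and extents are hypothetical.

```haskell
{-# LANGUAGE DeriveTraversable #-}

-- Lower and upper bound of one dimension.
data Dim a = Dim a a
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- List-specialised, explicit-shape-only stand-in for a Dims-like type.
newtype Dims a = Dims [Dim a]
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- Per-bound choice: Nothing marks an individual dynamic bound.
type PerBound = Dims (Maybe Int)

-- Whole-array choice: Nothing means at least one bound is dynamic.
type WholeArray = Maybe (Dims Int)

-- The per-bound form can always be collapsed into the whole-array form,
-- but not the other way round.
collapse :: PerBound -> WholeArray
collapse = sequenceA

-- Extent of each dimension whose bounds are both known.
extents :: PerBound -> [Maybe Int]
extents (Dims ds) = [ (\l u -> u - l + 1) <$> ml <*> mu | Dim ml mu <- ds ]

-- arr(10, m): the first dimension is static, the second is not.
mixed :: PerBound
mixed = Dims [Dim (Just 1) (Just 10), Dim (Just 1) Nothing]

-- arr(10, 0:4): fully static.
allStatic :: PerBound
allStatic = Dims [Dim (Just 1) (Just 10), Dim (Just 0) (Just 4)]
```

Here extents mixed still reports the first extent as Just 10, whereas collapse mixed is Nothing. The per-bound form keeps partial information that the whole-array form discards.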

raehik commented 1 year ago

Having put some thought into this, I think it's less complex than I originally suggested. We shove the Maybe (Expression a) dimension bounds from the Selector AST type into Dims t a above, like Dims t (Expression ()) (handling the Nothing case with the default lower bound of 1). We provide an evalConstExpr :: MonadEval m => Expression a -> m FValue to allow obtaining a Dims t FValue, provided one is able to attempt to evaluate expressions in their current context. Language.Fortran.Repr provides the relevant functionality.
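A toy version of that pipeline, to show the shape of it. Everything here is a stand-in: Expr is a three-constructor toy rather than Expression a, the evaluator runs in Maybe with an association list as its context rather than in a MonadEval monad, and fromDecl, eval and evalDims are hypothetical names.

```haskell
{-# LANGUAGE DeriveTraversable #-}

import Data.Maybe (fromMaybe)

-- Lower and upper bound of one dimension.
data Dim a = Dim a a
  deriving (Show, Eq, Functor, Foldable, Traversable)

-- Toy expression type standing in for a real AST expression.
data Expr = Lit Int | Var String | Add Expr Expr
  deriving (Show, Eq)

-- Bounds as written in a declaration: the lower bound is optional and
-- defaults to 1 when omitted.
fromDecl :: (Maybe Expr, Expr) -> Dim Expr
fromDecl (mLower, upper) = Dim (fromMaybe (Lit 1) mLower) upper

-- Toy evaluation context: variable names with known integer values.
type Env = [(String, Int)]

-- Evaluate an expression, failing on any unknown variable.
eval :: Env -> Expr -> Maybe Int
eval _   (Lit n)   = Just n
eval env (Var v)   = lookup v env
eval env (Add l r) = (+) <$> eval env l <*> eval env r

-- Evaluate every bound of every dimension in the given context.
evalDims :: Env -> [Dim Expr] -> Maybe [Dim Int]
evalDims env = traverse (traverse (eval env))
```

For example, arr(m) becomes fromDecl (Nothing, Var "m"). Evaluating it with m bound to 5 gives Just [Dim 1 5], and evaluating it with an empty context gives Nothing.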

I can't see how to define a SpecExpr effectively. It might be nice to have restricted Expression and Value types, but they wouldn't give us nearly enough to justify the tedious boilerplate.

raehik commented 1 year ago

Closed by #261! fortran-src now exposes a documented module full of general-use types and utilities for handling array dimensions.

camfort / fortran-src

Better represent Fortran array dimension bounds #262