Closed RaoulHC closed 1 year ago
data Dims t a
  = DimsExplicitShape
      (t (Dim a)) -- ^ list of all dimensions
  | DimsAssumedSize
      (Maybe (t (Dim a))) -- ^ list of all dimensions except last
      a -- ^ lower bound of last dimension
  -- | Assumed-shape array dimensions. Here, we only have the lower bound for
  --   each dimension, and the rank (via length).
  | DimsAssumedShape
      (t a) -- ^ list of lower bounds

-- seems same as tuple, written like this for instances and other reasons
data Dim a = Dim a {- ^ lower bound -} a {- ^ upper bound -}

-- simple concrete usage (close to fortran-vars)
type Dimensions = Dims NonEmpty Int
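To make the encoding concrete, here is a minimal sketch of how a few Fortran declarations might be written with this type. The value names and the `rank` helper are hypothetical, made up for illustration, and the default lower bound of 1 is filled in by hand.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Types as proposed above (doc comments trimmed).
data Dims t a
  = DimsExplicitShape (t (Dim a))
  | DimsAssumedSize (Maybe (t (Dim a))) a
  | DimsAssumedShape (t a)

data Dim a = Dim a a

type Dimensions = Dims NonEmpty Int

-- integer a(5): explicit shape, bounds 1..5
explicit1 :: Dimensions
explicit1 = DimsExplicitShape (Dim 1 5 :| [])

-- integer a(3,*): assumed size, one explicit dimension before the last
assumedSize2 :: Dimensions
assumedSize2 = DimsAssumedSize (Just (Dim 1 3 :| [])) 1

-- integer a(*): assumed size, no dimensions before the last
assumedSize1 :: Dimensions
assumedSize1 = DimsAssumedSize Nothing 1

-- integer a(:,:): assumed shape, only lower bounds are recorded
assumedShape2 :: Dimensions
assumedShape2 = DimsAssumedShape (1 :| [1])

-- Hypothetical helper: the rank is recoverable "via length" in every case.
rank :: Foldable t => Dims t a -> Int
rank (DimsExplicitShape ds)  = length ds
rank (DimsAssumedSize mds _) = maybe 0 length mds + 1
rank (DimsAssumedShape lbs)  = length lbs
```

Note that `a(*)` and `a(:)` end up under different constructors, so the two stay distinguishable even though both have rank 1.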
Simpler to use and enables swapping out the list-like (I'm conscious that lazy lists [] aren't ideal for this). Retains the same soundness guarantees if you use a non-empty list-like. Fairly loud pretty forms due to always recording lower bounds; probably fine to leave to the user. Unless we wish to parametrize the fortran-src pretty printer -- which we already do, passing a FortranVersion around, and I would like to improve it, but it's currently inflexible.
This doesn't support using dummy arguments in arrays like in your third example. That would require a bit more work to ensure variables are properly scoped to reference in array shapes. You also couldn't do much with them unless you know the values of the dummy arguments, and that's something fortran-src (and I think fortran-vars?) don't handle.
We could still support representing them to some degree
Left some little comments but I think in general that makes sense.
Yeah, dummy arguments as lower bounds are fine to ignore for now. It would be good to take them into account at some point, but they're not common in our codebase.
I see that fortran-vars both supports and tests dummy arguments in upper bounds:
it "Dummy not dynamic" $ do
  contents <- flexReadFile path
  let st = getSymTable path contents "f6"
  typeOf "arr" st `shouldBe` TArray (TCharacter CharLenStar 1) Nothing -- <- this line
  isDynamic "arr" st `shouldBe` False
  isDummy "arr" st `shouldBe` True

      subroutine f6(arr, n, m)
      integer n, m
      character*(n) arr(m)
      end
All non-explicit-shape arrays were previously placed into Nothing, so it would only be known that the variable is an array and has the given scalar type. These changes were intended to remove the Maybe wrapper in TArray SemType (Maybe Dimensions) by enabling comfortable encoding of assumed-size and assumed-shape arrays. As is, we'd have to add back the Maybe to support dummy args without large changes. So we lose our simplicity! :(
Having said that, I'm fairly sure that the above design works if we use a different index type for a:
data DimIdx a = DimIdxConstant a | DimIdxDummy Text
I don't love it currently, as I don't want to complicate fortran-vars' top level representations, and I don't see an immediate way to keep this hidden. But it would appear to be a solution.
Hmmm, I'll have a ponder to see if I can think of a better solution, but I think adding that DimIdxDummy would be preferable to keeping the Maybe. For most transformations we want to do, that's probably sufficient information.
I think the alternative would be to have some constant expression representation, but that's a lot of complexity for little gain.
Oh right, my case only covers plain dummy vars, but F2018 (and earlier standards) allow a "restricted scalar integer expression". It wouldn't be wrong to define another "AST" at a more fine grained level which could represent these more accurately. Nor would it be hard to plug into the evaluator, since all the logic exists for handling regular AST types as if they were constant exprs.
I think we could begin some constant expression representation -- call it CExpr -- and provide a function SymbolTable -> Dims t CExpr -> Either String (Dims t Int). Or maybe we can write a SymbolTable -> Dims t CExpr -> Dims t (Maybe Int), so a single non-trivial dimension won't mean dropping all other info. We can use this as early as preferred in fortran-vars to avoid exporting a busier representation.
These possible changes and the current merged ones will impact code that depends on fortran-vars. Exactly how depends on what you do with TArrays. I don't want to create pointless churn, so I'll go over some options when we meet tomorrow (but I'll try to keep the interesting discussion on GitHub for reference).
Yeah, that could potentially work. I think converting to Dims t Int when we can would be better. I'm not sure how useful Dims t (Maybe Int) would be to analysis: I think as soon as we can't tell the size of an array, certain analyses aren't possible, and there are certain checks that compilers seem to stop attempting in these cases.
Maybe Int means we can still record some info for dynamic dimensions: in particular we always get type of array (explicit shape, assumed size etc.) and rank. It can always be transformed to Maybe (Dims t Int) to assert "staticness" (?). Consider this test from fortran-vars:
https://github.com/camfort/fortran-vars/blob/1f2a6a0e09335ba286180a1becf7c487bbead9d9/test/symbol_table/dummy_argument_symbol.f
https://github.com/camfort/fortran-vars/blob/1f2a6a0e09335ba286180a1becf7c487bbead9d9/test/SymbolTableSpec.hs#L334-L353
      subroutine sub(stscalar1, starr1, dynscalar1, dynarr1, dynarr2,
     +               dynarr3, dynarr4, *)
      integer stscalar1
      integer starr1(5)
      character*(*) dynscalar1
      integer dynarr1(*)
      integer dynarr2(stscalar1)
      integer dynarr3(3,*)
      integer dynarr4(3,stscalar1)
      integer dynarr5(stscalar1 + 1)
All dynarr* variables are recorded as TArray _scalar Nothing. With this replacement Dims t (Maybe Int) type, we could record dynarr4 as TArray _scalar (DimsExplicitShape (Dim (Just 1) (Just 3) :| [Dim (Just 1) Nothing])).
It may mean lots of transforming mid-analysis so as to continue using current style (which handled either static explicit-shape, or gave Nothing). Such transformation is pure, just shifting data around.
That's my case for Maybe Int -- but fortran-vars should use whatever type is most appropriate. @RaoulHC may I ask which you would suggest out of
- Dims NonEmpty Int (current)
- Dims [] Int (similar to original type synonym [(Int, Int)])
- Dims NonEmpty (Maybe Int)
and we'll stick with it for now (it'll be easier than this to adapt in future!).
Yeah, maybe going for Dims NonEmpty (Maybe Int) with a way of getting a static version is the way to do it then; there probably are cases where knowing some of the bounds is useful. Perhaps tools built on top should just get the expressions from the AST for dynamic bounds, which would be a bit more involved, but would allow us to avoid having the slightly nasty DimIdxDummy.
I think by implementing a fairly trivial Traversable instance of Dims t we get the static conversion for free via sequence.
OK, I'll use that. We have Traversable t => Traversable (Dims t) along with the similar & requisite Functor and Foldable (GHC can derive them all for us).
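A minimal sketch of the static conversion described above, assuming the instances are derived with GHC's `DeriveTraversable` extension. The names `toStatic`, `staticBounds`, `starr1` and `dynarr4` are hypothetical, chosen to mirror the test file discussed earlier; this is not necessarily how fortran-vars spells it.

```haskell
{-# LANGUAGE DeriveTraversable #-}

import Data.Foldable (toList)
import Data.List.NonEmpty (NonEmpty (..))

data Dims t a
  = DimsExplicitShape (t (Dim a))
  | DimsAssumedSize (Maybe (t (Dim a))) a
  | DimsAssumedShape (t a)
  deriving (Functor, Foldable, Traversable)

data Dim a = Dim a a
  deriving (Functor, Foldable, Traversable)

-- Succeeds only if every bound is known; a single Nothing makes the
-- whole result Nothing.
toStatic :: Traversable t => Dims t (Maybe a) -> Maybe (Dims t a)
toStatic = sequence

-- Flatten the bounds of a fully static array, for easy inspection.
staticBounds :: Dims NonEmpty (Maybe Int) -> Maybe [Int]
staticBounds = fmap toList . toStatic

-- integer starr1(5): all bounds known
starr1 :: Dims NonEmpty (Maybe Int)
starr1 = DimsExplicitShape (Dim (Just 1) (Just 5) :| [])

-- integer dynarr4(3,stscalar1): upper bound of the last dimension unknown
dynarr4 :: Dims NonEmpty (Maybe Int)
dynarr4 = DimsExplicitShape (Dim (Just 1) (Just 3) :| [Dim (Just 1) Nothing])
```

With these definitions, `staticBounds starr1` yields `Just [1,5]`, while `staticBounds dynarr4` yields `Nothing` because one bound is missing.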
My DimIdxDummy and CExpr ideas were short-sighted: constant expressions aren't a syntactic subset of expressions. Tools building on fortran-src that want to consider arrays with non-constant bounds expressions (mainly dummy vars) should do so using Dims t (Expression ()). Then the strategy for evaluating bounds expressions is left to the user.
I think fortran-vars concrete type rep wants to be post-eval, so as before we can't represent everything. Constant expressions we can evaluate and store Just bound; constant expressions using dummy vars we have to say Nothing.
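As a rough illustration of the "post-eval" idea, the sketch below maps an evaluator over the bounds. `BoundExpr` is a hypothetical toy stand-in for fortran-src's `Expression ()`, and `evalBound`, `evalDims` and the association list of known constants are likewise assumptions for this example, not the real fortran-src or fortran-vars evaluator.

```haskell
{-# LANGUAGE DeriveTraversable #-}

import Data.Foldable (toList)
import Data.List.NonEmpty (NonEmpty (..))

data Dims t a
  = DimsExplicitShape (t (Dim a))
  | DimsAssumedSize (Maybe (t (Dim a))) a
  | DimsAssumedShape (t a)
  deriving (Functor, Foldable, Traversable)

data Dim a = Dim a a
  deriving (Functor, Foldable, Traversable)

-- Hypothetical toy expression type (NOT fortran-src's Expression).
data BoundExpr
  = Lit Int
  | Var String
  | Add BoundExpr BoundExpr

-- Names whose values are known statically; dummy arguments are simply absent.
type Consts = [(String, Int)]

-- Evaluate a bound if possible; any unknown variable gives Nothing.
evalBound :: Consts -> BoundExpr -> Maybe Int
evalBound _  (Lit n)   = Just n
evalBound cs (Var v)   = lookup v cs
evalBound cs (Add l r) = (+) <$> evalBound cs l <*> evalBound cs r

-- Evaluate every bound independently, keeping array kind and rank intact.
evalDims :: Functor t => Consts -> Dims t BoundExpr -> Dims t (Maybe Int)
evalDims cs = fmap (evalBound cs)

-- integer dynarr5(stscalar1 + 1), with the implicit lower bound 1 spelled out
dynarr5 :: Dims NonEmpty BoundExpr
dynarr5 = DimsExplicitShape (Dim (Lit 1) (Add (Var "stscalar1") (Lit 1)) :| [])

-- stscalar1 is a dummy argument, so it is not among the known constants.
dynarr5Bounds :: [Maybe Int]
dynarr5Bounds = toList (evalDims [] dynarr5)
```

Here `dynarr5Bounds` is `[Just 1, Nothing]`: the lower bound evaluates, and the upper bound depends on a dummy variable, so it is recorded as `Nothing`.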
Implemented in #261.
Currently Dimension in SemanticTypes allows for assumed-size arrays, which use * to denote dynamic dimensions, but not assumed-shape ones, which use : to denote dynamic dimensions. Both of these must be dummy arguments. The former only allows the final dimension to be dynamic, so the following are allowed:
But the following aren't:
However you can specify other dimensions via additional arguments like such:
Which the dimensions type currently can't encode.
Assumed shape ones allow all dimensions to be dynamic, but not only some dimensions, so the following is allowed:
but not these:
Assumed-shape also allows a number of intrinsics that you can't use with assumed-size arrays:
Another difference is that without interfaces assumed-shape arrays behave poorly, giving errors, failing to link, or in the worst case segfaulting at run time.
As a result, as well as being able to encode the type of arr(:, :), Dimensions should also distinguish between arr(:) and arr(*).
This page gives a good summary of the different ways of representing arrays: https://fortran-lang.org/en/learn/best_practices/arrays/