Closed robsimmons closed 10 years ago
I think a functor PrimSequence ( ... ) : PRIM_SEQUENCE
sounds like a fine way to make a significant portion of the array/vector functionality available earlier. I think that you do have SeqIndex.maxInt
, SeqIndex.geu
and SeqIndex.gtu
around after num0.sml
, so you may not need any duplication.
As for defining full operations in terms of slice operations, most of the array/vector operations are small enough to be inlined and slice data structure is eliminated.
Thanks for your response! I was including the new sequence0 after num0 but before SeqIndex got rebound; doing it just a little later gives me what I need, thank you.
So here's something that is a bit confounding for this approach: maxLen' is calculated in sequence.fun to be the minimum of Int.maxInt (if it's a value) and SeqIndex.maxInt'. This is basically reasonable, but now maxLen is being defined before we know Int.maxInt, in a kind of important way (IntInf hasn't been defined yet).
I can't imagine what use having a SeqIndex that was larger than Int would actually serve, so it seems like the right solution is just to say that the maximum length of a sequence is just SeqIndex.maxInt, full stop. However, because the correctness of sequence.fun pervasively depends on this assumption, if I'm going to refactor the code to assume that, it seems like there needs to be a check somewhere that this is internally consistent, and I'm not sure how to go about that. One option is the following, which will cause Overflow to be raised in the basis if the necessary condition isn't satisfied.
local
structure S =
Int_ChooseInt
(type 'a t = 'a
val fInt8 = Int8.fromLarge (SeqIndex.toLarge Prim.maxLen)
val fInt16 = Int16.fromLarge (SeqIndex.toLarge Prim.maxLen)
val fInt32 = Int32.fromLarge (SeqIndex.toLarge Prim.maxLen)
val fInt64 = Int64.fromLarge (SeqIndex.toLarge Prim.maxLen)
val fIntInf = SeqIndex.toLarge Prim.maxLen)
in
val maxLen = S.f
end
Is that a reasonable way to proceed, or are exceptions in the basis due to integer size misconfiguration a no-no?
The current calculation takes place here, by the way. As an aside, Line 104 there should only typecheck if Int = Int32... has the basis actually been compiled with Int as anything else?
Regarding arrays-and-vectors/sequence.fun#L104
: At build/sources.mlb#L88
, the structure Int
does not match the SML Basis Library's signature INTEGER
. At this point, the precision
value is a Int32.int option
.
The type-check-all
target of basis-library/Makefile
type checks the Basis Library implementation under all possible configurations. I've done self-compiles with -default-type int64
and -default-type intinf
.
I think the maxLen'
concern is a non-issue.
One thing to keep in mind is that the Basis Library implementation pervasively uses the following idiom:
structure Foo =
struct
open Foo
(* shadow previous Foo declarations and/or add new declarations *)
end
While this is somewhat of an anti-pattern in SML programming, because it obscures what structure Foo
is at any given point in the program elaboration, it is particularly helpful in the Basis Library implementation because so many of the structures are interdependent upon one another (and it is better than the alternative of one massive implementation structure which defines all of the declarations in use-def order, but without any module-level structuring). Consequently, in the Basis Library implementation, structure Int
is not necessarily the Basis Library specification's structure Int : INTEGER
. Indeed, the Basis Library specification's signature INTEGER
is only defined at build/sources.mlb:L115
and only at build/sources.mlb:L138
is the "default" structure Int
bound with declarations compatible with signature INTEGER
.
Regarding the relationship between structure SeqIndex
and the "default" structure Int
: the SeqIndex.int
type is meant to be a primitive type for indexing into an array or vector. Ultimately, array and vector indexing will be compiled to address arithmetic, so SeqIndex.int
is bound to a type that is efficient for the target platform's address arithmetic; in particular, we want something like Array.foldl
, which has a tight arithmetic loop behind the scenes, to do all of the arithmetic at the target's natural word size. Thus, on a 32-bit platform, structure SeqIndex
follows structure Int32
, while on a 64-bit platform, structure SeqIndex
follows structure Int64
. Consequently, on a 64-bit platform with the default -default-type int32
, structure SeqIndex
is larger than structure Int
. On the other hand, we know that structure SeqIndex
will always be a fixed-sized integer.
In the refactored sequence0.sml
implementation, you shouldn't need anything from structure IntInf
to implement the sequence operations that are all in terms of SeqIndex.int
. Note that when you slice the computing of maxLen'
out of the current calculation, you will have:
local
fun doit (precision, fromInt, maxInt') =
if Primitive.Int32.>= (valOf SeqIndex.precision, precision)
then fromInt maxInt'
else SeqIndex.maxLen'
structure S =
Int_ChooseInt
(type 'a t = SeqIndex.int
val fInt8 = doit (valOf Primitive.Int8.precision,
SeqIndex.schckFromInt8,
Primitive.Int8.maxInt')
val fInt16 = doit (valOf Primitive.Int16.precision,
SeqIndex.schckFromInt16,
Primitive.Int16.maxInt')
val fInt32 = doit (valOf Primitive.Int32.precision,
SeqIndex.schckFromInt32,
Primitive.Int32.maxInt')
val fInt64 = doit (valOf Primitive.Int64.precision,
SeqIndex.schckFromInt64,
Primitive.Int64.maxInt')
val fIntInf = SeqIndex.maxLen')
in
val maxLen' = S.f
end
All of these values/operations will be available after elaborating integer/iwconv0.sml
at build/sources.mlb:L22
.
Then, the SML implementation of structure IntInf
will be built using some structure Vector0 = Sequence0(...)
and you will need to use structure SeqIndex
operations for computing/comparing lengths, computing indices, etc.
Thank you for the type-chec-all
pointer, I needed that. And of course, a 32-bit int
and 64-bit size_t
is entirely reasonable, as I should have seen.
In trying to take this approach, I run into the problem that it's the Int_ChooseInt
functor itself that's not yet available after integer/iwconv0.sml
. I can add config/default/default-$(DEFAULT_INT).sml
to build/sources.mlb#L30-L33
, but because IntInf.int
is not defined yet (even in the current setup it's only defined as Primitive.IntInf.int
), compilation then fails when trying to choose integers to be infinite-precision integers.
You should be able to do something like:
local
local
../config/bind/int-prim.sml
../config/bind/int-inf-prim.sml
in ann "forceUsed" in
../config/default/default-$(DEFAULT_INT).sml
end end
in
../arrays-and-vectors/sequence0.sml
end
All of the code in config/bind
and config/default
is essentially "free" -- they are just conveniences for doing "type = type" and "structure = structure" declarations, which is just namespace management.
We could probably perform fewer rebindings, but you need to carefully use "where type" clauses to propagate type equalities through opaque signature matches. And, with MLton's defunctorization, there is no advantage to having fewer functor declarations or fewer rebindings.
The numeric part of the basis build seems designed with the idea in mind that IntInf should be a perfectly reasonable default integer type. However, that means that vectors are subscripted by integers, and the goal of #60 is to make integers out of vectors. I wanted to discuss how to proceed.
I think the thing to do is to factor out the
SeqIndex.int
parts of theSEQUENCE
signature into aPrimSequence
functor andPRIM_SEQUENCE
signature, and then add primitive arrays and vectors to the build sequence right after num0.sml. There's the additional problem that we don't have maxLen, geu, or gtu around already at this point; as I currently see it that may require some duplicated code.While thinking about implementing this I noticed that a lot of sequence operations are defined in terms of slice operations (sub and unsafeSub were the two that surprised me) - is the optimizer able to factor out the apparent cost of pushing those operations through Slice?