Closed jkyang92 closed 3 years ago
Here's an idea, since the code is already trying to identify "identical" types, what if the typecode database was indexed by a string representing the actual structure of the type instead of the typename. You'd have to be careful with recursive types though, and you might still want to keep track of names for the -typecodes option to work.
That's a good idea. Although, one usually wants a name for a type, so it wouldn't be an inconvenience to have to assign a name to a type.
I thought about it some more, and here's the simplest representation that I can think of for a string that deals correctly with the following two issues.
basic syntax:
To prevent the previously mentioned issues, every struct/array/or type should show up in the string at most once, so if there is a repeated type, use the following syntax:
where the integer indexes which of the previously used types it is.
The main problem with this syntax is that it's complex to construct. This would be a lot simpler if every struct/array/or type was in the database, then we could refer to other types that way instead of having to include the entire structure, but it adds a lot more stuff to the database, most of which doesn't have or need a typecode
Using names would be easier, although I'd mention that as long as typecode database doesn't remember the structure of the type and we try to identify identical types, one can produce a bug where two identical types have different typecodes simply by having them in different files that don't depend on each other. If code from these two files is then used together in a third file, this can cause code to fail at runtime.
I'm beginning to think that identifying two type identifiers that declare the same type structure might be a useless frill that we don't even need - I remember that at the time I was eager to do it simply because it was safe to do it. An interesting experiment would be to modify the code in Macaulay2/c
to disable it and see if the files in Macaulay2/d
still compile. If so, we could add code to the or
type operator to give an error if the two operands are not both or
types or identifiers.
Would you like to try such an experiment? You can tell from the *.sym debugging file whether the two identifiers in a test are declared to be synonyms.
Here's the output of grep -B 3 "value => {TYPE" *.sym
after building the d directory with the -debug flag passed to scc1
They appear to all be arrayint
and arrayintOrNull
from M2.d
A note about the previous data. There's really only two pairs of synonyms, RawArrayInt
with arrayint
and RawArrayIntOrNull
with arrayintOrNull
. Additionally all of those types are untagged, therefore, it should unobservable whether those types are considered the same internally.
A related question is what to do if code were to write Type2 := Type1
. It seems reasonable that a synonym is intended in this case, so it seems that at least in this case the types should be identified. And since the other type has to be declared for this to work, there shouldn't be any way to use this to create multiple identical types with different typecodes. On the other hand, I found no examples of this in the d files, so maybe it's just not worth worrying about.
For those two pairs of synonyms that you found, there might well be code that depends using one in place of the other. And it's not bad to have such synonyms, especially since the engine code likes "Raw" everywhere. Maybe all we have to do is add code to the or
type operator to give an error if the two operands are not both or
types or identifiers, so scc1
doesn't crash.
You also have to be careful in new
, I forgot to show this in the original sample code but, x := new tarray(Z) len 1 do provide Z(1,1);
also fails with the same assertion failure. (this works fine with new array(Z) ...
, because you don't need a typecode in that case). I don't know of any other ways to cause the error, but I think the general thing is we need to check every construct that may use a typecode.
Right! Because "new"s job is to create an instance, and part of the instance is the type code to insert.
Another way to go, which might be better: when encountering something like tarray(Z)
, check to see whether the identical type has been encountered before -- if not, assign a typecode immediately, without waiting for assignment to a name -- if so, identify it with the previous one and use the typecode generated before. This might be more involved, depending on the logic of the code.
The reason this might be better: you could use new tarray(Z) ...
before T := tarray(Z)
safely.
While that certainly would fix this issue, I think it would make it even easier to create conflicting type codes. All you would need now is for two separate d files to call new tarray(Z)
without either including the other. Here's a variant of all of the above that might fix these issues.
tarray(Z)
(We'd have to make sure that the name Z
is unique)A := {+...}
creates a new type regardless of the structure)I think you need the 3rd bullet point to make the second bullet point work, otherwise you can't easily reduce the type of the contents of the array to a simple string.
Something like that might well work. I suppose you also intend a definition of the form A:=B
to simply create a synonym.
As far as what people expect, if the know C, then their expectation might be what you say, because this is invalid C:
struct { int x; } a ;
struct { int x; } b ;
void foo() {
a = b;
}
I wonder whether the code much uses the identical type detection feature. I remember that when I wanted two identical structs to be of different types, I sometimes resorted to adding superfluous void members with various names. Such as here:
export GlobalQuote := {+Operator:Token, rhs:Token, global:void};
export ThreadQuote := {+Operator:Token, rhs:Token, thread:void};
One reason you might need types to be different is so they both are permitted to appear in the same "or" type, as here:
export ParseTree := (
Token or Adjacent or Binary or Unary or Postfix or Parentheses
or EmptyParentheses or IfThen or IfThenElse or StartDictionary
or Quote or GlobalQuote or ThreadQuote or LocalQuote
or TryThenElse or TryElse or Try or Catch or WhileDo or For or WhileList or WhileListDo or Arrow or New or dummy );
If we gave up on isomorphism detection, we could remove those things, and that might make things clearer to the reader.
For the following d file.
Fails to compile with "chk.c:314: assertion failed". Without the
bad
function, the code compiles, although it generates some pretty interesting structs. With theA:=tarray(Z)
the functiongood
(which is identical tobad
) compiles.The most straightforward fix would be to forbidtarray(Z)
innew
, but should it also be forbidden in the definition ofU2
for example? and what about the definitions ofU3
,W2
,W3
,W4
Also that would maketarray
andarray
inconsistent in how they are used withnew
.Edit: I made an actual syntax error in my original code, hiding the fact that
bad2
in fact fails with an assertion failure rather than a syntax error. This makes the question of what the correct fix is less obvious to me. I also added agood2
which is identical tobad2
.