Bad error message running scc1 (chk.c:314: assertion failed)

jkyang92 commented 4 years ago

For the following d file.

Y := {+ a:int, b:int, c:bool};
Z := {+ d:int, e:int};
Z2 := {+ d:int, e:int, f:int};
U := Y or Z;
W2 := {  x:int, av:tarray(Z)};
W3 := {  x:int, a:(Y or Z2)};
W4 := {  x:int, a:{+ aa:Y, bb:Z2}};
U2 := Y or Z or tarray(Z);
U3 := Y or Z or {+ x:int, y:bool };
-- Following code fails to compile with assertion failure in chk.c:314
bad () : U2 := (U2(new tarray(Z) len 1 do provide Z(11,22)));

-- The following code also fails with an assertion failure
-- bad2(e:U2) : U2 := (
--  when e
--     is b:(tarray(Z)) do e 
--     else e
--  );

-- Following code works
-- A := tarray (Z);
-- good () : U2 := (U2(new tarray(Z) len 1 do provide Z(11,22)));
-- good2(e:U2) : U2 := (
--  when e
--    is b:(tarray(Z)) do e 
--    else e
-- );

Fails to compile with "chk.c:314: assertion failed". Without the bad function, the code compiles, although it generates some pretty interesting structs. With the A:=tarray(Z) the function good (which is identical to bad) compiles.

The most straightforward fix would be to forbid tarray(Z) in new, but should it also be forbidden in the definition of U2 for example? and what about the definitions of U3,W2,W3,W4 Also that would make tarray and array inconsistent in how they are used with new.

Edit: I made an actual syntax error in my original code, hiding the fact that bad2 in fact fails with an assertion failure rather than a syntax error. This makes the question of what the correct fix is less obvious to me. I also added a good2 which is identical to bad2.

jkyang92 commented 4 years ago

Here's an idea, since the code is already trying to identify "identical" types, what if the typecode database was indexed by a string representing the actual structure of the type instead of the typename. You'd have to be careful with recursive types though, and you might still want to keep track of names for the -typecodes option to work.

DanGrayson commented 4 years ago

That's a good idea. Although, one usually wants a name for a type, so it wouldn't be an inconvenience to have to assign a name to a type.

jkyang92 commented 4 years ago

I thought about it some more, and here's the simplest representation that I can think of for a string that deals correctly with the following two issues.

Combinatorial explosion
Self-referential/recursive types

basic syntax:

builtin types -> name
structs -> "{...}" or "{+...}"
arrays -> "[...]" or "[+...]"
or -> "..|..|.."

To prevent the previously mentioned issues, every struct/array/or type should show up in the string at most once, so if there is a repeated type, use the following syntax:

repeated type -> "(int)"

where the integer indexes which of the previously used types it is.

The main problem with this syntax is that it's complex to construct. This would be a lot simpler if every struct/array/or type was in the database, then we could refer to other types that way instead of having to include the entire structure, but it adds a lot more stuff to the database, most of which doesn't have or need a typecode

Using names would be easier, although I'd mention that as long as typecode database doesn't remember the structure of the type and we try to identify identical types, one can produce a bug where two identical types have different typecodes simply by having them in different files that don't depend on each other. If code from these two files is then used together in a third file, this can cause code to fail at runtime.

DanGrayson commented 4 years ago

I'm beginning to think that identifying two type identifiers that declare the same type structure might be a useless frill that we don't even need - I remember that at the time I was eager to do it simply because it was safe to do it. An interesting experiment would be to modify the code in Macaulay2/c to disable it and see if the files in Macaulay2/d still compile. If so, we could add code to the or type operator to give an error if the two operands are not both or types or identifiers.

DanGrayson commented 4 years ago

Would you like to try such an experiment? You can tell from the *.sym debugging file whether the two identifiers in a test are declared to be synonyms.

jkyang92 commented 4 years ago

Here's the output of grep -B 3 "value => {TYPE" *.sym after building the d directory with the -debug flag passed to scc1

type_synonyms.txt

They appear to all be arrayint and arrayintOrNull from M2.d

jkyang92 commented 4 years ago

A note about the previous data. There's really only two pairs of synonyms, RawArrayInt with arrayint and RawArrayIntOrNull with arrayintOrNull. Additionally all of those types are untagged, therefore, it should unobservable whether those types are considered the same internally.

A related question is what to do if code were to write Type2 := Type1. It seems reasonable that a synonym is intended in this case, so it seems that at least in this case the types should be identified. And since the other type has to be declared for this to work, there shouldn't be any way to use this to create multiple identical types with different typecodes. On the other hand, I found no examples of this in the d files, so maybe it's just not worth worrying about.

DanGrayson commented 4 years ago

For those two pairs of synonyms that you found, there might well be code that depends using one in place of the other. And it's not bad to have such synonyms, especially since the engine code likes "Raw" everywhere. Maybe all we have to do is add code to the or type operator to give an error if the two operands are not both or types or identifiers, so scc1 doesn't crash.

jkyang92 commented 4 years ago

You also have to be careful in new, I forgot to show this in the original sample code but, x := new tarray(Z) len 1 do provide Z(1,1); also fails with the same assertion failure. (this works fine with new array(Z) ..., because you don't need a typecode in that case). I don't know of any other ways to cause the error, but I think the general thing is we need to check every construct that may use a typecode.

DanGrayson commented 4 years ago

Right! Because "new"s job is to create an instance, and part of the instance is the type code to insert.

Another way to go, which might be better: when encountering something like tarray(Z), check to see whether the identical type has been encountered before -- if not, assign a typecode immediately, without waiting for assignment to a name -- if so, identify it with the previous one and use the typecode generated before. This might be more involved, depending on the logic of the code.

The reason this might be better: you could use new tarray(Z) ... before T := tarray(Z) safely.

jkyang92 commented 4 years ago

While that certainly would fix this issue, I think it would make it even easier to create conflicting type codes. All you would need now is for two separate d files to call new tarray(Z) without either including the other. Here's a variant of all of the above that might fix these issues.

Store tagged structs in the type code database by their name
Store tagged arrays in the type code database as tarray(Z) (We'd have to make sure that the name Z is unique)
Don't identify structs with the same structure with each other (each definition of the form A := {+...} creates a new type regardless of the structure)
Identify identical tarrays with each other (I think most people would expect all tarrays of the same type to be the same)

I think you need the 3rd bullet point to make the second bullet point work, otherwise you can't easily reduce the type of the contents of the array to a simple string.

DanGrayson commented 4 years ago

Something like that might well work. I suppose you also intend a definition of the form A:=B to simply create a synonym.

As far as what people expect, if the know C, then their expectation might be what you say, because this is invalid C:

struct { int x; } a ;
struct { int x; } b ;
void foo() {
     a = b;
     }

I wonder whether the code much uses the identical type detection feature. I remember that when I wanted two identical structs to be of different types, I sometimes resorted to adding superfluous void members with various names. Such as here:

export GlobalQuote := {+Operator:Token, rhs:Token, global:void};
export ThreadQuote := {+Operator:Token, rhs:Token, thread:void};

One reason you might need types to be different is so they both are permitted to appear in the same "or" type, as here:

export ParseTree := (
     Token or Adjacent or Binary or Unary or Postfix or Parentheses 
     or EmptyParentheses or IfThen or IfThenElse or StartDictionary 
     or Quote or GlobalQuote or ThreadQuote or LocalQuote
     or TryThenElse or TryElse or Try or Catch or WhileDo or For or WhileList or WhileListDo or Arrow or New or dummy );

If we gave up on isomorphism detection, we could remove those things, and that might make things clearer to the reader.

Macaulay2 / M2

Bad error message running scc1 (chk.c:314: assertion failed) #1134