Provide some kind of minimal typechecking, throw useful errors

schanzer commented 2 years ago

Type-checking is hard, and checking tables is harder. So this issue is basically "is there any low-hanging fruit we can pick?"

The Transformer UI doesn't seem to mind if students enter an incorrect Range when they are filling in the Contract section for a number of Transformers. For instance, in a BS lesson we ask students to create and save a transformer which returns true if an animal’s name contains the letter “s.” The contract they ought to provide is Row -> Boolean. Currently, students can put Row -> Number, Row -> String, or Row -> Any, and the correct table is spit out every time (no error message). The only contract that produces an error is Row -> Boundary. (This is just one specific instance; there are others.)

thomascastleman commented 2 years ago

We currently use the CODAP language's type predicates (isBoolean, isNumber, etc) for type checking, so that we are consistent with CODAP (we originally had our own type checking, but made a decision to switch to this).

The implementation of those type predicates on the CODAP side are currently quite lenient, as you've observed. I agree that this behavior is probably not what we want on the Bootstrap side. Should we request that they make them stricter? Move back to our own type checking?

@schanzer @shriram

schanzer commented 2 years ago

HELL YES. CODAP didn't even use those to begin with, right? So if they started out being totally permissive, and added these for our purposes, then let's make sure they are as tight as we need them to be.

shriram commented 2 years ago

YESSSS!!!!! Let's harass them into making CODAP better. (-: (Just kidding, Bill and Jonathan. About the action, that is, not about the desired end result.)

stardust66 commented 2 years ago

Here's how the CODAP type checking currently works.

isBoolean accepts true, false, and strings "true" and "false".
isBoundary accepts boundaries.
isNumber accepts numbers, string numbers, and booleans.
isString accepts anything that is not a boundary. Native CODAP does not have an isString predicate, since all data types besides boundaries are interchangeable with their stringified versions. For example, string numbers like "3" can be used as actual numbers, i.e. "3" + 5 == 8.

The contract they ought to provide is Row -> Boolean. Currently, students can put Row -> Number, Row -> String, or Row -> Any, and the correct table is spit out every time (no error message). The only contract that produces an error is Row -> Boundary. (This is just one specific instance; there are others.)

This probably happens because CODAP counts booleans as numbers, so it passes Row -> Number. In addition, Row -> String passes for all non-boundary types. These are two things we could change.

shriram commented 2 years ago

I am inclined to do the stricter checking. Also file these as bug reports on CODAP. (-:

P-bibs commented 2 years ago

We could hurry this along if we cooked up our own type checkers (we already have them in the commit history somewhere) and opened it as a PR. In that case, we could have whatever we wanted type-checking wise. I think boolean, boundary, and number are quite self explanatory, but string isn't as obvious.

Do we want to allow purely numeric strings (for example, "5")? This would mirror Pyret most closely, and there are certainly cases where you use numerals as categorical data and not as numbers, but this would come at the cost of more false-negatives where students should be using a -> number contract but use -> string instead and no warning is raised.

Another possible solution I can imagine is allowing numeric output from a string contract, but only if there's at least one output that is non-numeric (so ["1", "2", "a"] would be ok but ["1", "2", "3"] wouldn't). This reduces the false-negative rate, but my guess is that outputs like this don't actually appear that often and more likely represent someone trying to mix numeric and categorical data (which we don't want to encourage).

@schanzer @shriram

schanzer commented 2 years ago

I don't have enough experience with type systems to peer into future and guess at what makes more sense here. My gut instinct is always to be more strict. I think the heuristic you propose at the end is a nice compromise, but I would much prefer to hear what @shriram has to say on the matter.

shriram commented 2 years ago

The argument against a custom type-checker is that we will now be incompatible with the rest of CODAP. Indeed, you can have situations where you do the same operation one way (using CODAP drag-and-drop) and you get one result, you do it another way (using /T) and you get a different outcome. I'm not sure that's great. It's unfortunate that CODAP is so loosey-goosey about datatypes (for a "data science" tool, no less), but my feeling is we have to just live with it, and instead nudge the CODAP developers to improve the language, and with it the underlying predicates — but accept that they may not be willing to (too much legacy).

The other option is to introduce new names for some of these things, and use those new names. Then we steer clear of the same name in CODAP. But that may be hard, and furthermore, we would need the CODAP UI to support these new type names (eg, to mark a column as having that type). But that may be a path that leaves legacy CODAP intact while transitioning to a stricter CODAP.

@schanzer What do you think?

schanzer commented 2 years ago

If I knew that the CODAP folks would eventually get where we want them, I'd go with the "new name" strategy. We could deal with the inconsistency in the short term, and then everything works out nicely in the end.

But without knowing that, I'd opt for @P-bibs compromise.

shriram commented 2 years ago

I just talked with Chad Dorsey at Concord. He says revisiting the data model in CODAP is very much on their map for this coming year. What they would really like is that we document the issues we've run into (some of this may already have happened in the summer?). So it seems to make sense for us to just use their predicates, accept that right now they're a bit iffy, but tell them what needs to be fixed (e.g., isBoolean should be strict, and they should probably add an isTruthy and isFalsy—which are useful when dealing with externally-generated data). Numbers should be big on our priority list too.

schanzer commented 2 years ago

This is fantastic news -- in that case, I say document and write the spec for all the type-checking functions you want, and let's cross our fingers that they come soon. :)

Thanks for checking with Chad, @shriram !

P-bibs commented 2 years ago

@shriram Sounds good, we can work on that record-keeping.

Just to clarify my original proposal was not to add our own type-system, but to open a PR on the CODAP repo with improved type-checkers to help bootstrap the process and steer things in the direction we want them. The CODAP folks could then do whatever refinements on those predicates they desire. It sound like this is close enough on their roadmap that we probably don't need to worry about it.

shriram commented 2 years ago

Frankly, if we can nudge them with actual implementations, that would be good for us. (-:

P-bibs / codap-transformers

Provide some kind of minimal typechecking, throw useful errors #227