Discussion: Getting rid of 'null' and "error" in canonical-data?

Vankog commented 7 years ago

It seems we found quite a deep topic to talk about.

Some concerns have been raised suggesting that the canonical data should not contain null or the "expected": "error" structures, because the languages across the tracks handle these concepts quite inconsistently.

E.g. @NobbZ said in #902:

null input,

In some languages not possible at all, and in my opinion, we should assume some safety net in the early languages. Also, those languages that constantly deal with null errors are free to add them as necessary.

I am not aware how many languages do not have a null-concept or something alike

Haskell, Rust, OCaml, Erlang, Elixir, maybe others.

Even those that do have a null value treat it differently in their idioms…

In Go it seems as if null is often used synonymously with the corresponding zero value, especially when dealing with arrays, slices or strings.

In other languages I've seen it as "please insert your default value here" when calling a function or the presence of a computation error/argument error when returned from a function.

In other languages we do not even have strict typing to enforce a string input. Shall we therefore create a canonical test that throws when given an integer where a string is expected?

In some languages we could even decide to use the type system of the language to keep out invalid input and create a datatype data Nucleotide = A | C | G | T. Again, this would make many of the tests obsolete…
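A minimal sketch of that idea, written in Python, where an enum only approximates what a static type system would enforce. Once a strand is a list of Nucleotide values, an invalid letter can no longer reach the counting function, so the "invalid input" test moves to the parsing boundary. The names parse_strand and count_nucleotides are hypothetical and not taken from any track.

```python
from collections import Counter
from enum import Enum


class Nucleotide(Enum):
    A = "A"
    C = "C"
    G = "G"
    T = "T"


def parse_strand(text: str) -> list[Nucleotide]:
    # Nucleotide("X") raises ValueError, so bad letters are rejected here,
    # before any exercise logic runs.
    return [Nucleotide(ch) for ch in text]


def count_nucleotides(strand: list[Nucleotide]) -> dict[Nucleotide, int]:
    # No validation needed: every element is already a valid nucleotide.
    counts = Counter(strand)
    return {n: counts[n] for n in Nucleotide}
```

With this shape, a canonical case such as "strand with invalid nucleotides" has nothing left to test in count_nucleotides itself.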

Therefore, as I said, the canonical data should only contain a small limited set of test data which deals with correct input and expectations, while handling erroneous input should be the responsibility of the track.

Another reason why it should be the track's responsibility is the variety of idioms and possibilities for signaling errors. In Go we have multi-return and return errors as a value when they occur, or nil if not. In Haskell, Rust, OCaml, Elixir, Erlang we have the habit of returning Maybe, Either, Result, Some, :ok and :error tuples or whatnot. In Java and Ruby we throw or raise exceptions (which we do in Erlang/Elixir as well if we consider an error irrecoverable). In C we return a magic value which is documented (I hope) and sometimes also set some global state which gives numerical and human-readable info about the error.

Also, @stevejb71 said the same independently here: https://github.com/exercism/problem-specifications/pull/895#discussion_r139301743

Some languages do not have exceptions, or would return an error type for invalid inputs.

So I have a feeling we have to purge these two concepts from the canonical data and from the schema.

I found "error" in the following exercises:

bowling
collatz-conjecture
hamming
nucleotide-count
palindrome-products
perfect-numbers

I found null in the following exercises:

word-search
variable-length-quantity
two-fer
rna-transcription
pov
phone-number
pascals-triangle
forth
flatten-array
binary
alphametics
all-your-base

petertseng commented 7 years ago

Therefore, as I said, the canonical data should only contain a small limited set of test data which deals with correct input and expectations

Okay, so I read this quote as possibly suggesting we do not test null as an input.

I have not read anything in this issue's description that suggests we need to remove the possibility of null output (let me know if I simply missed it).

Some languages do not have exceptions, or would return an error type for invalid inputs.

And so they would. So, they translate the {"error": "whatever"} into however errors are represented in their language. I don't think this necessitates removing errors from canonical data. If we need to just generically describe them as "is an error" rather than saying "throws an exception", "returns an error", "throws an error", then sure I'm fine with documenting that requirement to be generic in descriptions.
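A rough sketch of what that translation could look like on a track that signals errors with exceptions. The case layout below is a simplified assumption rather than the actual canonical schema, and expects_error and check_case are hypothetical helper names.

```python
import json


def expects_error(expected) -> bool:
    # Assumed convention: an error is a JSON object carrying an "error" key.
    return isinstance(expected, dict) and "error" in expected


def check_case(func, case) -> bool:
    """Return True if func behaves as the canonical case describes."""
    expected = case["expected"]
    try:
        actual = func(**case["input"])
    except ValueError:
        # This hypothetical track maps "is an error" to ValueError.
        return expects_error(expected)
    return not expects_error(expected) and actual == expected


def hamming(strand1: str, strand2: str) -> int:
    if len(strand1) != len(strand2):
        raise ValueError("strands must be of equal length")
    return sum(a != b for a, b in zip(strand1, strand2))


# Simplified, made-up cases in the spirit of the hamming exercise.
CASES = json.loads("""
[
  {"input": {"strand1": "GGACG", "strand2": "GGTCG"}, "expected": 1},
  {"input": {"strand1": "AATG", "strand2": "AAA"},
   "expected": {"error": "strands must be of equal length"}}
]
""")

print(all(check_case(hamming, case) for case in CASES))
```

Only check_case knows how errors are signaled; a track with Result types or tagged tuples would swap out that one function and leave the JSON untouched.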

Vankog commented 7 years ago

Indeed, there is only the mention of null as input, but not as output or in between. But doesn't this imply nearly the same issues for those languages? How could they expect a null if they don't have a null-concept of any sort?

petertseng commented 7 years ago

How could they expect a null if they don't have a null-concept of any sort?

Using whatever equivalent concept exists in the target language.

If there is a target language that has a problem translating null to their language, can we have a representative step forth and explain?

I can start off with saying that in Haskell we will use the Maybe type, and in Rust we will use the Option type when there are outputs that were represented in JSON as null.

So why don't we accept those types in our inputs as well? Well, why would we? The type system helps us prevent it.

So no problems from either of those languages currently. Let's hear about any experiences with any other languages that have specific problems.

NobbZ commented 7 years ago

How could they expect a null if they don't have a null-concept of any sort?

Using whatever equivalent concept exists in the target language.

But do you really want to only use a Maybe String forever in Haskell just because we decided to always have a null input case in the canonical data when we deal with strings?

We do have the ability in many languages to express that we demand to get a string. If this string can be replaced by some null, nil, nothing in some languages, this is not the problem of the canonical data, since it was meant to be language agnostic.

I can start off with saying that in Haskell we will use the Maybe type, and in Rust we will use the Option type when there are outputs that were represented in JSON as null.

We are talking about null as an input.
null as an output is undesirable, we should use our "error" object here.

petertseng commented 7 years ago

But do you really want to only use a Maybe String forever in Haskell just because we decided to always have a null input case in the canonical data when we deal with strings?

No we don't want to. So we won't. If null appears as an input in JSON, I plan for those languages to simply exclude that case. If we wish to remove all null inputs from JSON files in this repository, I do not object.

null as an output is undesirable, we should use our "error" object here.

If I see null as an output, my interpretation is: We expect that there is no answer to this input, but it is not an error. Historically, I used change as my example. Given a set of coins and a target:

If we can make the target using the provided coins, that is the result.
If the target is negative, that's an error.
If the target is positive but unattainable via the provided coins, it's not an error but there is also no result.

So, does it make sense in JSON to use both null for the third case and {"error": "whatever"} for the second case?

The last time I used this example in https://github.com/exercism/problem-specifications/issues/336#issuecomment-280231149 I was trying to show people that this is insensible. In Haskell, are we expected to have Maybe (Either a b) or Either a (Maybe b) if we might have either null or error object?

So, in JSON I would probably recommend just using {"error": "some reason"} for all of those cases, and in the target language using one of the two monads (which one of the two is not important to this discussion).
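To make the three cases concrete, here is a small sketch that follows this recommendation on an exception-based track: both the negative target and the unattainable target end up as the same kind of error. The function name and messages are illustrative assumptions, and coins are assumed to be positive integers.

```python
def find_fewest_coins(coins, target):
    """Return a shortest list of coins summing to target.

    Raises ValueError both for a negative target and for an unattainable one.
    """
    if target < 0:
        raise ValueError("target can't be negative")
    # best[t] is a shortest list of coins summing to t, or None if unreachable.
    best = [[]] + [None] * target
    for t in range(1, target + 1):
        for coin in coins:
            prev = best[t - coin] if coin <= t else None
            if prev is not None and (best[t] is None or len(prev) + 1 < len(best[t])):
                best[t] = prev + [coin]
    if best[target] is None:
        raise ValueError("can't make target with given coins")
    return best[target]
```

A target of zero still returns an empty list, so "no coins needed" stays distinct from "no answer".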

Are there no other cases in which we might say "We expect no answer, but it's not an error"? If there are no other cases, sure, let's forbid null as an output too.

NobbZ commented 7 years ago

If the target is positive but unattainable via the provided coins, it's not an error but there is also no result.

I'd have represented it either as an empty result set/list or as a new ADT roughly data CoinResult = Result [Coin] | Impossible | Error String. Maybe even refine the Error stuff.

But maybe I'm biased on this from Erlang/Elixir thinking, where we do not need to define static types but just use some tagged tuples.

petertseng commented 7 years ago

If the target is positive but unattainable via the provided coins, it's not an error but there is also no result.

I'd have represented it either as an empty result set/list

Reasonable if the function under test returns a list of all possibilities [[Coin]], but if it only returns the optimal [Coin], then [] has the disadvantage of being indistinguishable from "target, being zero, is attainable with no coins".

new ADT roughly data CoinResult = Result ([Coin]) | Impossible | Error String. Maybe even refine the Error stuff.

Reasonable. If considering these three possibilities, would you think the target language's Impossible should be represented as JSON null, or would you still use {"error": "impossible"}? I could be convinced of the latter.

NobbZ commented 7 years ago

null as a value is bare of any semantics, therefore I try to avoid it like hell.

To be honest, I'd even prefer to completely get rid of the JSON data and define canonical data in a haskell-y or ML-y syntax.

ADTs included, which would clearly define types for inputs and outputs in terms of some base types. An exercise's implementor would then apply the target language's idioms to those types.

During my studies we often used ADTs to “draft” data representation and a function type declaration in Haskell style (f :: Int -> Int) to define operations which are possible on that type.

Later on we were able to use those drafts to actually define our system in one of the possible target languages.

But there we had all had lectures about Haskell already; here I'm not sure if everyone could read and apply it, nor can I assume that everyone is willing to learn…

Disclaimer: I never really liked JSON because of the numeric problem. I never understood why one would use a data exchange format which may lose precision on numerical values depending on the reader…
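A short illustration of that numeric problem, using only Python's standard json module. The parse_int=float argument is used here merely to mimic a reader that stores every number as a double; it is not a claim about any particular track's tooling.

```python
import json

text = "9007199254740993"  # 2**53 + 1, the first integer a double cannot hold

exact = json.loads(text)                   # Python's default reader keeps an int
lossy = json.loads(text, parse_int=float)  # mimics a reader that uses doubles

print(exact)       # 9007199254740993
print(int(lossy))  # 9007199254740992, off by one
```

The same JSON text yields two different values, depending only on how the reader chooses to represent numbers.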

petertseng commented 7 years ago

Understandable. Until a possible day when we move away from JSON (I don't disagree with the proposal but I will say I can't spend my time to make it), I'm on board with declaring that null must not be an input nor an output. Since I wrote the https://github.com/exercism/problem-specifications/pull/551/commits/1f238c41210af1cde8621d506eaef1adec3c7aaa of #551 I know that that is a place where it needs to be changed.

I wonder if it is possible to have https://github.com/exercism/problem-specifications/blob/master/canonical-schema.json forbid nulls!
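Independently of what the schema itself can express, a standalone check is easy to sketch. The following walks a parsed JSON document and reports the path of every null it finds; the sample document is made up for illustration and find_nulls is a hypothetical helper, not part of the repository's tooling.

```python
import json


def find_nulls(value, path="$"):
    """Yield a path for every null in a parsed JSON document."""
    if value is None:
        yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_nulls(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_nulls(child, f"{path}[{index}]")


# Made-up document: one clean case and one case with null input and output.
document = json.loads("""
{"cases": [
  {"description": "ok", "input": {"phrase": "hi"}, "expected": "hi"},
  {"description": "bad", "input": {"phrase": null}, "expected": null}
]}
""")

for location in find_nulls(document):
    print(location)
```

A check like this could run over every canonical-data.json and fail as soon as anything is reported.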

Vankog commented 7 years ago

OK, since your arguments are a bit too technical for me to fully comprehend in a single read, I'll try to summarize them:

You are, in general, pro-getting-rid-of-null in the canonical data, aren't you?
I'm not so sure about your opinion about the "error" statement, but it seems you are in favor of keeping it, right?

NobbZ commented 7 years ago

You are, in general, pro-getting-rid-of-null in the canonical data, aren't you?

@petertseng has convinced me that having null as an expected return value might be necessary sometimes. At least as long as we stick to JSON.

I'm not so sure about your opinion about the "error" statement, but it seems you are in favor of keeping it, right?

I'm in favor of the error object (no statements in JSON, as it's not a programming language), but I suggest leaving the interpretation of how this has to be dealt with to the implementing language track, since there are many idioms for dealing with errors.

A language might choose to implement those error-objects as…

… exceptions (eg Java where exceptions are rather common instead of being exceptional)
… multi-returns with a result and an error value (eg Go)
… tagged tuples (eg Erlang, Elixir)
… Result/Either types (or similar shaped types; eg Haskell, ML)
… a value out of the usual co-domain of that function and an accompanying global error state (eg. C)
… or even as a type which causes a compile time error (eg Idris, Agda)
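As a rough illustration of the first three idioms in the list above, here is one and the same error condition rendered three ways in Python. The wrappers as_pair and as_tagged are hypothetical and only imitate the Go and Erlang/Elixir conventions; they are not how those languages actually implement them.

```python
def collatz_steps(n: int) -> int:
    # Exception style: invalid input raises.
    if n <= 0:
        raise ValueError("Only positive integers are allowed")
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def as_pair(func):
    # Go-like style: always return (value, error); one of the two is None.
    def wrapped(*args):
        try:
            return func(*args), None
        except ValueError as exc:
            return None, str(exc)
    return wrapped


def as_tagged(func):
    # Erlang/Elixir-like style: ("ok", value) or ("error", reason).
    def wrapped(*args):
        try:
            return ("ok", func(*args))
        except ValueError as exc:
            return ("error", str(exc))
    return wrapped
```

All three would satisfy the same canonical error object; only the test generator's notion of "is an error" differs between them.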

And even though I'm not in favor of testing invalid input for exercises where it could easily be avoided by replacing a string with an array of an enum type (most of the DNA exercises), I do see some room for invalid inputs in other exercises where not even Haskell's type system is able to save you, e.g. all-your-base (which currently uses null; perhaps we should file an issue about that when the discussion has settled?).

exercism / problem-specifications

Discussion: Getting rid of 'null' and "error" in canonical-data? #905