canonical-data.json standardisation discussion (was: Malformed data?)

zenspider commented 8 years ago

It appears that all-your-base.json is malformed. Where allergies.json has the structure of:

{
    "allergic_to": {
        "description": [ ... ],
        "cases": [ { "description": "...", ... } ... ]
            }, ...
}

all-your-base.json has:

{
  "#": [ ... ],
  "cases": [ ... ]

cases should be wrapped in a function name, yes?

It appears that bin/jsonlint only checks that the json parses, not that it has good structure.

At the very least, I think this should be patched up and the README expanded to actually show the desired structure. Happy to do a PR for that, assuming I understand it already. 😀

ErikSchierboom commented 7 years ago

I was also slightly confused by the "type" field, but that may be because I also missed the discussion.

rpottsoh commented 7 years ago

@ErikSchierboom I have been trying to keep up, this issue and many others have been a flurry of activity lately. I am confused by "type". Does it mean a class, a method, just a simple stand along function, is it just a variable? If the json file is intended to also be human readable why not keep it that way?

ErikSchierboom commented 7 years ago

@rpottsoh You're asking the wrong person :) I'm also confused.

rpottsoh commented 7 years ago

@ErikSchierboom :smile: More of a rhetorical response, sorry.

ErikSchierboom commented 7 years ago

@rbasso The updated schema looks great! I was wondering if now is the time to also specify how to handle errors in the schema.

rbasso commented 7 years ago

So many questions... Great! 😃

About incorporating `metadata.yml`

The purpose of the two files is different. One is used to to be able to talk about the exercise, the other is used to be able to produce an implementation.

I agree!

I would hesitate to conflate the two, but am open to discussing it if any of you have strong feelings about it.

I just imagined that it would be really convenient to pave the way to merge them in a single file describing the exercise, @kytrinyx, as that would allow us to later validate everything together in Travis-CI. I saw it as a low-hanging fruit, so I took it.

But this was just a secondary target. I'll remove it in the next version of the proposal.

The rationale for this proposal

Based on previous proposals, I tried to revive this issue in this message, discussing what would be the minimal structure to capture our current test.

After cluttering this issue with really long messages describing a lot of alternatives I tested, I think we have boiled it down almost to its essence, and so we have to talk about test groups and test types.

About the tests' types and test groups

I was also slightly confused by the "type" field, but that may be because I also missed the discussion.

At least implicitly, @ErikSchierboom , any test case has a type that identifies a property being tested, most of the times the name of a test function.

To algorithmic-ally generate test suites, we need a way to unambiguously
attach a type to each test in a test suite.

I am confused by "type". Does it mean a class, a method, just a simple stand along function, is it just a variable?

The test's type is just a unique identifier - a string - attached to each test, identifying what is being tested. It can be a function, a class, a property or anything else that uniquely identifies the test's type in the test suite.

This should no be confused with the test data. The test's type is a generic reference to something outside the canonical-data.json file, which cannot be encoded in a language-neutral way: the test logic.

Each test type in a test suite is a reference to a specific way of turning test data into a test, @rpottsoh.

If the json file is intended to also be human readable why not keep it that way?

There are a lot of "that way"s in x-common, so I'll have to guess a little in this answer.

We strive for three distinct goals here:

Make it human-readable.
Make it machine-readable.
Make it flexible enough to capture any reasonable test suite.

In most of the test suites with more than one type of test, the test's type is encoded in a property-key, in an object describing a test group.

There are two problem with that approach:

It mixes two different concepts regarding the tests: grouping and identification
It doesn't allow nesting of test groups. That would be nice to have, but is not really needed.
It doesn't allow grouping of test of different types, which would be really great.

Moving the test type near the test data, we solved all the above problem easily. It is theoretically sound and adds functionality.

Test grouping is really important because it helps organizing the test suites and increases human-readability. We already use grouping in a lot of test suites.

The remaining problem was to decide on how to put the test type and the test data together. Two options where recently listed in this message, other ideas where presented in previous messages.

Why not instead of "type" could it be called "testof".

I guess we need something more descriptive than type...

I would like keep the Google JSON Style convention for naming properties, so here are a few options:

type
testOf
testType
property -- This is my favorite for now, just because if fells general enough. (edit)
~~function~~ -- I'm strongly against it because it doesn't apply to some kinds of tests, isn't language-neutral and it is conceptually wrong, IMHO (edit).

Which one is better? Any other suggestion?

About a detailed test data standardization

~~The updated schema looks great! I was wondering if now is the time to also specify how to handle errors in the schema.~~

After seeing more than two month of absolute silence in this issue, I decided to aim a little lower, so that we can hope to deliver something. I'm intentionally avoiding any discussion about inputs and outputs.

If we jump back to the discussion about how to encode the test data, I'm sure this schema will not be approved, as we are far from a consensus about it. Also, we could use some time to calmly think about it.

After - if - we approve a general schema for the test suite, I think we can open a new issue to discuss test data standardization. Meanwhile, I think it would be more productive to discuss what would be the best practices in test data encoding, and then we see if that can be standardized.

~~Do you agree with postponing that discussing, @ErikSchierboom?~~

rbasso commented 7 years ago

The updated schema looks great! I was wondering if now is the time to also specify how to handle errors in the schema.

Ops! I just noticed that there seems to be an agreement about a way to encode error messages after reading #551, so I don't see any problem in including the restrictions to the expected property mentioned there. Is this what you had in mind, @ErikSchierboom?

ErikSchierboom commented 7 years ago

@rbasso It is! I think we have something like three options for the expected result of a test:

A concrete value.
A missing value (null/optional).
An error.

Obviously, item 1 is trivial: just put the expected value in the JSON data. For 2, we could agree upon a standard value. I think null would be most suitable. As for three, we could do what #551 suggests and return a special error object.

For your type replacement suggestions, I really like the suggested property name, as I think it is most clear. :+1:

rbasso commented 7 years ago

Ok. I'll write a new version of the proposal so that:

No properties are taken from metadata.yml 😢 . Should we remove the exercise property also?
cases becomes mandatory again. (edit)
cases stays, instead of tests, unless anyone is against it.
type will be renamed to property, as @ErikSchierboom and I agree, and for lack of another name that is sufficiently general and descriptve.
If there is an expected in a test and it has an error, it is the only property and its type is string.

Anything else to remove/add/change?

petertseng commented 7 years ago

three options for the expected result of a test:

A concrete value.

A missing value (null/optional).

An error.

Here's an interesting question I have... do you intend that there will be some property for which expected takes on all three of these types of values? Based on the answer...

If the answer is "yes", you don't necessarily have to give an example of a property, though it can be helpful to try to think of one. I then ask:

How might languages faithfully represent the tests? Sorry but I'm going to pick on a specific language: Would Haskell for example have to use an Either a (Maybe b) or Maybe (Either a b)? What does it mean to have both Either and Maybe?

Or would we have a sum type with three variants?

If the answer is "no", I assume that means every property will only exhibit expected values of either 1+2, or 1+3 (we may also consider a property that should only exhibit 2+3, but let's just talk about 1+2 vs 1+3 for now).

I interpret either combination as "the property may fail to produce a value". I then ask:

Given that these combinations both mean that, how might I choose which one of 1+2 vs 1+3 to use in a given json file? (For example, is it "use missing values when the computation is valid but there might be no answer for the input given, and use errors when the input might be malformed"?)

Edit: ... come to think of it this question is important for the "yes" case as well.

I will, of course, use the answer to the question to guide what happens at https://github.com/exercism/x-common/pull/551#discussion_r101235543

petertseng commented 7 years ago

If the answer is "yes", you don't necessarily have to give an example of a property,

Well, although I do have one that might fit.

Let's consider change: Given some coins and a target, we want to find the way to make that target with the fewest coins.

We could certainly have cases where the input is perfectly valid, but there is no way to reach the target. So let's say that that's represented as null in JSON.

And then we could have cases where the input is obviously invalid, such as a negative target*. So maybe you would say we should call this an error case, with some appropriate representation in JSON (I don't care what, but I used #401 because why not).

Is this an example of what we had in mind? Given just these three cases, is it understandable how to have the tests in a target language:

{ "exercise" : "change"
, "version"  : "0.0.0"
, "comments":
    [ "showing all three possible types of values for `expected`"
    ]
, "cases":
    [ { "description": "Make change"
      , "comments":
          [ "All in one group in this example, but can be split apart later"
          ]
      , "cases":
          [ { "description": "Can make change"
            , "type"       : "change"
            , "coins"      : [1]
            , "target"     : 3
            , "expected"   : [1, 1, 1]
            }
          , { "description": "Can't make change"
            , "type"       : "change"
            , "coins"      : [2]
            , "target"     : 3
            , "expected"   : null
            }
          , { "description": "Negative targets are invalid"
            , "type"       : "change"
            , "coins"      : [1]
            , "target"     : -1
            , "expected"   : {"error": "negative target is invalid"}
            }
          ]
      }
    ]
}

Or, have I missed the mark completely with when to use null versus an error? In which case please correct me.

* = Let's leave aside a ternary coin system where you might have "negative coins", representing the party for whom change is being made giving the coins to the party making the change, rather than the other way around... because in this situation (as well as any others with negative coins), you can reach negative targets.

ErikSchierboom commented 7 years ago

@petertseng Your example is spot on, it is exactly how I expect it to be. And to me, it looks very elegant and is very easy to understand. I like it a lot!

I noticed you still use "type", do you prefer that over "property"?

petertseng commented 7 years ago

Your example is spot on, it is exactly how I expect it to be.

Good to know my rubber-ducking got me onto the right track eventually!

I noticed you still use "type", do you prefer that over "property"?

Oh! No preference =D I just used it in the examples because I used the existing examples with type and didn't want to re-space to property

Property actually seems preferable, since type might get confused with the concept of a type in a type system.

(Okay, fine, slight danger that property will also get confused with properties in OO languages that have them, but it seems to work better than type)

ErikSchierboom commented 7 years ago

Good to hear! I'm really pleased with the result.

zenspider commented 7 years ago

Despite being the OP, I've pretty much checked out of this discussion... Maybe I've missed it as this commentary is nine miles long... I've only skimmed from the bottom, but I still see nothing suggested that provides either a uniform enough interface or enough metadata such that a program can generate tests for any/every exercise.

With this requirement going unaddressed, there's really no point IMHO. My entire goal was to make it easier for a new language to bootstrap up to a publishable state (as I was working on racket at the time, and have stopped because of this exact problem).

If each and every problem needs to be hand written, then the JSON is just a suggestion and isn't actually all that valuable. I want a new language to have to write a simple generator and get 50 problems spat out that only really need stylistic changes before publishing. It's up to the generator author to make it generate idiomatic code that uses the test framework / language well... but there needs to be enough information to actually generate that and I don't see that addressed.

Looking at @petertseng's example from yesterday... I have to write a custom generator. It's unique enough that I might as write the tests by hand.

rbasso commented 7 years ago

Despite being the OP, I've pretty much checked out of this discussion... Maybe I've missed it as this commentary is nine miles long...

True. It is too long. Perhaps I should have opened a new issue with a more detailed title.

I've only skimmed from the bottom, but I still see nothing suggested that provides either a uniform enough interface or enough metadata such that a program can generate tests for any/every exercise.

You are right.

A while ago, in November, we failed this objective for lack of consensus. It was not clear if it was possible and/or desirable to follow the path of completely automated test generation.

Nine days ago, I started to discuss a less ambitious goal: standardizing as much as possible without sacrificing readability and/or flexibility in a significant way. It appear that we have now enough support to follow this path.

With this requirement going unaddressed, there's really no point IMHO. My entire goal was to make it easier for a new language to bootstrap up to a publishable state (as I was working on racket at the time, and have stopped because of this exact problem).

We still want that, and the proposal that we have makes it easier to generate tests but, as you said, it doesn't allow fully automatic test generation.

If each and every problem needs to be hand written, then the JSON is just a suggestion and isn't actually all that valuable.

Yes, the JSON is a suggestion. Tracks can diverge, but try not to.

Test suites need to be partially hand-written, not completely hand-written.

I want a new language to have to write a simple generator and get 50 problems spat out that only really need stylistic changes before publishing. It's up to the generator author to make it generate idiomatic code that uses the test framework / language well... but there needs to be enough information to actually generate that and I don't see that addressed.

I understand that this is not what you wanted, but it can at least reduce the unneeded diversity in structure that we now have in x-common.

By lack of consensus, we are not following that path of encoding the test logic in the canonical-data.json.

The current objective is to write a schema that captures the structure of the test suite. We are avoiding discussing the data representation of the input data and the test logic (how to turn the test data in a test).

Looking at @petertseng's example from yesterday... I have to write a custom generator. It's unique enough that I might as write the tests by hand.

A small piece of code needs to be written for each exercise. Most of if can be factored out, as I exemplified with code somewhere back in this issue.

Think of the current proposal as a partial solution to what you wanted, @zenspider. If in the future people agree that we should follow the path you pointed, we can easily patch the schema to enforce the input data structure. Everything else will be reused!

What I would really like to avoid is finishing this discussing - as in November - without moving a single step further. So I ask...

Can we compromise with a partial standardization?

The way I see the situation now, we gonna get this or nothing...

rbasso commented 7 years ago

JSON Schema for 'canonical-data.json' files

Changes:

cases is mandatory again.
type renamed to property.
No properties taken from metadata.yml anymore.
Add restriction: If there is an expected object and it has an error property, it must be the only property and also be of type string

The schema may need some minor adjustments - cause I probably made a few mistakes - but I don't see how to improve it more without loosing flexiblity and/or readability.

I think we finally got something good enough here, people! 👍

Edit: I also wrote a test suite as a proof-of-concept in Haskell. It is not beautiful code, but it showcases that we in fact don't need much exercise-specific code to parse and create the tests.

ErikSchierboom commented 7 years ago

I completely agree with what @rbasso is saying. Yes, this is "only" a partial solution to a much harder problem, but I feel the partial solution by itself has more than enough merit to warrant going through with it.

With the current suggested format, test generator are definitely possible, although not without some exercise-specific code. I think this is fine, but as said, we can always look to improve this format later to better suit test generation even more.

So all in all, I really like what we have now and I think we should go with that, at least for now.

rbasso commented 7 years ago

If this proposal gets accepted, where should we put the canonical-data-schema.json?

ErikSchierboom commented 7 years ago

Great question. I don't really have a good answer. Perhaps in the root of the exercises directory? Or else in a separate root schema directory?

exercism / problem-specifications