JSON Schema proposal: All inputs should appear under the `input` key

petertseng commented 6 years ago

In our schema, we know that description, expected, and property are fixed, but the inputs to each test case depend on the problem! That means we can't know ahead of time what key(s) constitute the inputs; we have to examine the JSON in order to do so.

According to those test generators listed at https://github.com/exercism/discussions/issues/155 the current solutions to this problem take on one of two forms:

Look for any key that isn't description, expected, property, or comments. That key (or those keys) must be the input, by process of elimination.
Per-exercise configuration or code describing what keys hold the input for each exercise.

Well, what if we could do better?

Let's look at https://github.com/exercism/problem-specifications/blob/master/exercises/change/canonical-data.json where each case has two inputs: coins and target. Let's say that instead a case looks like:

{
  "description": "single coin change",
  "property": "findFewestCoins",
  "input": {
    "coins": [1, 5, 10, 25, 100],
    "target": 25
  },
  "expected": [25]
}

As you can see, the key input contains an object whose keys and values represent the inputs.

And what about https://github.com/exercism/problem-specifications/blob/master/exercises/leap/canonical-data.json where each case has only one input? One might consider two possible ways to do this:

The first way is actually how the JSON is already: The key input contains a scalar, the one and only input.

{
  "description": "year not divisible by 4: common year",
  "property": "leapYear",
  "input": 2015,
  "expected": false
}

The second way we might consider is that it will also be an object, for consistency. So then:

{
  "description": "year not divisible by 4: common year",
  "property": "leapYear",
  "input": {
    "year": 2015
  },
  "expected": false
}

One could consider making that a one-liner: "input": {"year": 2015}) if vertical space savings are desired. This has the advantage of consistency, of course: input always contains an object, but the file also becomes more verbose.

Were one of the two forms of this proposal to be implemented, would this benefit you and/or your track? Would it instead cause harm?

Alternative proposals to solve the problem of "we don't know the names of the keys that hold the input(s) ahead of time"?

petertseng commented 6 years ago

I thought about trying to do this, but my current reaction to "would this benefit you and/or your track" is "meh" since I don't maintain any track that uses generators, and the humans reading the JSON files seem to have no trouble understanding which keys represent the inputs. But hey, it's here if people want it, and maybe it'll be good to have.

Insti commented 6 years ago

Having input as a key with an object containing the well-named arguments is my preference.

ErikSchierboom commented 6 years ago

Were one of the two forms of this proposal to be implemented, would this benefit you and/or your track?

Absolutely! It would make processing the JSON simpler and less error-prone.

Having input as a key with an object containing the well-named arguments is my preference.

Same here.

NobbZ commented 6 years ago

I would be happy about consilidating those keys under input as well!

petertseng commented 6 years ago

I see a comment https://github.com/exercism/problem-specifications/blob/master/canonical-schema.json#L6-L10 in the schema file that explains why input is not a key: To allow for tests that are not example-based (for example, property-based).

I allege that since the vast majority of our specifications are example-based that we can proceed with input. In general, it is hard to see how to encode a property-based test in JSON, and to date we have not made an attempt at doing so.

So I think all we will ask is that this schema does not prevent a future expansion where we might add the ability to encode a property-based test.

I think such an expansion can be added in another key that is not input. So that is achieved.

petertseng commented 6 years ago

If we take action on this (I would wait until a week has passed since the filing of the proposal), plan what to do with the version.

Assume that an exercise that is converted to the new schema is currently versioned x.y.z.

Leave current version scheme in place, which would require a major version bump to (x+1).0.0.
Adopt the two-component version proposed in https://github.com/exercism/problem-specifications/issues/938, which would require a minor version bump to x.(y+1).
Adopt the one-component version proposed in https://github.com/exercism/problem-specifications/issues/938#issuecomment-340226846, which would require simply dropping y and z, resulting in version x.
Remove the version number completely.
Your suggestion here.

Adopting a different version schema simultaneously allows us to take the interesting shortcut of "an exercise is converted to the new input schema if and only if it has an N-component version", so consider it an incentive to adopt a different version schema simultaneously.

My preferences are as I noted earlier. Single-component version > No version at all > Two-component version with CI enforcement that some version component changes ≈ Two-component version without CI enforcement > current version scheme.

coriolinus commented 6 years ago

I think that for exercises who have their canonical-data.json restructured by this, it would be appropriate to leave the current version scheme in place, but increment the version: (x+1).0.0. It's possible that a flexibly-written test generator will generate identical output after the fact, but many (including mine for Rust) will generate something new after the fact. That's a breaking change, appropriate for a major version upgrade.

The proposals to reduce the version number to a two- or one-component number discard clarity for simplicity. I can't deny that they're simpler, but I believe that knowledge of standard three-component semantic versioning is fairly widespread in the programming community at this point.

The proposal to abandon versioning in favor of git SHAs or other schemes have the drawback that it becomes harder for humans to tell what's going on at a glance, and impossible for them to estimate the difference between two files at a glance.

Insti commented 6 years ago

Can we try and keep the versioning discussions in #938 so they will be more easily discoverable to future generations.

petertseng commented 6 years ago

Well, to my surprise, in 24 * 7 hours there were no objection and there was also nonzero support. I suppose we can call it open season to convert JSON files to this, then. https://github.com/exercism/problem-specifications/pull/998 will be able to tell us which exercises were converted or not. I would make a big checklist like https://github.com/exercism/problem-specifications/issues/625 did but since my allotted time is up now, I can't.

snahor commented 6 years ago

I'm late here, but these are all the ~in~valid exercises using @petertseng https://github.com/exercism/problem-specifications/pull/998

[x] acronym
[x] all-your-base
[x] allergies
[x] alphametics
[x] anagram
[x] atbash-cipher
[x] beer-song
[x] binary-search
[x] binary
[x] bob
[x] book-store
[ ] bowling
[x] bracket-push
[x] change
[ ] circular-buffer
[ ] clock
[x] collatz-conjecture
[ ] complex-numbers
[x] connect
[x] crypto-square
[ ] custom-set
[x] diamond
[x] difference-of-squares
[x] dominoes
[x] flatten-array
[x] food-chain
[ ] forth
[x] gigasecond
[ ] grains
[ ] grep
[ ] hamming
[x] hello-world
[ ] house
[x] isbn-verifier
[ ] isogram
[ ] kindergarten-garden
[x] largest-series-product
[x] leap
[ ] list-ops
[ ] luhn
[x] markdown
[x] meetup
[x] minesweeper
[x] nth-prime
[x] nucleotide-count
[ ] ocr-numbers
[ ] palindrome-products
[ ] pangram
[x] pascals-triangle
[x] perfect-numbers
[x] phone-number
[x] pig-latin
[ ] poker
[ ] pov
[ ] prime-factors
[ ] protein-translation
[ ] proverb
[ ] queen-attack
[ ] rail-fence-cipher
[ ] raindrops
[ ] react
[ ] rectangles
[ ] reverse-string
[x] rna-transcription
[ ] robot-simulator
[x] roman-numerals
[x] rotational-cipher
[ ] run-length-encoding
[x] saddle-points
[x] say
[x] scrabble-score
[x] secret-handshake
[x] sieve
[x] space-age
[x] spiral-matrix
[x] sublist
[x] sum-of-multiples
[ ] tournament
[x] transpose
[x] triangle
[ ] trinary
[x] twelve-days
[x] two-bucket
[x] two-fer
[x] variable-length-quantity
[x] word-count
[ ] word-search
[x] wordy
[x] zebra-puzzle
[ ] zipper

petertseng commented 6 years ago

This makes some progress on the concerns listed in https://github.com/exercism/problem-specifications/issues/336 which were:

We don't know which keys are inputs (now we will! They're all the key/value pairs under input!)
Are we able to have meaningful names for the inputs, so that they are easy for humans to read? (We are able!)

In fact, having an input object was proposed there. So this was merely a revival of it, 15 months later. Thanks to those who originally proposed it (https://github.com/exercism/problem-specifications/issues/336#issuecomment-248642924, https://github.com/exercism/problem-specifications/issues/336#issuecomment-249320110, https://github.com/exercism/problem-specifications/issues/336#issuecomment-249447623)

The remaining questions to resolve would probably be:

We don't know what order the inputs come in. (https://github.com/exercism/problem-specifications/issues/336#issuecomment-248752946)

This proposal deliberately chose not to solve that problem because it chooses to take a small step at a time. If you were to predict that I will put forward a proposal to solve that problem, I advise you that that is an extremely unwise prediction to make.

petertseng commented 6 years ago

Everyone is reminded that, when submitting a PR that purports to move inputs to the input object, you will need to remove the USE_OLD_SCHEMA file in that exercise's directory because I just merged #1074. The CI has got your back in case you forget to do that (I mean the CI will fail if you forget).

ErikSchierboom commented 6 years ago

Great work @petertseng!

petertseng commented 6 years ago

Since this proposal was accepted and is now being checked by CI, I believe it's safe to close the issue. Tracking of progress is done by counting the number of USE_OLD_SCHEMA files. even though it's a policy there's no need to keep it open since policies can be found with https://github.com/exercism/problem-specifications/issues?utf8=%E2%9C%93&q=label%3Apolicy

petertseng commented 6 years ago

Note that in react (https://github.com/exercism/problem-specifications/pull/1130) and circular-buffer (https://github.com/exercism/problem-specifications/pull/1186) to accommodate the schema (which requires expected we had to add expected: {} to each case, because the expectations are already encoded in each operation under operations.

If there is desire and demand for it, consider allowing operations to be an alternative to expected. Remember that bank-account will want the same treatment (https://github.com/exercism/problem-specifications/issues/554).

petertseng commented 6 years ago

The remaining questions to resolve would probably be:

We don't know what order the inputs come in. (https://github.com/exercism/problem-specifications/issues/336#issuecomment-248752946)

My advice to the person who will resolve this question: You will need to be able to account for language differences. Take https://github.com/exercism/problem-specifications/blob/master/exercises/accumulate/description.md for example. This is the higher-order function map. Consider how a few existing languages would order the inputs:

Haskell: Object being acted on goes last, so that function composition works well.
- https://wiki.haskell.org/Parameter_order
- Thus, map has the function before the list.
Elixir: Object being acted on goes first, so that pipelining works well.
- https://hexdocs.pm/elixir/Kernel.html#%7C%3E/2
- Thus, Enum.map/2 has the list before the function.

The solution to this problem will need to take into account the possibility that different languages have different conventions.

Whoops, I just typed this up and realized it's already been said before. https://github.com/exercism/problem-specifications/issues/336#issuecomment-248760507. I'm sorry that I failed to add anything new to the discussion.

exercism / problem-specifications

JSON Schema proposal: All inputs should appear under the `input` key #996