Closed zenspider closed 7 years ago
Looks like there are a lot of different structures involved. Please provide hints as to the correct syntax so I can parse this stuff.
Looks like there are a lot of different structures involved.
Yeah, this sort of happened a bit at a time, and we weren't sure what the various needs of this data were going to be.
We now have enough data to decide on a file format, but I don't think anyone has gone through and figured out what the syntax should be yet.
@zenspider sounds like you're writing a parser, perhaps you can look through the existing data and tell us what structure we should be using to make parsing convenient. Then we can document it and create issues to update the old data.
@devonestes This is the issue we were talking about on twitter.
I'm just gonna collect my thoughts from #376 here, because I think this needs fleshing out.
I believe we can simultaneously make the JSON easier for humans and programs to read, but the way it is now makes it very hard to make a generalising program.
@petertseng linked to examples of code in various tracks using canonical-data.json
to generate exercises, and I feel they all share a common problem: because each exercise has a different structure, each exercise needs its own separate, different test generator program.
My goal with exercism.autogen-exercises is to generate all the tests for all the exercises at once, which should be trivially possible. I don't want a different ${exercisename}-testgen.factor for each different JSON structure.
As it is right now, I could theoretically write code to map x-common's JSON keys to my own internal structure, but this requires a duplication across programs that read this data. Also, it's not scalable, and as such it would be genuinely beneficial to everyone to standardise the keys and their meanings.
I am personally willing to manually rewrite all the JSON in this repository to fit a predictable format, but I won't until we have a consensus.
I'd fully support a more generic structure which would make it unnecessary to have a generator for each exercise.
But I have to admit, I have no idea what it could look like. Since you already said you would change them, do you have an idea about the structure already, @catb0t?
Also since it seems to be the right time, I want to request a feature for this generic format:
I had a sleepless night thinking about how I should handle changes in the canonical data, as I wanted to have some versioning test. First I thought I could just use the date of the last change, but this would mean that, because of whitespace changes, all earlier submissions would get "invalidated". Therefore I think it would be a good idea to version the canonical data as well.
I'm thinking something like:
- For exercises with one input translating to one output: `description`, `input` and `output`.
- For exercises with multiple inputs / multiple outputs: `description`, `input_N`, `output_N`.

Note that it would be disadvantageous to use an array for multiple inputs / outputs where an array is not part of the exercise, because it would be hard or impossible to tell the difference between multiple inputs and an actual array. We could have keys like `input_multi`, which is an array of inputs, I suppose?
For exercises with multiple inputs / multiple outputs, `description`, `input_N`, `output_N`.

[ ... ] Can we simultaneously make it easy for a human to read as well? [ ... ] in e.g. all-your-base's JSON, [ ... ] many tracks will pass in three inputs: `input_base`, `input_digits`, `output_base`, and then check that the output digits are as specified in `output_digits`. If the data then simply looked like `"input_1": 2, "input_2": [1], "input_3": 10, "output": [1]`, I think it might not be clear what the difference between `input_1` and `input_3` is to a human, and I consider this important for being able to understand PRs that propose to change the test cases.
@petertseng makes a good point that `input_N`, etc., might harm readability, especially since there are no comments in JSON, and I'm not really sure what to do about that.
I don't have a firm idea of what keys would fix Peter's point, which is a reason I haven't started rewriting it all myself yet.
Using descriptive English names makes it hard to access them programmatically, but using numbered keys makes it hard for people (not me, but other maintainers) to read. What strikes a balance?
This might be a little bit wild, so bear with me: what if we add a top-level key `metadata`, and it has this structure:
"cases": { "cases data..." }
"metadata": {
"input_keys": [ "input_key1", "input_key2", "input_key3" ],
"output_keys": [ "output_keyN" ]
}
That moves the mapping of human-readable keys from each track's generation code to the JSON itself. Then autogeneration code can read `metadata` to get the list of keys that are used in this `cases` structure.
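To make the idea concrete, here is a rough sketch (in Python, with a made-up all-your-base-style fragment) of how a generic generator could use `metadata` to pull ordered inputs and outputs out of a case without knowing anything about the exercise. The helper name `extract` is invented for the example:

```python
import json

# Hypothetical fragment in the proposed shape; the exercise data is made up.
raw = """
{
  "metadata": {
    "input_keys": ["input_base", "input_digits", "output_base"],
    "output_keys": ["output_digits"]
  },
  "cases": [
    {
      "description": "single bit to one decimal digit",
      "input_base": 2,
      "input_digits": [1],
      "output_base": 10,
      "output_digits": [1]
    }
  ]
}
"""

def extract(case, metadata):
    """Split one case into ordered inputs and outputs, driven only by metadata."""
    inputs = [case[key] for key in metadata["input_keys"]]
    outputs = [case[key] for key in metadata["output_keys"]]
    return case["description"], inputs, outputs

data = json.loads(raw)
description, inputs, outputs = extract(data["cases"][0], data["metadata"])
```

The generator never hard-codes `input_base` or `output_digits`; only the JSON file knows them.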
[ ... ] How should I handle changes in the canonical data, as I wanted to have some versioning test? [ ... ] I could just use the date of the last change, but this would mean that, because of whitespace changes, all earlier submissions would get "invalidated". Therefore I think it would be a good idea to version the canonical data as well.
We could perhaps end up with:
"#": "..."
"cases": { "cases data..." }
"metadata": { "..." }
"version": {
"version_hash": "shasum of minified version of this file",
"version_time": "seconds since 1 Jan 1970 here"
}
And you can read the `version` key. Or perhaps I'm misunderstanding your point.
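For illustration, a minimal sketch of the "shasum of the minified version" idea; the function name and the normalization choices (SHA-1, sorted keys) are just assumptions:

```python
import hashlib
import json

def version_hash(document: dict) -> str:
    """Hash a canonical, minified serialization, so whitespace-only and
    key-order-only edits to the file don't change the version."""
    minified = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(minified.encode("utf-8")).hexdigest()

# Two formattings of the same data produce the same hash:
compact = json.loads('{"cases":[{"input":"hi","expected":"Hello"}]}')
pretty = json.loads('{\n  "cases": [\n    { "expected": "Hello", "input": "hi" }\n  ]\n}')
same = version_hash(compact) == version_hash(pretty)
```

This addresses the "invalidated by whitespace changes" worry: only changes to the data itself would bump the hash.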
I do not understand the `input_N` stuff, but something came to mind.
{
"exercise": "repeat",
"examples": [
{
"function": "repeat",
"description": "tests valid stuff",
"input_count": 5,
"input_string": "foo",
"expected": "foofoofoofoofoo"
},
{
"function": "repeat",
"description": "tests failure",
"input_count": -5,
"input_string": "foo",
"expected": { "error": "no negatives allowed" }
}
]
}
Perhaps we can use this as a base, or throw it away instantly?
@NobbZ:
{
"function": "repeat",
"description": "tests failure",
"input_count": -5,
"input_string": "foo",
"expected": { "error": "no negatives allowed" }
}
and what ensures the order of the args? There's no metadata in place to declare argument names.
@catb0t:
My goal with exercism.autogen-exercises is to generate all the tests for all the exercises at once which should be trivially possible [emphasis mine]. I don't want a different ${exercisename}-testgen.factor for each different JSON structure.
I don't. I think you can get a good start on it for most languages, but that idea doesn't take into consideration language call semantic differences (factor/forth vs assembly vs algol-based languages vs keyword arguments (smalltalk, ruby) as an example). Nor is it realistic about the level of finality. I think you can easily generate a rough draft for every exercise for a language, but it still needs to be reviewed, finalized, and styled by a human to be a good example to learn from.
I do not see any sense in specifying the order of arguments in the canonical test data. There are different idioms and necessities in the various tracks.
Let's assume we have some data type and we write functions around it. Let's call it list. In object-oriented languages it will be the object we call a method on, so it will be completely out of the order of arguments. In Elixir we like to have this object-like argument in the first position to be able to pipe it around, while in Haskell it is preferred to have it last to be able to use point-free style and partial application.
So as you can see, the order of arguments has to be specified by the track's maintainer anyway.
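A sketch of what that could look like in practice: the canonical data only names the inputs, and each track keeps its own argument-order table. `ARG_ORDER` and the helper are hypothetical, not part of any proposal here:

```python
# Hypothetical per-track configuration: the canonical data names the inputs,
# and each track maintainer declares the order they want.
ARG_ORDER = {
    # Elixir-style: the "subject" argument first, for piping.
    "elixir": {"repeat": ["input_string", "input_count"]},
    # Haskell-style: the "subject" argument last, for partial application.
    "haskell": {"repeat": ["input_count", "input_string"]},
}

def ordered_args(track, function, case):
    """Pull the named inputs out of a case in the track's preferred order."""
    return [case[name] for name in ARG_ORDER[track][function]]

case = {"input_count": 5, "input_string": "foo", "expected": "foofoofoofoofoo"}
```

So the shared JSON stays order-free, and the ordering decision lives where it belongs, in the track.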
Ryan Davis notifications@github.com schrieb am Mi., 21. Sep. 2016 23:47:
{ "function": "repeat", "description": "tests failure", "input_count": -5, "input_string": "foo", "expected": { "error": "no negatives allowed" } }
and what ensures the order of the args? There's no metadata in place to declare argument names.
Maybe I'm a little late and out of topic, but I'll try anyway...
I know that it makes sense in some languages to think about automatically generating tests, but I believe that this is not a goal shared between all tracks.
I think it is impossible, in the general case, to auto-magically generate the test suite, unless we collapse all the types into the ones representable in JSON. I know that, at least in Haskell, that would be bad and wrong! :smile:
That said, it is certainly possible to have a generator to automatically update a specific exercise, if the JSON structure is not changed.
Is it worth it?
That depends on how frequently the data and the structure are updated, but mostly on how fun the process of writing and maintaining it is. So I think it is not unreasonable. :+1:
Alternatively - if the desire is really to have auto-magic test suites - it would be more compatible if the exercises were specified as stdin-stdout mappings. That would be similar to how online judge systems work, but I don't think it is exercism's destiny to follow that path.
Considering that it is generally impossible to automatically generate test suites, I think it doesn't make sense to sacrifice human-readability too much, forging a JSON that is convenient for software but inconvenient for humans.
That doesn't mean we shouldn't standardize the files. We should, but remembering that the files are meant to be read first by humans, and then by software.
Maybe I'm the only one that doesn't get what is going on here, but I think that, until it is clear what our goal here is, we should avoid getting into the details of the specification.
Edit: Ok, I think I got it! :smile:
What about something like this:
{
"exercise": "cipher",
"version": "0.1.0 or an object with more detailed information",
"comments": [
"Anything you can think of",
"as a list of strings"
],
"tests": [
{
"name": "encode",
"description": "Encodes plaintext",
"cases": [
{
"description": "Encodes simple text",
"plaintext": "Secret message",
"key": "asdf1234",
"expected": "qwertygh"
},
{
"description": "Encodes empty string",
"plaintext": "",
"key": "test1234",
"expected": ""
}
]
},
{
"name": "decode",
"description": "Decodes plaintext",
"cases": [
{
"description": "Decodes simple text",
"ciphertext": "qwertygh",
"key": "asdf1234",
"expected": "Secret message"
},
{
"description": "Decodes empty string",
"ciphertext": "",
"key": "test1234",
"expected": ""
}
]
}
]
}
Every test case has a `description` and an `expected` value. The descriptions could be mandatory or optional.
It would be possible to use multilevel grouping of tests, but I don't think that is used frequently.
Keeping the `description`, the inputs, and the `expected` output together, we have a structure that is more human-friendly, but not so convenient for processing.
@zenspider and @catb0t, would it be too difficult to separate `description` and `expected` from the other keys? Would it be reasonable for you to use an implicit alphabetic ordering for the remaining keys, instead of adding metadata?
I've been thinking about this a bit recently, and I think the most generalized version of this we can get might be the best for as many different needs as possible. What we're really doing in most of these exercises is basically testing functions. There's input, and there's output. By trying to use keys in our JSON objects that are things like "plaintext" and "key", that's creating a need for knowledge about the exercise to accurately understand how those parts interact.
I think if we can generalize on that concept of a function that we're testing, that might be helpful both for human readability, and also for machine readability so we can possibly use this data for automatic tests.
So, here's my example:
{
"exercise": "cipher",
"version": "0.1.0 or an object with more detailed information",
"comments": [
"Anything you can think of",
"as a list of strings"
],
"tests": [
{
"description": "encodes simple text",
"function": "encode",
"input": ["Secret message", "asdf1234"],
"output": "qwertygh"
},
{
"description": "encodes empty string",
"function": "encode",
"input": ["", "test1234"],
"output": ""
},
{
"description": "decodes simple string",
"function": "decode",
"input": ["qwertygh", "asdf1234"],
"output": "Secret message"
}
]
}
I don't think there are any exercises that require anything other than input and output, but I haven't done too deep of an analysis on that. I'd love any feedback if there are edge cases that would need to be taken care of here. I know that based on the structure above I can think of reasonable ways to parse that and automatically create some skeletons for tests in Ruby, Elixir, Go, JavaScript and Python, but that's really all I can reasonably speak to since those are the only languages I have a decent amount of experience with.
Also, I sort of like the stripped-down way of looking at this - when I look at that data I don't need to know the context of the exercise to know what's going on. I just know there's a thing called `encode`, and that takes some input and returns some output, and there's a text description of what's going on.
I'm not really 100% sure that this would give us everything we want, but I wanted to at least throw this idea out there to get feedback and see if it might be a starting point for an actually good idea!
What we're really doing in most of these exercises is basically testing functions. There's input, and there's output.
I think that the general case would be to test assertions...
"name": "reversibility",
"description": "Decoding a text encoded with the same key should give the original plaintext",
"cases": [
{
"description": "Only letters",
"plaintext": "ThisIsASecretMessage",
"key": "test1234",
},
... that can be general - like properties, in QuickCheck - or specific, like our common tests.
But I agree that most - if not all - tests are in the form: `function inputs == output`.
Also, I sort of like the stripped down way of looking at this - when I look at that data I don't need to know the context of the exercise to know what's going on.
This is probably where I disagree...
Maybe we don't need to know the context, but sometimes we want to.
The ability to group tests is so pervasive that I cannot find a single test framework in Haskell that doesn't allow it:
I just know there's a thing called encode, and that takes some input and returns some output, and there's a text description of what's going on.
Exactly! Substituting the keys by a list of arguments, the only thing we know is that there is something that takes inputs and gives an output. We don't know the meaning of those things anymore!
I understand that your proposal makes automatic generation of tests easier while keeping reasonable readability, @devonestes, but that still comes at a price!
Seems to me that the question that we have to answer is:
@rbasso I see your points, and I actually think we can get a little more of the benefit that you mention. How about something like this:
{
"exercise": "cipher",
"version": "0.1.0 or an object with more detailed information",
"comments": [
"Anything you can think of",
"as a list of strings"
],
"tests": [
{
"description": "encodes simple text",
"function": "encode",
"input": {
"plaintext": "Secret message",
"key": "asdf1234"
},
"output": "qwertygh"
}
]
}
For the interest of programmatically generating tests, we know what our inputs are (and we can easily ignore the human-specific context in the keys in that object and just look at the values), but for the purpose of assigning some meaning to this data, we can give some context-specific information by adding those keys to the `input` object.
I think with the above structure we still don't need to understand the context to figure out what's going on, but if we want context it's there for us. I actually think this is a much better version than the original one!
I guess if I were to generalize the structure of a `test` object in that JSON, it would be this:
{
"description": "description of what is being tested in this test",
"function": "name of function (or method) being tested",
"input": {
"description of input": "actual input (can be string, int, bool, hash/map, array/list, whatevs)"
},
"output": "output of function being tested with above inputs"
}
So, I actually kind of like that. What does everyone else think?
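One nice property of this shape: JSON objects keep their file order when parsed with Python's `json.loads`, so a generator can read the `input` values positionally while a human still sees the meaningful key names. A small sketch (the data is the example above):

```python
import json

raw = """
{
  "description": "encodes simple text",
  "function": "encode",
  "input": {
    "plaintext": "Secret message",
    "key": "asdf1234"
  },
  "output": "qwertygh"
}
"""

test = json.loads(raw)
# Treat the values positionally for tracks with positional arguments...
positional_args = list(test["input"].values())
# ...or keep the names for tracks with keyword arguments.
named_args = test["input"]
```

Whether relying on object key order is acceptable is, of course, part of the argument-order debate above.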
I especially like the idea of adding the `version` and the `function` keys. I'm currently working on adding test data versioning (which Ruby and Go already have) and test generation to the Python track, so it would be great if we could agree on a standard format.
The reason I stopped commenting despite the fact that I'm the one who re-kindled this thread is that these replies really disheartened me:
I think you can get a good start on it for most languages, but that idea doesn't take into consideration language call semantic differences (factor/forth vs assembly vs algol-based languages vs keyword arguments (smalltalk, ruby) as an example). ... I think you can easily generate a rough draft for every exercise for a language, but it still needs to be reviewed, finalized, and styled by a human to be a good example to learn from.
...I know that it makes sense in some languages to think about automatically generating tests, but I believe that this is not a goal shared between all tracks. I think it is impossible, in the general case, to auto-magically generate the test suite...
Then what is the goal of this discussion about JSON format at all, if you're not interested in programmatically processing the JSON data to generate the unit tests?
Moreover, I don't see why language-specific differences matter here -- my point was that totally disregarding ALGOL syntax and Ruby keyword arguments and Haskell data types, if everything is just a string you can write a generator to write out tests files (and example files too), and since there are already exercise-specific test generators, why not save yourselves the work and write a generic one with better-designed data? (Yes, you should still read and comment the output of the generator for good measure.)
I'm sorry you found my comments disheartening. I just think that your notion: "to generate all the tests for all the exercises at once which should be trivially possible" ignores the fact that you're mechanically generating tests for consumption across a bunch of languages with widely different styles and semantics.
That is going to wind up with "least common denominator" tests. All I was suggesting is that mechanically generated tests will be a good rough draft, but that they should be worked on by humans so that they are good pedagogical examples for each language. To skip out on that is to kinda miss the point of exercism in the first place.
For example, I have found a world of difference in the quality of tests and their ability to help teach me the language and assist me in understanding in rust's tests. Some of them are night and day in difference, and the worst ones were the ones that did a bare minimum "least common denominator" approach.
I'm the author of one of the disheartening comments, @catb0t, so I think I owe some explanations.
First of all, I believe that it is good to standardize the structure of a JSON. I just disagree a little in the goals.
Then what is the goal of this discussion about JSON format at all, if you're not interested in programmatically processing the JSON data to generate the unit tests?
I believe that the JSON data has two complementary goals:
I still disagree about oversimplifying the format to make it easy to automatically generate the tests. This may be extremely valuable in an online judge, because it needs to automatically generate identical tests for a bunch of languages, but it would probably make the exercises less interesting in some languages, as @zenspider already said.
Moreover, I don't see why language-specific differences matter here -- my point was that totally disregarding ALGOL syntax and Ruby keyword arguments and Haskell data types, if everything is just a string you can write a generator to write out tests files (and example files too)...
You are right, if everything is just strings!
But I'm not sure if people here like the idea of having all the exercises as stdin-stdout filters.
Ok, it seems to me like we've all sort of agreed (in our own ways) that this is a rather difficult problem to solve - so how about we try to make this into a couple smaller problems and tackle them individually? 😉
From what I see, we have two distinct goals we're trying to achieve here:
1) Consistency in format allows for easier human readability of the files, which means an easier time understanding and maintaining them.
2) It's possible that if things are consistent enough and we come up with a good enough abstraction, we could programmatically generate the beginnings of test files for some types of language tracks.
Both are indeed noble goals with clear value, and I totally think we should strive to achieve them both - just maybe not at the same time?
Since goal number 2 is clearly really hard, how about we try and get something that's at least solving goal number 1, and then once that's done we can try and refine it further to accomplish goal number 2? I think limiting the scope of what we're trying to accomplish (with an eye towards the future of course) will be really helpful in actually getting something shipped here.
It seems that this issue has been dead for a while...
Let's try to push a little further the idea proposed by @devonestes!
Since goal number 2 is clearly really hard, how about we try and get something that's at least solving goal number 1, and then once that's done we can try and refine it further to accomplish goal number 2? I think limiting the scope of what we're trying to accomplish (with an eye towards the future of course) will be really helpful in actually getting something shipped here.
I have been playing with the JSON files this week and I have some ideas on how we can extract most of the current test structure without sacrificing readability or enforcing too much.
This will be a really long post, so grab your coffee mug and try not to sleep because I need some feedback here! 😄
Some test suites have tests grouped with labels:
acronym
{
"abbreviate":{
"description":"Abbreviate a phrase",
"cases":[
{
"description":"basic",
"phrase":"Portable Network Graphics",
"expected":"PNG"
}
]
}
}
Grouping tests adds readability to both the JSON file and the generated tests, so I believe that we should keep this feature somehow.
In the example above, the custom name `abbreviate` was used to group and also identify the type of the tests to be performed. This is an easy solution but is also a little too restrictive. It would be useful to group distinct types of tests:
{
"group":{
"description":"Qwerty",
"cases":[
{
"encode":{
"description":"Qwerty encoding",
"plaintext":"Sample plaintext",
"ciphertext":"adsdfsjqwreiugi"
}
},
{
"decode":{
"description":"Qwerty decoding",
"ciphertext":"adsdfsjqwreiugi",
"plaintext":"sampleplaintext"
}
}
]
}
}
We could also have encoded the test types in other ways, but what matters here is that, by moving the test-type specification near the test data, we gained the ability to create heterogeneous test groups!
Decoupling the grouping logic from the test types, we could even nest test groups with varying depths:
{
"group":{
"description":"mathematics",
"tests":[
{
"group":{
"description":"basic math",
"tests":[
{
"addition":{
"description":"simple addition",
"left":1,
"right":2,
"expected":3
}
},
{
"subtraction":{
"description":"simple subtraction",
"left":3,
"right":2,
"expected":1
}
}
]
}
},
{
"division":{
"description":"awesome division by zero",
"left":1,
"right":0,
"expected":"Only Chuck Norris can divide by zero!"
}
}
]
}
}
That may seem unneeded and a little too complex, but it comes almost for free! Also, it is good to have some flexibility for the more complex test suites we may want to create.
A generator could simply ignore all the test grouping and just recursively scan for the tests - flattening the structure - or it could use the grouping information to construct a completely labeled test tree, if the test framework allows it.
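A sketch of that recursive scan in Python, using the nested structure above; the flattened triple (group path, test type, body) is just one possible shape for the result:

```python
def flatten(node, path=()):
    """Recursively collect (group path, test type, test body) triples from
    the nested group structure; groups may nest to any depth."""
    if "group" in node:
        group = node["group"]
        new_path = path + (group["description"],)
        for child in group["tests"]:
            yield from flatten(child, new_path)
    else:
        # Any non-group node is a test; its single key is the test type.
        (test_type, body), = node.items()
        yield path, test_type, body

tree = {
    "group": {
        "description": "mathematics",
        "tests": [
            {"group": {
                "description": "basic math",
                "tests": [
                    {"addition": {"description": "simple addition",
                                  "left": 1, "right": 2, "expected": 3}},
                ],
            }},
            {"division": {"description": "awesome division by zero",
                          "left": 1, "right": 0,
                          "expected": "Only Chuck Norris can divide by zero!"}},
        ],
    }
}

flat = list(flatten(tree))
```

A generator for a flat test framework uses `flat` directly; one for a framework with groups uses the path tuples to rebuild the tree.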
The challenge here is to enforce a minimal structure on all tests, without losing any readability or flexibility.
Previous discussions indicate that there is no consensus about encoding input and output, so we should avoid discussing that now and focus on things that will not start a language war.
To allow easy, semi-automatic generation of tests, I think it would be convenient to have at least the following information about a test:
- `description` - With it, the test generators have a textual description to display in case of success/failure. Also, it allows users and maintainers to refer to a specific test case in a language-independent way. Tests without descriptions would leave the users in a situation where they cannot easily identify where they failed, so it makes sense to enforce their presence.
- `type` - At least implicitly, any test case has a type that identifies a property being tested, most of the time the name of a test function. What matters here is that we need a unique identifier for each kind of test in a test suite, so that we don't end up in a situation where it is impossible to automatically identify the type of each test case.

{
"group":{
"description":"Qwerty",
"cases":[
{
"test":{
"description":"Qwerty encoding",
"plaintext":"Sample plaintext",
"ciphertext":"adsdfsjqwreiugi"
}
},
{
"test":{
"description":"Qwerty decoding",
"ciphertext":"adsdfsjqwreiugi",
"plaintext":"sampleplaintext"
}
}
]
}
}
I see ~~two~~ three options to signal the test type:
This is readable and easy enough to parse, but it doesn't expose the fact that all the test cases have a description.
{
"decode":{
"description":"Qwerty decoding",
"ciphertext":"adsdfsjqwreiugi",
"plaintext":"sampleplaintext"
}
}
`test` key option
This captures more structure but is not so nice to the eyes.
{
"test":{
"description":"Qwerty decoding",
"decode": {
"ciphertext":"adsdfsjqwreiugi",
"plaintext":"sampleplaintext"
}
}
}
Edit: Key-value pair option
This is a little less readable than the first option, but may be interesting for parsing.
{
"test":{
"type":"decode",
"description":"Qwerty decoding",
"ciphertext":"adsdfsjqwreiugi",
"plaintext":"sampleplaintext"
}
}
~~The first option is more pleasant to the eyes and is similar to what we already use, so it makes sense to stick with it unless we find a reason to avoid it.~~
It would be nice to have some arguments in favor of or against each of these three alternatives.
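One small observation in their favor: all three shapes can be normalized to the same (type, description, data) triple with very little code, so the choice is mostly about readability. A purely illustrative Python sketch:

```python
def normalize(case):
    """Normalize a test case from any of the three proposed shapes into
    (type, description, data); a rough sketch for comparison, not a spec."""
    (key, body), = case.items()
    if key != "test":
        # Custom key option: {"decode": {"description": ..., fields...}}
        data = dict(body)
        return key, data.pop("description"), data
    if "type" in body:
        # Key-value pair option: {"test": {"type": "decode", "description": ..., fields...}}
        data = dict(body)
        return data.pop("type"), data.pop("description"), data
    # Generic `test` key option: {"test": {"description": ..., "decode": {fields...}}}
    data = dict(body)
    description = data.pop("description")
    (test_type, fields), = data.items()
    return test_type, description, fields

a = {"decode": {"description": "Qwerty decoding",
                "ciphertext": "adsdfsjqwreiugi", "plaintext": "sampleplaintext"}}
b = {"test": {"description": "Qwerty decoding",
              "decode": {"ciphertext": "adsdfsjqwreiugi",
                         "plaintext": "sampleplaintext"}}}
c = {"test": {"type": "decode", "description": "Qwerty decoding",
              "ciphertext": "adsdfsjqwreiugi", "plaintext": "sampleplaintext"}}
```

All three inputs normalize to the same triple, so parsers are not a strong reason to prefer one over another.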
I'm still trying to write a schema to allow automatic validation of the `canonical-data.json` files, but I decided that it was already time to discuss the idea publicly, so that we could improve it together.
Edit: Remember about `exercise`, `version` and `comments`.
Following these ideas, I rewrote `exercises/bob/canonical-data.json` to test the concept in a simple case:
{
"group":{
"description":"bob",
"tests":[
{
"response":{
"description":"stating something",
"input":"Tom-ay-to, tom-aaaah-to.",
"expected":"Whatever."
}
},
{
"response":{
"description":"shouting",
"input":"WATCH OUT!",
"expected":"Whoa, chill out!"
}
},
{
"response":{
"description":"shouting gibberish",
"input":"FCECDFCAAB",
"expected":"Whoa, chill out!"
}
},
{
"response":{
"description":"asking a question",
"input":"Does this cryogenic chamber make me look fat?",
"expected":"Sure."
}
},
{
"response":{
"description":"asking a numeric question",
"input":"You are, what, like 15?",
"expected":"Sure."
}
},
{
"response":{
"description":"asking gibberish",
"input":"fffbbcbeab?",
"expected":"Sure."
}
},
{
"response":{
"description":"talking forcefully",
"input":"Let's go make out behind the gym!",
"expected":"Whatever."
}
},
{
"response":{
"description":"using acronyms in regular speech",
"input":"It's OK if you don't want to go to the DMV.",
"expected":"Whatever."
}
},
{
"response":{
"description":"forceful question",
"input":"WHAT THE HELL WERE YOU THINKING?",
"expected":"Whoa, chill out!"
}
},
{
"response":{
"description":"shouting numbers",
"input":"1, 2, 3 GO!",
"expected":"Whoa, chill out!"
}
},
{
"response":{
"description":"only numbers",
"input":"1, 2, 3",
"expected":"Whatever."
}
},
{
"response":{
"description":"question with only numbers",
"input":"4?",
"expected":"Sure."
}
},
{
"response":{
"description":"shouting with special characters",
"input":"ZOMG THE %^*@#$(*^ ZOMBIES ARE COMING!!11!!1!",
"expected":"Whoa, chill out!"
}
},
{
"response":{
"description":"shouting with no exclamation mark",
"input":"I HATE YOU",
"expected":"Whoa, chill out!"
}
},
{
"response":{
"description":"statement containing question mark",
"input":"Ending with ? means a question.",
"expected":"Whatever."
}
},
{
"response":{
"description":"non-letters with question",
"input":":) ?",
"expected":"Sure."
}
},
{
"response":{
"description":"prattling on",
"input":"Wait! Hang on. Are you going to be OK?",
"expected":"Sure."
}
},
{
"response":{
"description":"silence",
"input":"",
"expected":"Fine. Be that way!"
}
},
{
"response":{
"description":"prolonged silence",
"input":" ",
"expected":"Fine. Be that way!"
}
},
{
"response":{
"description":"alternate silence",
"input":"\t\t\t\t\t\t\t\t\t\t",
"expected":"Fine. Be that way!"
}
},
{
"response":{
"description":"multiple line question",
"input":"\nDoes this cryogenic chamber make me look fat?\nno",
"expected":"Whatever."
}
},
{
"response":{
"description":"starting with whitespace",
"input":" hmmmmmmm...",
"expected":"Whatever."
}
},
{
"response":{
"description":"ending with whitespace",
"input":"Okay if like my spacebar quite a bit? ",
"expected":"Sure."
}
},
{
"response":{
"description":"other whitespace",
"input":"\n\r \t",
"expected":"Fine. Be that way!"
}
},
{
"response":{
"description":"non-question ending with whitespace",
"input":"This is a statement ending with whitespace ",
"expected":"Whatever."
}
}
]
}
}
To check how hard it could be to parse the file, I rewrote the test suite to run the tests directly from the JSON file.
{-# LANGUAGE OverloadedStrings #-}
-- Basic imports
import Control.Applicative ((<|>), liftA2)
import Control.Monad ((>=>))
-- To construct the tests.
import Test.Hspec (Spec, describe, it)
import Test.Hspec.Runner (configFastFail, defaultConfig, hspecWith)
import Test.HUnit (assertEqual)
-- To parse the JSON file.
import Data.Aeson ((.:), eitherDecodeStrict', withArray, withObject)
import Data.Aeson.Types (Parser, Value, parseEither)
import GHC.Exts (toList)
-- To read the JSON file.
import Data.ByteString (readFile)
import Prelude hiding (readFile)
-- The module to be tested.
import Bob (responseFor)
-- Read, decode and run the tests.
main :: IO ()
main = readJSON >>= parseOrError parseJSON >>= runTests
where
readJSON = readFile "test/canonical-data.json"
parseOrError p = either error pure . p
parseJSON = eitherDecodeStrict' >=> parseEither (parseTests parsers)
runTests = hspecWith defaultConfig {configFastFail = True}
-- List of exercise-specific parsers
parsers = [ parseResponse ]
-- | Exercise-specific parser for "response" tests.
parseResponse :: Value -> Parser Spec
parseResponse = withObject "response" $ \o -> do
test <- o .: "response"
description <- test .: "description"
input <- test .: "input"
expected <- test .: "expected"
return $ it description $
assertEqual ("responseFor " ++ show input)
expected
(responseFor input)
-- | Exercise-independent JSON parser.
parseTests :: [Value -> Parser Spec] -> Value -> Parser Spec
parseTests ps = foldr (liftA2 (<|>)) mempty (parseGroup : ps)
where
parseGroup = withObject "group" $ \o -> do
group <- o .: "group"
description <- group .: "description"
tests <- group .: "tests"
specs <- withArray "tests" (traverse (parseTests ps) . toList) tests
return . describe description . sequence_ $ specs
This is still experimental code, so don't take it seriously, but note that only 12 lines of code are exercise-specific. All the other lines are exercise-independent!
I avoided any tricks to make this easier in Haskell, so the parsing is verbose and feels a little clumsy. Changing the JSON file would make parsing way easier, but that would favor the Haskell track to the detriment of other languages and human-readability.
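For comparison, the same generic-walker idea can be sketched outside Haskell. This is a hypothetical Python version (all names here are mine, not an agreed spec): only the handler registry is exercise-specific, and the walker over nested groups is exercise-independent.

```python
# Hypothetical sketch of a generic walker over the proposed "group" structure.
# Only HANDLERS is exercise-specific; run() is exercise-independent.

def response_for(text):
    # Stand-in for the solution under test.
    return "Whatever."

def run_response(test, path):
    # Exercise-specific handler for the "response" test type.
    expected = test["expected"]
    actual = response_for(test["input"])
    status = "ok" if actual == expected else "FAIL"
    return f"{status}: {' / '.join(path + [test['description']])}"

HANDLERS = {"response": run_response}  # the exercise-specific part

def run(node, path=()):
    path = list(path)
    if "group" in node:  # a (possibly labeled) group of tests
        desc = node.get("description", "")
        results = []
        for child in node["group"]:
            results.extend(run(child, path + [desc] if desc else path))
        return results
    for test_type, handler in HANDLERS.items():  # a single test
        if test_type in node:
            return [handler(node[test_type], path)]
    raise ValueError(f"unknown test item: {node}")

data = {
    "group": [
        {"response": {"description": "stating something",
                      "input": "Tom-ay-to, tom-aaaah-to.",
                      "expected": "Whatever."}}
    ]
}
for line in run(data):
    print(line)  # ok: stating something
```

The point is the same as in the Haskell version: with a predictable structure, adding an exercise means adding one handler, not one parser.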
Well, this is all I got for now...
I think that, if we decide to follow this path, in the short term we can expect to validate the canonical-data.json files in Travis-CI.
I deliberately avoided specifying inputs and outputs from the tests for a few reasons.
Anyone think it is a useful endeavor to standardize just that for now?
@rbasso Of course this is a useful endeavor! :) I am also working on some test generator for Scala. So let me just add my two cents:
- One parser for all canonical-data.json files.
- One generator that takes the parse result and generates the exercise's test suite.
Now the next question could be: Must all of this be 100% language-specific, or how much can be shared and how?
As you can see in the discussion of my PR, it seems preferable for some to have the test suite in a separate file instead of immediately using the parse results like you did.
I agree! I just used the tests as a parsing example, to see if the format would be too inconvenient.
Must all of this be 100% language-specific, or how much can be shared and how?
Tell me if you find out the answer. 😄
What other test types might there be?
Examples of test types that are not a single function would be the following properties:
I agree that in most cases we are testing the return value of a function implemented by the user, but it would be nice to use a more general name.
I'll try to rewrite with the following changes:
- tests -> group
- testType -> test
P.S.: I deleted that post because I was rewriting it with major modifications. Sorry.
I hope it is better now!
{
"exercise":"bob",
"version":"1.0.0",
"comments":[
"I am a comment"
],
"group":[
{
"description":"foo",
"group":[
{
"test":"response",
"description":"stating something",
"input":"Tom-ay-to, tom-aaaah-to.",
"expected":"Whatever."
},
{
"test":"response",
"description":"stating the same thing again",
"input":"Tom-ay-to, tom-aaaah-to.",
"expected":"Whatever."
}
]
},
{
"description":"bar",
"group":[
{
"test":"response",
"description":"shouting",
"input":"WATCH OUT!",
"expected":"Whoa, chill out!"
}
]
}
]
}
And here is my first JSON Schema. If anyone has any experience with it, I would love suggestions on how to improve it.
{
"$schema":"http://json-schema.org/draft-04/schema#",
"$ref":"#/definitions/top",
"definitions":{
"comments":{
"type":"array",
"items":{
"type":"string"
},
"minItems":1
},
"description":{
"type":"string"
},
"exercise":{
"type":"string"
},
"group":{
"type":"array",
"items":{
"$ref":"#/definitions/testOrLabeledGroup"
},
"minItems":1
},
"labeledGroup":{
"type":"object",
"required":[
"description",
"group"
],
"properties":{
"description":{
"$ref":"#/definitions/description"
},
"group":{
"$ref":"#/definitions/group"
}
},
"additionalProperties":false
},
"test":{
"type":"object",
"required":[
"test",
"description"
],
"properties":{
"test":{
"$ref":"#/definitions/testType"
},
"description":{
"$ref":"#/definitions/description"
}
}
},
"testOrLabeledGroup":{
"oneOf":[
{
"$ref":"#/definitions/test"
},
{
"$ref":"#/definitions/labeledGroup"
}
]
},
"testType":{
"type":"string"
},
"top":{
"type":"object",
"required":[
"exercise",
"version",
"group"
],
"additionalProperties":false,
"properties":{
"exercise":{
"$ref":"#/definitions/exercise"
},
"version":{
"$ref":"#/definitions/version"
},
"comments":{
"$ref":"#/definitions/comments"
},
"group":{
"$ref":"#/definitions/group"
}
}
},
"version":{
"type":"string"
}
}
}
Finally, after fighting the JSON Schema language for a while, I think I got a proposal that can serve as a starting schema for discussion. I expect it to be:
Here is a sample test file:
{
"exercise":"foobar",
"version":"0.1.0",
"comments":[
"We are",
"comments!"
],
"group":[
{
"foo":{
"description":"foo the void",
"input":"",
"expected":"foo"
}
},
{
"bar":{
"description":"bar the void",
"input":"",
"expected":"bar"
}
},
{
"description":"snafu",
"group":[
{
"foobar":{
"description":"foo and bar",
"input":"...wait for it...",
"expected":"foo...wait for it...bar"
}
}
]
}
]
}
And here is the JSON Schema, formatted in a very unusual way for easier understanding (at least for me):
{
"$schema": "http://json-schema.org/draft-04/schema#",
"$ref" : "#/definitions/canonicalData",
"definitions":{
"canonicalData":
{ "type" : "object"
, "required" : ["exercise" , "version" , "group"]
, "properties":
{ "exercise": { "$ref": "#/definitions/exercise" }
, "version" : { "$ref": "#/definitions/version" }
, "comments": { "$ref": "#/definitions/comments" }
, "group" : { "$ref": "#/definitions/group" }
}
, "additionalProperties": false
},
"exercise": { "type": "string" },
"version" : { "type": "string" },
"comments":
{ "type" : "array"
, "items" : { "type": "string" }
, "minItems": 1
},
"group":
{ "type" : "array"
, "items" : { "$ref": "#/definitions/testItem" }
, "minItems": 1
},
"testItem":
{ "oneOf":
[ { "$ref": "#/definitions/singleTest" }
, { "$ref": "#/definitions/labeledGroup" }
]
},
"singleTest":
{ "type" : "object"
, "minProperties" : 1
, "maxProperties" : 1
, "additionalProperties" : { "$ref": "#/definitions/testData" }
},
"testData":
{ "type" : "object"
, "required" : ["description"]
, "properties":
{ "description": { "$ref": "#/definitions/description" }
}
},
"description": { "type":"string" },
"labeledGroup":
{ "type" : "object"
, "required" : ["description", "group"]
, "properties":
{ "description": { "$ref": "#/definitions/description" }
, "group" : { "$ref": "#/definitions/group" }
}
, "additionalProperties": false
}
}
}
I know this is far from perfect, and some people were expecting a more rigid test schema to allow fully automated test suite generation. But I believe this is better than nothing.
Also, it is ready to use and seems to work as expected in my preliminary tests:
Here is a foobar test run:
foobar-0.1.0
foo the void
bar the void
snafu
foo and bar
Finished in 0.0001 seconds
3 examples, 0 failures
Does anyone have anything to say about it?
Edit: There is also a ported bowling/canonical-data.json here as an example.
Hello, in case anyone wonders why description is the only required key... In particular, if anyone wonders why expected is not a required key: expected might not work so well with some exercises.
I give you:
About the schema:
Consider a JSON file following this schema. How easy is it for a parser to determine the difference between a singleTest and a labeledGroup? Given an object appearing in a group array, how will I be able to know which of the two it is? It was not immediately obvious to me, but maybe it is.
Consider:
"foo":{
"description":"foo the void",
"input":"",
"expected":"foo"
}
That "foo" key: what will it be used for? Is it just the description? Does that make description unnecessary?
Given an object appearing in a group array, how will I be able to know which of the two it is? It was not immediately obvious to me, but maybe it is.
Both the labeledGroup and the singleTest are objects, but we can easily know which is which:
- If its single property value is a testData, it is a singleTest.
- If it has a description as a string and a group as an array of testItem, it is a labeledGroup.
We could solve this problem by adding some verbosity to the specification, but I'll discuss that in another message that I'll probably finish writing in a few hours.
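This disambiguation rule can be made concrete with a tiny sketch (hypothetical Python; the key names follow the proposed schema, nothing here is an agreed format):

```python
# Classify an item from a "group" array as a labeledGroup or a singleTest,
# using the two rules described above.

def classify(item):
    if "description" in item and "group" in item:
        return "labeledGroup"   # description string + group array
    if len(item) == 1:
        return "singleTest"     # exactly one property, whose value is the testData
    raise ValueError("neither a singleTest nor a labeledGroup")

print(classify({"description": "snafu", "group": []}))          # labeledGroup
print(classify({"foo": {"description": "foo the void"}}))       # singleTest
```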
That "foo": key: what will it be used for? Is it just the description? does that make description unnecessary?
That foo, which I informally call "the test type", is fundamental to tell apart different types of tests that could have exactly the same properties, as in this example:
{
"test":{
"description":"Qwerty encoding",
"plaintext":"Sample plaintext",
"ciphertext":"adsdfsjqwreiugi"
}
},
{
"test":{
"description":"Qwerty decoding",
"ciphertext":"adsdfsjqwreiugi",
"plaintext":"sampleplaintext"
}
}
There is no easy way to say which one is an encoding test. I'll write more about the options to solve this in my next message.
Let's say we have an exercise in which the user has to implement two functions:
- foo, which receives a string and appends "foo" to it.
- bar, which receives a string and appends "bar" to it.
The test suite would normally consist of multiple tests for foo and bar, maybe mixed in a list, so we need a way to distinguish these two test types:
[ { "description": "How is the codebase?"
, "input" : "fu"
, "expected" : "fubar"
}
, { "description": "A martial art."
, "input" : "Kung-"
, "expected" : "Kung-foo"
}
, { "description": "Where do you live?"
, "input" : ""
, "expected" : "bar"
}
]
Humans can easily see that the first and third tests appear to simply call the function bar with input, while the second tests the function foo. Let's give names to these test types: justFooIt and justBarIt.
In my last proposal, we would avoid ambiguity like this:
[ { "justBarIt": { "description": "How is the codebase?"
, "input" : "fu"
, "expected" : "fubar"
}
}
, { "justFooIt": { "description": "A martial art."
, "input" : "Kung-"
, "expected" : "Kung-foo"
}
}
, { "justBarIt": { "description": "Where do you live?"
, "input" : ""
, "expected" : "bar"
}
}
]
This would allow the parser to easily identify each kind of test.
After pondering about it for a while, I think it would be probably better to change to this structure:
[ { "description": "How is the codebase?"
, "justBarIt" : { "input" : "fu"
, "expected": "fubar"
}
}
, { "description": "A martial art."
, "justFooIt" : { "input" : "Kung-"
, "expected": "Kung-foo"
}
}
, { "description": "Where do you live?"
, "justBarIt" : { "input" : ""
, "expected": "bar"
}
}
]
Let's write the full canonical-data.json for it, so that we can see how it looks:
{ "exercise": "foobar"
, "version" : "0.1.0"
, "comments":
[ "This is just"
, "an example"
]
, "group":
[ { "description": "How is the codebase?"
, "justBarIt" : { "input" : "fu"
, "expected": "fubar"
}
}
, { "description": "A martial art."
, "justFooIt" : { "input" : "Kung-"
, "expected": "Kung-foo"
}
}
, { "description": "Where do you live?"
, "justBarIt" : { "input" : ""
, "expected": "bar"
}
}
]
}
With this new structure, I think the JSON Schema would be simpler and at the same time we would be capturing more structure.
I'll try to rewrite the schema with this change as soon as possible.
One question: Should we rename group to tests?
I have the feeling that if I had read the example bowling file I would have understood what the foo
(test type) is for, but now it is clear. No objections here. And in fact, doing it this way may fit well with how https://github.com/exercism/x-common/blob/master/exercises/react/canonical-data.json and https://github.com/exercism/x-common/blob/master/exercises/circular-buffer/canonical-data.json operate! Very interesting.
Should we rename group to tests?
I could also ask about using the existing name of cases. However, either cases or tests has the following advantage over group: they answer the question "group of what?" that someone might ask if they just see group.
I also prefer cases or tests over group! 👍
The only reason I didn't consider cases before was because I thought it could be misleading when used with groups of tests. tests sounded more neutral regarding what is inside, while cases suggests that what is inside are individual test cases.
But that is just a feeling I had. What do you think about it?
Using cases:
{ "exercise": "foobar"
, "version" : "0.1.0"
, "comments":
[ "This is just"
, "a comment."
]
, "cases":
[ { "description": "Appending to non-empty strings"
, "cases":
[ { "description": "How is the codebase?"
, "justBarIt": { "input" : "fu"
, "expected": "fubar"
}
}
, { "description": "A martial art"
, "justFooIt": { "input" : "Kung-"
, "expected": "Kung-foo"
}
}
]
}
, { "description": "Appending to empty strings"
, "cases":
[ { "description": "Where do you live?"
, "justBarIt": { "input" : ""
, "expected": "bar"
}
}
, { "description": "Undescriptive variable name"
, "justFooIt": { "input" : ""
, "expected": "foo"
}
}
]
}
]
}
Using tests:
{ "exercise": "foobar"
, "version" : "0.1.0"
, "comments":
[ "This is just"
, "a comment."
]
, "tests":
[ { "description": "Appending to non-empty strings"
, "tests":
[ { "description": "How is the codebase?"
, "justBarIt": { "input" : "fu"
, "expected": "fubar"
}
}
, { "description": "A martial art"
, "justFooIt": { "input" : "Kung-"
, "expected": "Kung-foo"
}
}
]
}
, { "description": "Appending to empty strings"
, "tests":
[ { "description": "Where do you live?"
, "justBarIt": { "input" : ""
, "expected": "bar"
}
}
, { "description": "Undescriptive variable name"
, "justFooIt": { "input" : ""
, "expected": "foo"
}
}
]
}
]
}
Which one is better?
New JSON Schema draft for canonical-data.json files.
Changes in this new draft:
- The version property now enforces the format major.minor.patch.
- The exercise property now enforces that it is in kebab-case.
Questions:
- Should we rename tests to cases, as suggested by @petertseng here?
- Should we allow comments in test groups or in tests? Are top-level comments enough?
- Should we use special names for description and tests? This would allow us to restrict the test types with a regex, but would sacrifice readability.
? This would allow us to restrict the test types with a regex, but would sacrifice readability.{
"comments":
[ " This is a JSON Schema for 'canonical-data.json' files. "
, " "
, " It enforces just a general structure for all exercises, "
, " without specifying how the test data should be organized "
, " for each type of test. "
, " "
, " There is also no restriction on how to name the 'testData' "
, " objects in 'labeledTestItem' yet, but it is advisable to "
, " follow a reasonable convention: "
, " - 'fooBar' -- lowerCamelCase (used by Google) "
, " - 'FooBar' -- UpperCamelCase "
, " - 'foo-bar' -- kebab-case "
, " - 'foo_bar' -- snake_case "
, " "
, " Because we cannot use negative lookahead in JSON Schema's "
, " regular expressions, it seems very impractical to use a "
, " regex in 'patternProperties' to match a test type name in "
, " 'labeledTestItem' without also matching the strings "
, " 'description' and 'tests'. This prevents us from enforcing "
, " good naming practices automatically. "
],
"$schema": "http://json-schema.org/draft-04/schema#",
"$ref" : "#/definitions/canonicalData",
"definitions":{
"canonicalData":
{ "description": "This is the top-level file structure"
, "type" : "object"
, "required" : ["exercise" , "version" , "tests"]
, "properties" :
{ "exercise": { "$ref": "#/definitions/exercise" }
, "version" : { "$ref": "#/definitions/version" }
, "comments": { "$ref": "#/definitions/comments" }
, "tests" : { "$ref": "#/definitions/testGroup" }
}
, "additionalProperties": false
},
"exercise": { "description": "Exercise's slug (kebab-case)"
, "type" : "string"
, "pattern" : "^[a-z]+(-[a-z]+)*$"
},
"version" :
{ "description" : "Semantic versioning: MAJOR.MINOR.PATCH"
, "type" : "string"
, "pattern" : "^(0|[1-9][0-9]*)(\\.(0|[1-9][0-9]*)){2}$"
},
"comments":
{ "description": "An array of string to fake multi-line comments"
, "type" : "array"
, "items" : { "type": "string" }
, "minItems" : 1
},
"testGroup":
{ "description": "An array of labeled test items"
, "type" : "array"
, "items" : { "$ref": "#/definitions/labeledTestItem" }
, "minItems" : 1
},
"labeledTestItem":
{ "description": "A single test or group of tests with a description"
, "type" : "object"
, "required" : ["description"]
, "properties" :
{ "description": { "$ref": "#/definitions/description" }
, "tests" : { "$ref": "#/definitions/testGroup" }
}
, "additionalProperties": { "$ref": "#/definitions/testData" }
, "minProperties" : 2
, "maxProperties" : 2
},
"description" :
{ "description": "A short, clear, one-line description"
, "type" : "string"
},
"testData":
{ "description": "A free-form object with data for a single test"
, "type" : "object"
}
}
}
Edit: Just wrote an "improved" version that uses negative-lookahead to restrict the test type to camelCase here. I guess it is OK to use some non-mandatory features from JSON Schema.
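As a sanity check on the two patterns in the schema (for version and exercise), here is how they behave when exercised directly. The regexes are copied verbatim from the schema above; the surrounding Python is just scaffolding:

```python
import re

# Patterns copied from the draft schema: semantic version and kebab-case slug.
VERSION = re.compile(r"^(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*)){2}$")
SLUG = re.compile(r"^[a-z]+(-[a-z]+)*$")

print(bool(VERSION.match("1.0.0")))       # True
print(bool(VERSION.match("01.0.0")))      # False - no leading zeros
print(bool(SLUG.match("all-your-base")))  # True
print(bool(SLUG.match("AllYourBase")))    # False - must be kebab-case
```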
I tried to revive this issue last week but, except for a few comments from @abo64, @rpottsoh and @petertseng, it appears that this issue is still not getting much attention since 2016-11-22.
This is mostly my fault, because I cluttered it with huge posts, making it really hard for anyone to catch up with the history. Also, the subject is really technical, and most of the people who were interested in automatically generating tests seem to have given up on the discussion, which is unfortunate.
There is little hope of standardizing something as important as the canonical-data.json files without widespread support, as this change would greatly affect all the tracks - hopefully in a positive way - especially the ones using generators.
We have to decide how to proceed here to increase the chances of getting something done. Some ideas:
I'll mention @kytrinyx here because this standardization seems kind of central to x-common's organization. Also, assuming that she is reading this, let me ask: would it be useful to incorporate the metadata.yml data in the canonical-data.json schema?
Whatever the outcome, I'd like to note that the current proposal appears at first glance a lot more complex than any individual canonical data set I've used to build an exercise. It's pretty intimidating as I think about tackling some of those new "add canonical data for this exercise" issues.
I do appreciate the examples when those are provided alongside the specifications though - that makes understanding the spec a lot easier!
Thanks for the feedback, @stkent.
... I'd like to note that the current proposal appears at first glance a lot more complex than any individual canonical data set I've used to build an exercise.
I agree, but most of the complexity comes from what we already have in x-common, and making the specification simpler would remove some features and make some test suites significantly less documented.
I do appreciate the examples when those are provided alongside the specifications though - that makes understanding the spec a lot easier!
I'm glad you said that. Here is a simpler example of a schema-compliant test suite:
{
"exercise":"foobar",
"version":"0.0.0",
"tests":[
{
"description":"How is the codebase?",
"bar":{
"input" : "fu",
"expected": "fubar"
}
},
{
"description": "A martial art",
"foo":{
"input" : "Kung-",
"expected": "Kung-foo"
}
},
{
"description": "Where do you live?",
"bar":{
"input" : "",
"expected": "bar"
}
},
{
"description": "Undescriptive variable name",
"foo":{
"input" : "",
"expected": "foo"
}
}
]
}
I tried to design the schema to make simple test suites easy to write, while making complex test suites still possible. Of course, there is a significant sacrifice in readability to make the JSON reasonably "parseable" and the schema minimally rational.
I'm afraid this is as simple as it gets without losing the flexibility needed to capture our current tests. 😔
I'm afraid this is as simple as it gets without losing the flexibility needed to capture our current tests. 😔
It would be possible to collapse the test data with the description, adding a new key to specify the test type. That would remove one nesting level, possibly making it simpler.
I would love to hear what people, specially the ones using test generators, think about it.
{
"description":"How is the codebase?",
"bar":{
"input" : "fu",
"expected": "fubar"
}
}
{
"description":"How is the codebase?",
"type" : "bar",
"input" : "fu",
"expected" : "fubar"
}
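A small sketch (hypothetical Python; field names are taken from the two examples above) of converting the nested form into the flatter "type in a property value" form:

```python
# Flatten a nested case: the one non-"description" key is the test type,
# which becomes the value of a new "type" property at the top level.

def flatten(case):
    (test_type, data), = [(k, v) for k, v in case.items() if k != "description"]
    flat = {"description": case["description"], "type": test_type}
    flat.update(data)
    return flat

nested = {"description": "How is the codebase?",
          "bar": {"input": "fu", "expected": "fubar"}}
print(flatten(nested))
# {'description': 'How is the codebase?', 'type': 'bar', 'input': 'fu', 'expected': 'fubar'}
```

The conversion is lossless as long as the test data never itself contains a key named type, which is one cost of collapsing the nesting.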
I personally like the second one better, as there is only ever one test type, right? Then why have any nesting? Secondly, I also like having the description, type, input and expected values on the same level, as I think a case could be made for them to all be top-level properties (they are equally important).
Thanks for the feedback, @ErikSchierboom.
While I was waiting for comments, I prepared a new proposal using the flatter structure. I think that you and @stkent will prefer this new version (I think I prefer it too).
Changes:
- Renamed tests to cases, as suggested by @petertseng, here.
- The test type is now a property value, type.
- Incorporated the metadata.yml content.
Changes in x-common:
- Removed cases from the list of mandatory properties.
- Added blurb to the list of mandatory properties.
- This allows a canonical-data.json without test cases.
With these changes, the canonical-data.json file supersedes metadata.yml, except for a few atypical files - being discussed in #597 - and the source_url property, which was renamed sourceUrl to keep naming consistency.
I'm blindly changing some things here that appear to make sense until I receive more feedback, but this looks like a great opportunity for us to add the properties of metadata.yml to canonical-data.json.
Is there any reason for not doing it?
I think combining could make sense, though in that case I'd almost prefer swapping cases back to tests since the scope of the file is now larger than "just" tests. I'd obviously defer to @kytrinyx on the combo though, since it will ripple out to other areas of the project.
I think combining could make sense, though in that case I'd almost prefer swapping cases back to tests since the scope of the file is now larger than "just" tests.
Makes perfect sense!
I'd obviously defer to @kytrinyx on the combo though, since it will ripple out to other areas of the project
So let's decide the cases vs. tests question after the decision about incorporating the metadata.yml properties.
Would it be useful to incorporate metadata.yml data in the canonical-data.json schema?
The purpose of the two files is different. One is used to be able to talk about the exercise, the other is used to be able to produce an implementation. I would hesitate to conflate the two, but am open to discussing it if any of you have strong feelings about it.
@rbasso this is a response to your post regarding "Test type in the property key" or "Test type in a property value". I am split on this particular issue. My initial perception of the first example is that "thing" bar is being tested, and it is clear to me what is to be its input and what I should expect to get back from it.
In the second example, my initial thought when I see "type" is that somewhere there is a list of canned types and this instance is for "bar", whatever that is supposed to mean. Why not, instead of "type", call it "testof"?
I know there has been more discussion on this subject since you made this particular post. Some of these discussions move pretty quickly. 🏃
It appears that all-your-base.json is malformed. Where allergies.json has the structure of:
all-your-base.json has:
The cases should be wrapped in a function name, yes?
It appears that bin/jsonlint only checks that the JSON parses, not that it has good structure.
At the very least, I think this should be patched up and the README expanded to actually show the desired structure. Happy to do a PR for that, assuming I understand it already. 😀
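A structural check of the kind zenspider is asking for is easy to prototype. A hedged Python sketch (the key names follow the draft schema in this thread and are assumptions, not an agreed format):

```python
import json

# Beyond "does it parse": verify the expected top-level keys are present
# and that every case carries a description.

def lint(text):
    data = json.loads(text)
    errors = []
    for key in ("exercise", "version", "cases"):
        if key not in data:
            errors.append(f"missing top-level key: {key}")
    for i, case in enumerate(data.get("cases", [])):
        if "description" not in case:
            errors.append(f"case {i}: missing description")
    return errors

good = '{"exercise": "bob", "version": "1.0.0", "cases": [{"description": "x", "response": {}}]}'
bad = '{"exercise": "bob"}'
print(lint(good))  # []
print(lint(bad))   # ['missing top-level key: version', 'missing top-level key: cases']
```

Once a schema is agreed on, a real JSON Schema validator could replace this hand-rolled check in CI.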