Meta: adding new keys to the schema

SleeplessByte commented 5 years ago

I've changed the title to be a meta discussion, so that we don't have the same discussion in multiple issues:

optional flags #1492
immutability flags #1493
other types of tests #1225
multiline "type"
identity field #1473 (to avoid compromising descriptions #1525 to fit this purpose)

It seems there really is a lot of ask for new fields and there also seems to be a notion that we don't want to break the current generators.

Personally, I am currently parsing the comments for special behaviour and this means that my generators will all break when someone decides the comments should be a special format. I can derive no special meaning from the english in these comments. I don't think this is the right place to hold special cases, but it is for things not covered in the current scheme.

I know that you (@pgaspar @ErikSchierboom @coriolinus @sshine @SaschaMann and @NobbZ) are all involved with writing and maintaining the generators, or at least the discussions on GitHub:

I think it would be wise to discuss first if we'll add new keys, if at all, because so far each issue has already raised the "yeah okay but we don't really need it and it costs me work"-type comment
...and then which ones are candidates, so we can merge it at once and the burden on maintainers is much lower, as the raised point just mentioned is valid.

Original issue about multiline.

There is canonical data such as food-chain and twelve-days that correctly comment:

JSON doesn't allow for multi-line strings, so all verses are presented here as arrays of strings. It's up to the test generator to join the lines together with line breaks.

Okay, that's great! But in this case I think it would be great if we'd add a meta property (and keep a list of those, as there are more like these (such as a property preferably being a constant or at least immutable) so that we can automate generation of these exercises.

SaschaMann commented 5 years ago

I think it would be wise to discuss first if we'll add new keys, if at all, because so far each issue has already raised the "yeah okay but we don't really need it and it costs me work"-type comment

Perhaps to address the first point it would be useful to know which tracks have generators that would stop working when solely adding keys (without changing the existing ones). Usually adding non-mandatory fields to JSON is not considered a breaking change in semver, so I'd expect many/most parsers to handle this just fine (though that doesn't mean generators follow that versioning and can handle it!). Doing this could help estimating how much work it would cause.

Wouldn't break:

C#
Dart (according to @Stargator)
Erlang (according to @NobbZ on Slack)
F#
JavaScript
Julia (though the generator is in very early development, so it should probably not be counted here, just adding it for completeness)
Go
Pharo
Ruby
TypeScript

Would break:

Unsure (from #1411):

Bash
ColdFusion
Common Lisp
Factor
OCaml
Perl 6
Rust
Scala
Vimscript

(feel free to edit my comment to move tracks around in the list)

SleeplessByte commented 5 years ago

@SaschaMann Great suggestion!

macta commented 5 years ago

Pharo (Smalltalk) has a reasonably comprehensive (although slightly hacked) generator that wouldn’t break by addition.

I’ve been trying to normalise inconsistent specs where I’ve encountered my generated exercises weren’t great (and am curious how the current generators handled them - allergies and etl were recent ones).

I’d love some extra flags to use as hints.

I currently treat exceptions specially - I have to look for an expected map with a single key of ‘error’ and then generate an exception check instead of an equality one.

I didn’t manage to handle clock very well with its instance equality comparison (I badly generated an isEqual(dict1,dict2) type call against the model instead of: assertEquals(clock1, clock2).

I also name test cases by concatenating subsequent levels of case descriptions - which works ok, but some cases are very long and often repetitive. Fortunately Pharo doesn’t mind 100+ char method names ;)

Some of the input variable names are not ideal , and some conventions around what would be a Boolean sounding property are not great.

I’m 25 exercises in - spot checked ~20 more but the others might well reveal more useful flags.

_{Sent with GitHawk}

SleeplessByte commented 5 years ago

@sshine You've responded in #1225 that you prefer comments because of what @coriolinus , which seems to be "because it's ambitious and might break current generators". Does this mean you have a generator that breaks?

sshine commented 5 years ago

@SleeplessByte: I should have been more clear:

Because property based test frameworks [...] have a programmatic interface.

SleeplessByte commented 5 years ago

👍

@sshine So is that instead of adding new flags, or on top of new flags.

ErikSchierboom commented 5 years ago

I'm a little confused as to the status of this issue. @sshine is it correct to say that you are not in favor of adding new flags/keys to the schema?

kytrinyx commented 5 years ago

I, personally, would be in favor of adding new flags/keys to the schema, as it seems the only way to support the kind of flexibility that we require at the track level (individual tracks want to include/exclude selectively).

While this would potentially break some generators, I am of the firm opinion that generators should be able to handle new keys by ignoring them, and if we're breaking because of that, it's worth fixing.

ErikSchierboom commented 5 years ago

Thanks for chipping in with your thoughts @kytrinyx! If I look at the comments in this issue, it seems like most people don't mind adding new keys to the schema. I'd really like to move forward with this issue, as there are other issues that depend on its resolution.

@sshine you seem to be the person most hesitant to add new fields. Is that correct?

ErikSchierboom commented 5 years ago

As people have had ample time to respond, the general consensus (including @kytrinyx) is that adding new fields is a good thing, when done judiciously.

As such, I believe this issue can be closed. If you feel otherwise, feel free to re-open.

SaschaMann commented 5 years ago

There's a 2nd step in the first comment of the issue, so I think it should stay open:

...and then which ones are candidates, so we can merge it at once and the burden on maintainers is much lower, as the raised point just mentioned is valid.

ErikSchierboom commented 5 years ago

There's a 2nd step in the first comment of the issue, so I think it should stay open:

Ah right! Thanks for catching that.

My preference would be to add the optional flag: https://github.com/exercism/problem-specifications/issues/1492#issuecomment-490058378.

ErikSchierboom commented 5 years ago

Bump. Are there other keys that people want to add to the schema?

macta commented 5 years ago

Is this flags for the whole exercise or individual tests?

For individual tests, an “exception” tag and “equality” tag would disambiguate (the latter - Clock is an example where I believe you want to compare the object and not it’s string output on some tests). Tests with exceptions gives a hint to catch vs equal.

ErikSchierboom commented 5 years ago

tests with exceptions gives a hint to catch vs equal.

I'm not sure I understand this. Could you give an example?

SaschaMann commented 5 years ago

Is this flags for the whole exercise or individual tests?

All the linked issues in the top post are about individual tests, but I don't see a reason why adding something to the exercise as a whole would be different.

Tests with exceptions gives a hint to catch vs equal.

Errors are already marked in a distinct way:

    "expected": {
        "error": "You should never bar a number"
    }

SaschaMann commented 5 years ago

...and then which ones are candidates, so we can merge it at once and the burden on maintainers is much lower, as the raised point just mentioned is valid.

Not sure about the best practices for this case. Is it better to have one branch that all changes are pushed to that will then be merged? Multiple branches that will be merged at the same time to master?

ErikSchierboom commented 5 years ago

Multiple branches that will be merged at the same time to master?

I think this is the preferred way to go.

sshine commented 5 years ago

@ErikSchierboom, @SleeplessByte: Sorry I didn't get back to you, personal matters came in the way.

I think you and @SaschaMann have picked an excellent candidate from the discussion and I look forward to seeing #1518 merged.

Reading back, it seems that the lack of my response was keeping this issue open. If I read @macta's comments, they seem to come with examples where the proposed key of #1518 would be compatible.

ErikSchierboom commented 5 years ago

Sorry I didn't get back to you, personal matters came in the way.

No worries!

I think you and @SaschaMann have picked an excellent candidate from the discussion and I look forward to seeing #1518 merged.

I've added you as a secondary reviewer of #1518. If you don't feel like it, feel free to ignore this.

SleeplessByte commented 5 years ago

Multiple branches that will be merged at the same time to master?

I think this is the preferred way to go.

This was my intention, or at least in close succession.

@ErikSchierboom, @SleeplessByte: Sorry I didn't get back to you, personal matters came in the way.

We all have lives! Don't worry about it. I just wasn't sure if you were in favour, against or something else :)

SaschaMann commented 5 years ago

To address the last suggested key in the first post (it doesn't have its own issue, if it gets too cluttered we can open one, tho):

Okay, that's great! But in this case I think it would be great if we'd add a meta property (and keep a list of those, as there are more like these (such as a property preferably being a constant or at least immutable) so that we can automate generation of these exercises.

I'm wondering if there are any exercises that contain both multiline strings and arrays of strings. If not, I don't think an extra property would be necessary, but I'm not opposed to it if it means that some generators require less adjustments for specific exercises.

SleeplessByte commented 5 years ago

I currently search comments in order to determine if something is a multiline string; because I then need to concatenate the values in the array using \n -- the same exercise also tests for single line strings. So there are not array + multiline string in one exercise, but I do need special handling.

If we don't want to add a property, that's fine :)

ErikSchierboom commented 5 years ago

I currently search comments in order to determine if something is a multiline string;

How many of those exercises are there?

SleeplessByte commented 5 years ago

And then there is beer-song, which doesn't actually have the comment, but should...

cmccandless commented 5 years ago

tournament should also be on that list.

ErikSchierboom commented 5 years ago

Alright, that makes 8 exercises. Maybe an additional keys would be useful then. Any suggestions?

SleeplessByte commented 5 years ago

Tournament is already an issue currently as its comment is distinct from the others.

I don't have great suggestion for the key :p

SaschaMann commented 5 years ago

join-strings or, a bit more verbose, join-string-arrays with boolean value?

ErikSchierboom commented 5 years ago

@SleeplessByte Do you have any suggestions?

macta commented 5 years ago

Another key type you might consider is "constructor input" - but not sure how to easily shoe-horn this in... I'm specifically thinking about problems like "Matrix" - where the data looks like:

{
      "description": "can extract row",
      "property": "row",
      "input": {
        "string": "1 2\n3 4",
        "index": 2
      },
      "expected": [3, 4]
    },

and also a column version

{
      "description": "can extract column",
      "property": "column",
      "input": {
        "string": "1 2 3\n4 5 6\n7 8 9",
        "index": 3
      },
      "expected": [3, 6, 9]
    },

So in fact the string input is unrelated to the row or comumn'ess and the string property is really a "constructor input" vs. "index" which is the "row" or "column" parameter input (like on a method). The combination of property and input is rather vague - some inputs are actually properties of an object or struct, and other properties are more transient - like index (which in most languages is a parameter on a method).

In the examples, its not always clear (without intervention) if both inputs are related to the provided property, or if one of them is actually a property of an object... as sometimes they do relate to the property.

e.g. Do we expect a test like (different languages presented):

new matrix().rowInputAtIndex('1 2\n3 4', 2)
Matrix new rowInput: '1 2\n3 4' index: 2

or

new matrix('1 2\n3 4').rowIndex(2)
(Matrix new: '1 2\n3 4') rowIndex: 2

As it stands, its harder to work out at a more generic level that I'm sure most languages can work at without having to specifically customize generation on an problem by problem basis.

macta commented 5 years ago

I think my comment above is somewhat related to CircularBuffer - which tries to address a similar problem by defining a subset of operations. For matrix above, the row or column specifications are "operations" in circular buffer parlance. Things outside of the operations, are attributes of an object or struct?

ErikSchierboom commented 5 years ago

As it stands, its harder to work out at a more generic level that I'm sure most languages can work at without having to specifically customize generation on an problem by problem basis.

I think this is the key issue here, as many (functional) languages don't usually work with constructors. I'm a bit hesistant to split up the input into two parts: regular input, and constructor input. In the C# generators, I just manually define which properties are to be constructor properties. This works fine, although it is dependent on the property names not changing, but I've not found this to be an issue.

SleeplessByte commented 5 years ago

I agree with @ErikSchierboom -- this is probably something a generator should ask. But perhaps we can be smart about naming?

ErikSchierboom commented 5 years ago

But perhaps we can be smart about naming?

This is definitely an option. It might also be an option to group those "constructor" properties in a separate object. This is supported by the spec.

Getting back to this issue, we now have two proposed new keys:

optional
join-strings/join-string-array

Is that correct?

SleeplessByte commented 5 years ago

join-strings/join-string-array

That's the multiline string thing right?

I think that's correct!

Maybe we should think about things that are interesting across languages which is encoding for string and size for numbers... We've declined quite a few PRs because we didn't want to add these to the problem description in favor of having specific exercises dealing with them, but languages with correct codepoint support or bigint by default would be able to have these test cases.

ErikSchierboom commented 5 years ago

Maybe we should think about things that are interesting across languages which is encoding for string and size for numbers...

We can model these using the optional key: optional: "big-int" and optional: "unicode".

sshine commented 5 years ago

Is there something we need to discuss before ~~#1492~~ #1518 can be merged?

It covers optional: "..." matching the regex ^[a-z]+(-[a-z]+)*$.

ErikSchierboom commented 5 years ago

I think you're referring to https://github.com/exercism/problem-specifications/pull/1518, right? If so, I'm fine with merging that PR and then opening new PR's for any other fields.

SleeplessByte commented 5 years ago

As long as we merge them in close succession (field proposals), we reduce the burden on generator maintainers.

sshine commented 5 years ago

But have I understood correctly that the multi-line problem does not have a proposal yet?

ErikSchierboom commented 5 years ago

But have I understood correctly that the multi-line problem does not have a proposal yet?

I think that is correct.

[...] we reduce the burden on generator maintainers.

Is it that much of a burden though? I assume that most generators use a JSON library that will happily ignore added fields, so adding them one by one does then not impose more maintenance than adding them in rapid succession.

SleeplessByte commented 5 years ago

Is it that much of a burden though? I assume that most generators use a JSON library that will happily ignore added fields, so adding them one by one does then not impose more maintenance than adding them in rapid succession.

Except that we allocate time in our day/week/month to work on this and a context switch is not free, nor cheap. (I agree with both of you on the rest tho :) )

Let's just set a date. I want this all to move forward. It seems we have a few great new additions and proposals. I would like us to check out #1473 and just merge as much as possible in batches.

ErikSchierboom commented 5 years ago

I would like us to check out #1473 and just merge as much as possible in batches.

Fine by me!

exercism / problem-specifications

Meta: adding new keys to the schema #1496

...and then which ones are candidates, so we can merge it at once and the burden on maintainers is much lower, as the raised point just mentioned is valid.