exercism / problem-specifications

Shared metadata for exercism exercises.

Reopening the Problem Specifications repo #1674

Closed ErikSchierboom closed 4 years ago

ErikSchierboom commented 4 years ago

TL;DR: @exercism/track-maintainers We are planning on reopening Problem Specifications in the next couple of weeks. The key change we are making is that problem-specs should be thought of as a set of optional test cases. Individual tracks should choose which test cases to implement, on a per-test-case basis. We are adding a simple, per-exercise configuration file that keeps track of which test cases the exercise implements. We'll provide a simple mechanism for tracks to keep this file up to date with the test cases in the exercise's canonical data. The only change required from track maintainers is to update their test generators to generate tests for just the test cases enabled in the configuration file. All test cases will be immutable; changes can only be introduced by adding new test cases, which can be flagged as re-implementing an existing test case.


A year ago we temporarily put the Problem Specifications repo in bug-fixes only mode (see this issue). This was done as there were systematic problems with the Problem Specifications repo that were causing conflict between maintainers, which we did not have the resources to fix in the immediate term. Now that we're making good progress on Exercism v3, we have been able to dedicate time to designing a solution to the problems in the Problem Specifications repo.

In this issue I will outline the problems that we're trying to solve, the solution we have come up with, and how these changes will allow us to re-open the Problem Specifications repo, which has always been one of the most active parts of Exercism and has brought a great deal of joy to many. Let's get to it! 🎉

Intro to Problem Specifications

The Problem Specifications repo describes a set of language-agnostic exercises, which maintainers can use to build exercises for their tracks. Over time, exercises have grown to not only contain shared titles and descriptions, but also canonical data that describes a set of test cases for the exercise. This setup has been very helpful for maintainers for tasks such as adding new exercises to tracks, updating existing exercises, and bootstrapping entirely new tracks.

As the canonical data is defined as JSON data, some tracks have built tooling to automatically generate test suites from them. This has allowed these tracks to quickly scaffold test suites for new exercises, as well as fix bugs in existing (generated) test suites.
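
To make this concrete, here is a minimal sketch (Python, not any particular track's real generator) of the first step such a tool performs; the file path and the nesting of test cases into "cases" groups are illustrative:

import json

def iter_cases(node):
    # Yield leaf test cases, descending into nested "cases" groups.
    for case in node.get("cases", []):
        if "cases" in case:
            yield from iter_cases(case)  # a group of related test cases
        else:
            yield case                   # a single test case

# Illustrative path to an exercise's canonical data.
with open("exercises/two-fer/canonical-data.json") as f:
    data = json.load(f)

for case in iter_cases(data):
    print(case["uuid"], case["property"], case["input"], "->", case["expected"])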

Unfortunately, as different tracks (and different maintainers) have different desires for how their tests are structured and written, this has caused tension. In this document, we'll describe what has caused this tension and how we aim to resolve it.

Issues with Problem Specifications

Issues with the Problem Specifications repo occur when there is a 1-to-1 correspondence between an exercise defined in Problem Specifications and a track implementation of that exercise. In other words, if changes to the exercise data in Problem Specifications result in changes to a track's implementation, there can be trouble.

README

The README.md file for track exercises is generated directly from the Problem Specifications exercise's description.md and metadata.yml files, using the configlet tool.

An example of how this can be problematic is the word "promises", with some languages instead using the word "futures" to describe the same concept.

Canonical data

Most of the exercises in the Problem Specifications repo are implemented in multiple tracks. This means that those tracks implement the same exercise using the same canonical data.

As an example of how this can be problematic, consider one track wanting to add a test case that uses Unicode strings as input, which another track would not want to include due to the increased complexity for their students. Another example is when one track wants to only have tests for the public interface of the exercise, whereas another track wants to have tests for the private or lower-level interface of the exercise.

This issue is compounded even further when the track exercise's test suite is generated directly from a Problem Specifications exercise's canonical-data.json file, as maintainers are often reluctant to do pre- or post-processing in their generators.

Note that not all types of changes to canonical data are problematic. We can discern the following four types of changes:

Type of change                            | Problematic | Example
Add test case                             | Possibly    | Link
Remove test case                          | Unlikely    | Link
Update test case description              | Rarely      | Link
Update test case input or expected output | Possibly    | Link

As can be seen, not all types of changes are equally problematic. The most problematic changes are when either:

Of course, not all changes are problematic, and many changes are in fact what one could consider "bug fixes", where the expected value does not match what the instructions specify (though this could also be due to invalid instructions). Determining whether something is a mere "bug fix" or something more substantial has proved difficult, however, with opinions varying depending on how a track uses a specific test case.

To prevent breaking changes, the canonical data is currently versioned using SemVer, so in theory test generators could use this to do "safe" updates. In practice though, most test generators always use the latest version to generate their exercises.

Redefining Problem Specifications

The first step in solving these issues is defining a clear guiding rule for the Problem Specifications repository:

Problem Specifications exists to cover the "happy path". That means it should work for most tracks most of the time. In situations that aren't in the "happy path", it is down to individual tracks to solve those issues locally.

Application/library tests

  • Exercises must contain tests that cover the public interface of the exercise (also thought of as "application tests").
  • Exercises may contain tests that cover the private or lower-level interface of the exercise (sometimes referred to as "library tests").

Canonical data

Changing test cases

As test cases will be immutable, one cannot change an existing test case; instead, a new test case must be added. This has several nice consequences:

  • There is no longer any discussion whether a change is a patch, minor or major update.
  • We no longer need the versioning of the canonical data.

Let's look at an example. Suppose we have the following test case:

{
  "uuid": "e46c542b-31fc-4506-bcae-6b62b3268537",
  "description": "two times one is two",
  "property": "twice",
  "input": {
    "number": 1
  },
  "expected": 3
}

We found a mistake in the expected value, but with tests being immutable, we'll have to add a new test case (with a new UUID):

[
  {
    "uuid": "e46c542b-31fc-4506-bcae-6b62b3268537",
    "description": "two times one is two",
    "property": "twice",
    "input": {
      "number": 1
    },
    "expected": 3
  },
  {
    "uuid": "82d32c2e-07b5-42d9-9b1c-19af72bae860",
    "description": "two times one is two",
    "property": "twice",
    "input": {
      "number": 1
    },
    "expected": 2
  }
]

An additional consequence of this approach is that the relationship between the old and the new test case can be made explicit: the new test case can use a reimplements field that points to the UUID of the test case it replaces, and a comments field to explain the change:

[
  {
    "uuid": "e46c542b-31fc-4506-bcae-6b62b3268537",
    "description": "two times one is two",
    "property": "twice",
    "input": {
      "number": 1
    },
    "expected": 3
  },
  {
    "uuid": "82d32c2e-07b5-42d9-9b1c-19af72bae860",
    "description": "two times one is two",
    "comments": ["Expected value is changed to 2"],
    "reimplements": "e46c542b-31fc-4506-bcae-6b62b3268537",
    "property": "twice",
    "input": {
      "number": 1
    },
    "expected": 2
  }
]

This change was guided in part by our feeling that canonical data test cases actually don't change that often, so this would be a relatively minor inconvenience.
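
As an illustration of how a track's tooling could cope with re-implemented test cases (purely a sketch; each track decides its own policy via .meta/tests.toml), one option is to skip any test case whose UUID is referenced by another case's reimplements field, so that only the newest version is considered by default:

def latest_cases(cases):
    # Skip cases that have been superseded by a reimplementing case.
    reimplemented = {c["reimplements"] for c in cases if "reimplements" in c}
    return [c for c in cases if c["uuid"] not in reimplemented]

cases = [
    {"uuid": "e46c542b-31fc-4506-bcae-6b62b3268537", "expected": 3},
    {"uuid": "82d32c2e-07b5-42d9-9b1c-19af72bae860",
     "reimplements": "e46c542b-31fc-4506-bcae-6b62b3268537", "expected": 2},
]
print([c["uuid"] for c in latest_cases(cases)])  # only the reimplementing case remains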

Scenarios

Test cases can also be tagged with one or more scenarios, indicating that they only apply in certain situations (for example, requiring big-integer support), which tracks can use to include or exclude them:

{
  "uuid": "25a8463d-560f-4a42-9be7-b79d00cd28a4",
  "description": "64",
  "property": "square",
  "input": {
    "square": 64
  },
  "expected": 9223372036854775808,
  "scenarios": ["big-integers"]
}
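
As a sketch of how a track's generator might act on this field (the excluded set shown here is purely illustrative), test cases tagged with a scenario the track does not support can simply be dropped:

# Drop test cases tagged with a scenario this track chooses not to support.
EXCLUDED_SCENARIOS = {"big-integers"}  # illustrative; each track picks its own set

def wanted(case):
    return not (set(case.get("scenarios", [])) & EXCLUDED_SCENARIOS)

cases = [
    {"uuid": "25a8463d-560f-4a42-9be7-b79d00cd28a4", "scenarios": ["big-integers"]},
    {"uuid": "82d32c2e-07b5-42d9-9b1c-19af72bae860"},
]
print([c["uuid"] for c in cases if wanted(c)])  # only the second case remains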

Indicate which test cases have been implemented

Each exercise that is based on canonical data will contain a .meta/tests.toml file, which lists the UUID of each canonical test case together with a boolean indicating whether the track implements it:

[canonical-tests]

# no name given
"19709124-b82e-4e86-a722-9e5c5ebf3952" = true

# a name given
"3451eebd-123f-4256-b667-7b109affce32" = true

# another name given
"653611c6-be9f-4935-ab42-978e25fe9a10" = false

The advantages to having this file are:

  1. It makes it very explicit which test cases are implemented. This is useful both for manually written and automatically generated tests.
  2. It is ideal for automation purposes.

We can pre-populate these .meta/tests.toml files across all tracks with a script, initially setting all test cases to true.
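
As a sketch of how a track's test generator could consume this file (using Python's standard tomllib purely as an example; the real tooling and paths are track-specific):

import json
import tomllib  # Python 3.11+; earlier versions can use a third-party TOML parser

# Read which test cases this track has enabled.
with open("exercises/two-fer/.meta/tests.toml", "rb") as f:
    tests = tomllib.load(f)["canonical-tests"]
enabled = {uuid for uuid, include in tests.items() if include}

# Read the exercise's canonical data.
with open("problem-specifications/exercises/two-fer/canonical-data.json") as f:
    canonical = json.load(f)

for case in canonical["cases"]:  # flat list for simplicity; real files may nest groups
    if case["uuid"] in enabled:
        pass  # emit a test for this case in the track's language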

Notifications

The .meta/tests.toml file allows us to detect which test cases an exercise implements. This gives us the ability to tackle one of the downsides of using the Problem Specifications repository as the source of data: not knowing when canonical data has been changed.

Tracks that implemented an exercise based on canonical data basically had two options to check whether the canonical data had changed:

  1. They could keep a close eye on the Problem Specifications repository to see what files were being changed. This requires quite some effort and it is easy to miss things.
  2. If exercises were created by a test generator, a track could re-run the test generator and see if the test suites would change.

Clearly, neither option is ideal.

Using the .meta/tests.toml files, we'll be building a GitHub action that serves as a notification bot. It will regularly check the contents of a track's .meta/tests.toml files against the test cases defined in the Problem Specifications repo's canonical-data.json files.

Based on this diff, it could automatically do things like post an issue to your track's repository if there are any relevant changes, which could look like this:

This is your weekly problem-specification update

Summary: There are 20 new tests, and 5 existing tests with updates.

According to your track's config.json file, the following exercises are implemented with problem specification changes:

We could even create individual issues per exercise, but the most important thing is that we can do this type of automation when tracks use the .meta/tests.toml file.
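
At its core, that check is just a set comparison between the UUIDs in an exercise's canonical-data.json and the UUIDs already listed in the track's .meta/tests.toml; a sketch (paths illustrative):

import json
import tomllib

with open("exercises/two-fer/.meta/tests.toml", "rb") as f:
    tracked = set(tomllib.load(f)["canonical-tests"])

with open("problem-specifications/exercises/two-fer/canonical-data.json") as f:
    upstream = {case["uuid"] for case in json.load(f)["cases"]}

new_cases = upstream - tracked      # added upstream, no decision recorded yet
removed_cases = tracked - upstream  # no longer present upstream
print(f"{len(new_cases)} new test case(s), {len(removed_cases)} removed test case(s)")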

Tooling

README

Formatting

Where to go from here?

We’re really excited about re-opening Problem Specifications. We feel these changes should fix the issues that caused the repo to be locked without adding any meaningful extra burden on maintainers.

If there is general consensus on this approach, we will action these changes and re-open Problem Specifications as soon as possible. I have already prototyped much of the tooling but still have some work to do to get it finished. We also need to add documentation to Problem Specs, and make PRs both to this repo (for the UUIDs) and to the track repos (for the new files). My plan is to submit PRs for these changes over the next two weeks, and then reopen this repository by mid-October.

Once the above changes have been merged, we can start accepting PRs again (although some of them will probably need a little tweaking to correspond to the new format).

Thanks to everyone for being patient with us over this past year! I look forward to hearing your thoughts on all these changes :)

cmcaine commented 4 years ago

Can we just use sequential integers for identifying the test cases rather than UUIDs? It'll save us all a lot of generating UUIDs and copying and pasting them all over the place.

There's a small possibility of more merge conflicts when two PRs are modifying the same tests file, but other than that I don't see a downside if all tests for each exercise will be in one file.

SaschaMann commented 4 years ago

Can we just use sequential integers for identifying the test cases rather than UUIDs? It'll save us all a lot of generating UUIDs and copying and pasting them all over the place.

My preference would be that the IDs are filled in automatically when a case doesn't have one, regardless of what they look like.

cmccandless commented 4 years ago

Can we just use sequential integers for identifying the test cases rather than UUIDs?

There's a small possibility of more merge conflicts when two PRs are modifying the same tests file, but other than that I don't see a downside if all tests for each exercise will be in one file.

The integer approach may be problematic if there is still support for track-specific test cases. How would you number those?

My preference would be that the IDs are filled in automatically when a case doesn't have one, regardless of what they look like.

If UUID is automatically added by tooling, then it's a perfect solution with no additional effort for maintainers (once tooling is in place).

ErikSchierboom commented 4 years ago

If UUID is automatically added by tooling, then it's a perfect solution with no additional effort for maintainers (once tooling is in place).

I think we should be able to do this through a GitHub action, right @SaschaMann?

cmcaine commented 4 years ago

We would still need to copy and paste the UUIDs into the generator code in the tracks and remember to run these actions, which is a minor faff. I'm not dead-set against them, just prefer small integers unless there is a clear need for bigger identifiers.

still support for track-specific test cases.

These live in the track repos, right? If so, I don't see why they would ever need to be disabled or have UUIDs, just comment them out or delete them from their source files because you own them

ErikSchierboom commented 4 years ago

We would still need to copy and paste the UUIDs into the generator code in the tracks and remember to run these actions, which is a minor faff. I'm not dead-set against them, just prefer small integers unless there is a clear need for bigger identifiers.

Actually, you don't :) This is where the tooling we'll be releasing comes in. It will provide a command-line application that adds/updates the exercise's .meta/tests.toml file, in which the UUIDs are specified. Track generators should be updated to use this .meta/tests.toml file to get the UUIDs of the enabled tests; they should not hard-code them in the source code.

cmcaine commented 4 years ago

Ah, I missed the significance of that. Thanks!

verdammelt commented 4 years ago

I like this plan. I think it is worth a try.

One question/comment: you say that existing tracks will be updated with a TOML file that has all canonical test cases set to true. Wouldn't it probably be more correct to have that default to false? Either way will be wrong - but it seems the latter would be more correct, more often.

Of course, when you put in a PR for these files on my track I can just sed -e 's/true/false/', so it's not a big issue... just curious why you are choosing true.

NobbZ commented 4 years ago

If a track's exercise is based on canonical data from the Problem Specifications repo, the exercise should contain a .meta/tests.toml file.

We are using JSON everywhere, why introduce another config format?

SleeplessByte commented 4 years ago

🍆

(In other words: I'm happy with this)

iHiD commented 4 years ago

just curious why you are choosing true

Based on the assumption that most tracks currently read directly from canonical-data.json, they're effectively implicitly working from all tests. So it seemed that true would most likely conform to the existing situation, making a regen of the tests per track effectively a no-op.

NobbZ commented 4 years ago

Wait? Now you add a YAML file to the mix?

Why JSON + YAML + TOML? Can't we agree on one of those?

Preferably JSON, as it probably has the most parsers available across languages?

coriolinus commented 4 years ago

Why JSON + YAML + TOML? Can't we agree on one of those?

The nature of an evolved design.

Preferably JSON, as it probably has the most parsers available across languages?

I want to [Citation Needed] this, but you're probably right. Even so, if we do choose to standardize, I'd pick TOML over JSON any day: it's just a nicer format to work with.

coriolinus commented 4 years ago

... Looks like I never actually posted my initial responses. Oops.

  • There is no longer any discussion whether a change is a patch, minor or major update.
  • We no longer need the versioning of the canonical data.

I very much like the idea of getting rid of the test semver entirely. That's been the cause of a lot of extraneous human effort in the past.


  • Exercises must contain tests that cover the public interface of the exercise (also thought of as "application tests").
  • Exercises may contain tests that cover the private or lower-level interface of the exercise (sometimes referred to as "library tests").

The sync command can [manipulate] .meta/tests.toml ...

I'd suggest that in cases where exercises contain both application and library tests, the tests should be split into distinct test groups. Further, the tooling should be smart enough to give the maintainer the option to approve or reject an entire test group at a time, based on its description. The result of these policies in combination is that maintainers who do not want library tests in their suite can reject an arbitrary number of current and future library tests with a single flag.

The implication is that test groups must also have identifiers (probably UUIDs), and the tests.toml specification file must be able to include/exclude/leave unspecified entire groups at a time.

Alternatively, this feels like something which could be handled with scenarios.


I kind of like the new scenarios mechanism, but it feels very redundant with the existing test grouping strategy. I'd suggest that we choose one of two options here:

If we do keep the concept of scenarios, then they should have an identifier, probably UUID, which gets tracked in tests.toml so that, bare minimum, a maintainer can reject an entire scenario in a single operation.


Can we just use sequential integers for identifying the test cases rather than UUIDs?

The major problem I see with this is when two edits to the canonical data for a particular exercise come in simultaneously: sequential integers are very likely to cause merge conflicts, UUIDs are not.

iHiD commented 4 years ago

@NobbZ Was a typo. I've edited my post.

SaschaMann commented 4 years ago

I want to [Citation Needed] this, but you're probably right. Even so, if we do choose to standardize, I'd pick TOML over JSON any day

I agree. JSON is a horrible format for manual writing/editing. I'd also argue it's less familiar to people than TOML, because you frequently encounter ini files when using a computer, even as a non-programmer, and TOML is quite similar to them. JSON is rather web-dev specific. I'd be happy to switch to TOML for the human-edited canonical data and config files, and auto-convert them to JSON for backwards compatibility for the site/generators. Or switch to something like JSON5.


I kind of like the new scenarios mechanism, but it feels very redundant with the existing test grouping strategy.

A made-up counterexample: Say I have a group of tests that test string concat. It would have tests like "Hello, " * "World!" == "Hello, World!", "foo" * "bar" == "foobar". Now a test involving unicode, e.g. "🐶" * "🐱" == "🐶🐱", would belong in that group. It tests the exact same behaviour, concatenation of two strings. But, as a compatibility feature for less-unicode-savvy languages, it could be tagged with the scenario unicode. In this example the grouping would be used for the abstract behaviour, while the scenario would be used for highlighting implementation details. There are several exercises that have cases like this.

If we do keep the concept of scenarios, then they should have an identifier, probably UUID, which gets tracked in tests.toml so that, bare minimum, a maintainer can reject an entire scenario in a single operation.

What benefit does a UUID have over a string identifier in this case?

SaschaMann commented 4 years ago

Some more questions:

Is the group of a test considered metadata and thus mutable?


How does JSON handle multiple keys with the same name? Since there are now quite a few things that could benefit from or require comments, e.g. scenarios and reimplements, it would be useful to comment directly above or below it. Can we just add multiple "comments": ... entries to the same object?


The scenarios field can use one or more of a predefined set of values, which are defined in a SCENARIOS.txt file.

Will this be a repo-wide file, or an exercise-specific file?

coriolinus commented 4 years ago

@SaschaMann A made-up counterexample: ...

There's nothing preventing multiple scenarios from being applied to a single test case; in that way, they're a generalization of grouping, which requires a strict tree structure. In your example, I'd expect to see "scenarios": ["string-concat", "unicode"],. A unicode-averse track could then remove that test case without any human intervention if they had already banned the "unicode" scenario, even if it's also in the "string-concat" scenario.

As I see it, the main benefit of the current nesting structure as opposed to scenarios is that it's more intuitive; the new structure will want to be somewhat more machine-mediated.

What benefit does a UUID have over a string identifier in this case?

SaschaMann commented 4 years ago

As I see it, the main benefit of the current nesting structure as opposed to scenarios is that it's more intuitive; the new structure will want to be somewhat more machine-mediated.

There's still a difference because in many cases, the group structure translates to the nesting of the tests in the implementation of the exercise. With "scenarios": ["string-concat", "unicode"] it's unclear how they should be nested.

ErikSchierboom commented 4 years ago

We are using JSON everywhere, why introduce another config format?

Basically: ease of editing, less verbosity, and the ability to add comments. We considered using JSON, but chose TOML for the aforementioned reasons. @SaschaMann and @coriolinus have described these reasons well.

ErikSchierboom commented 4 years ago

I very much like the idea of getting rid of the test semver entirely. That's been the cause of a lot of extraneous human effort in the past.

Definitely.

ErikSchierboom commented 4 years ago

The scenarios field can use one or more of a predefined set of values, which are defined in a SCENARIOS.txt file.

Will this be a repo-wide file, or an exercise-specific file?

It will be repo-wide. We'll rename the existing OPTIONAL-KEYS.txt document for that.

How does JSON handle multiple keys with the same name? Since there are now quite a few things that could benefit from or require comments, e.g. scenarios and reimplements, it would be useful to comment directly above or below it. Can we just add multiple "comments": ... entries to the same object?

In principle, it is allowed, but in practice I've never seen people using it. There is also a considerable risk that existing JSON parsers will break (see https://stackoverflow.com/a/23195243/2071395), so I'd argue against using multiple comments keys and instead just putting all information in one "comments" field. Side note: did you already know this @SaschaMann and is this a clever ploy that allows you to show the benefits of TOML? 🤣
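
For what it's worth, here is what Python's standard json module does with duplicate keys (behaviour is parser-specific, which is exactly the risk): it silently keeps the last value, unless you opt in to detecting duplicates yourself:

import json

doc = '{"comments": ["first"], "comments": ["second"]}'
print(json.loads(doc))  # {'comments': ['second']} -- the first entry is silently dropped

# Rejecting duplicates requires opting in via object_pairs_hook:
def reject_duplicates(pairs):
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate keys found")
    return dict(pairs)

json.loads(doc, object_pairs_hook=reject_duplicates)  # raises ValueError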

SaschaMann commented 4 years ago

In principle, it is allowed, but in practice I've never seen people using it. There is also a considerable risk that existing JSON parsers will break (see https://stackoverflow.com/a/23195243/2071395), so I'd argue against using multiple comments keys and instead just putting all information in one "comments" field. Side note: did you already know this @SaschaMann and is this a clever ploy that allows you to show the benefits of TOML? 🤣

No I didn't, it was just a nice coincidence to find yet another reason why json is a poorly suited choice :D

so I'd argue against using multiple comments keys and instead just putting all information in one "comments" field.

That sounds like it can get incredibly messy, especially with the convention of super short lines and splitting the comments up into an array. Maybe as a bodge we could define all fields starting with _ as comments? Or alternatively, get rid of the aforementioned convention and let editors handle the line wrapping. Then one array entry can correspond to one comment "section".

Some exercises like complex-numbers already have messy comments, this will just make it worse.

Or as I suggested earlier, use a format that supports comments and convert to json for backwards compat with generators :)

ErikSchierboom commented 4 years ago

That sounds like it can get incredibly messy

Maybe we should see how this pans out first? It is possible that it won't be that messy. And if it is, we can always change this later. I'm not saying I don't think your argument is valid, just that I'm not entirely sure we have to take this on right now.

ErikSchierboom commented 4 years ago

We've just internally discussed the grouping/scenarios functionality. We see them as separate features:

So both have their own place.

I'd suggest that in cases where exercises contain both application and library tests, the tests should be split into distinct test groups.

Instead of having separate test groups, we'll add the library-test scenario to library tests to allow for easy excluding/including (note: we won't have an application-test scenario, as that is the default and we don't want to add it to virtually every test case). As test groups should group related functionality, I could easily imagine a test group having both application and library tests.

yawpitch commented 4 years ago

This is a bit of a drive by, as I’m still not able to contribute to the project, but woe be unto thee who tries to reliably convert TOML to JSON... many of the library implementations out there aren’t even using the current published 1.0.0 “standard”, and even if yours is there are some pretty under-specified and ambiguous parts of the grammar. I’d never use it for anything that needs to be parsed consistently across language boundaries, and really would never consider it for anything but a simple and flat configuration file for an app I controlled (and did all the type conversions within).

iHiD commented 4 years ago

Nice to hear from you @yawpitch 🙂

wolf99 commented 4 years ago

Hi @ErikSchierboom. While going through the C track to align the tests.toml files with the implementations, I see that some exercises are missing the TOML file.

From the C track I have found the following are missing:

The C track does not implement all exercises, so there may be other exercises that have the same issue on other tracks. Quickly checking another track (pharo-smalltalk) shows that this is not limited to the C track. For the C track I can create the files manually, but across all repos it may be worth re-running the same (but fixed) script that you used to generate the above PRs.

ee7 commented 4 years ago

@wolf99 I was just now creating an issue for this! See https://github.com/exercism/canonical-data-syncer/issues/39.

Note that linked-list does not have a canonical-data.json (see here), so it's correct that canonical_data_syncer doesn't output a tests.toml file there.

I believe that the issue is just with grains.

wolf99 commented 4 years ago

Snap!

I have made a commit on the C repo adding the grains tests.toml. I guess the same file should be able to be used across all repos

https://github.com/exercism/c/pull/488/commits/72438e95569c7246fb3b8f7a9bee0e4bca060dae

(comments welcome on it if any issues!)

ee7 commented 4 years ago

@wolf99 I have edited the issue I created to contain the expected tests.toml contents for grains.

It contains what the canonical_data_syncer would have output if we didn't have this bug.

If we think that the test descriptions are bad then I believe we'll have to add new test cases that reimplement them with a better description.

ErikSchierboom commented 4 years ago

Whoops, totally forgot about the grains issue, for which I've created an issue in the Nim repo. We'll fix this and send another PR for the missing grains exercise's tests.toml file.

iHiD commented 4 years ago

Problem specs is now reopened! 🎉

I'm going to close this issue so it doesn't become a huge cross-cutting conversation. Pls open issues for any extra bits you find with regards to the reopening. Thanks everyone 💙