Can we just use sequential integers for identifying the test cases rather than UUIDs? It'll save us all a lot of generating UUIDs and copying and pasting them all over the place.
There's a small possibility of more merge conflicts when two PRs are modifying the same tests file, but other than that I don't see a downside if all tests for each exercise will be in one file.
Can we just use sequential integers for identifying the test cases rather than UUIDs? It'll save us all a lot of generating UUIDs and copying and pasting them all over the place.
My preference would be that the IDs are filled in automatically when a case doesn't have one, regardless of what they look like.
Can we just use sequential integers for identifying the test cases rather than UUIDs?
There's a small possibility of more merge conflicts when two PRs are modifying the same tests file, but other than that I don't see a downside if all tests for each exercise will be in one file.
The integer approach may be problematic if there is still support for track-specific test cases. How would you number those?
My preference would be that the IDs are filled in automatically when a case doesn't have one, regardless of what they look like.
If UUID is automatically added by tooling, then it's a perfect solution with no additional effort for maintainers (once tooling is in place).
If UUID is automatically added by tooling, then it's a perfect solution with no additional effort for maintainers (once tooling is in place).
I think we should be able to do this through a GitHub action, right @SaschaMann?
We would still need to copy and paste the UUIDs into the generator code in the tracks and remember to run these actions, which is a minor faff. I'm not dead-set against them, just prefer small integers unless there is a clear need for bigger identifiers.
still support for track-specific test cases.
These live in the track repos, right? If so, I don't see why they would ever need to be disabled or have UUIDs, just comment them out or delete them from their source files because you own them
We would still need to copy and paste the UUIDs into the generator code in the tracks and remember to run these actions, which is a minor faff. I'm not dead-set against them, just prefer small integers unless there is a clear need for bigger identifiers.
Actually you don't :) This is where the tooling we'll be releasing comes in. It will give a command-line application that will add/update the exercise's `.meta/tests.toml` file, in which the UUIDs are specified. Track generators should be updated to use this `.meta/tests.toml` file to get the UUIDs of the enabled tests; they should not be hard-coding them in the source code.
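For example, a generator could filter the canonical data down to the enabled cases with something along these lines. This is only a rough sketch in Python: it assumes the `[canonical-tests]` table of UUID-to-boolean entries described later in this thread, and the helper names are made up.

```python
# Sketch of a track test generator reading .meta/tests.toml (assumed layout:
# a [canonical-tests] table mapping case UUIDs to booleans) and filtering the
# exercise's canonical data down to the enabled cases.
import json
import tomllib  # Python 3.11+; use the third-party 'tomli' package on older versions
from pathlib import Path

def enabled_uuids(exercise_dir: Path) -> set[str]:
    """Return the UUIDs that are set to true in the exercise's tests.toml."""
    with (exercise_dir / ".meta" / "tests.toml").open("rb") as f:
        tests = tomllib.load(f)
    return {uuid for uuid, enabled in tests["canonical-tests"].items() if enabled}

def flatten(cases):
    """Yield leaf test cases from canonical data's (possibly nested) case groups."""
    for case in cases:
        if "cases" in case:
            yield from flatten(case["cases"])
        else:
            yield case

def enabled_cases(canonical_data_path: Path, exercise_dir: Path) -> list[dict]:
    """Return only the canonical test cases that the track has enabled."""
    data = json.loads(canonical_data_path.read_text(encoding="utf-8"))
    wanted = enabled_uuids(exercise_dir)
    return [case for case in flatten(data["cases"]) if case.get("uuid") in wanted]
```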
Ah, I missed the significance of that. Thanks!
I like this plan. I think it is worth a try.
One question/comment: you say that existing tracks will be updated with a TOML file that has all canonical exercises set to `true`. Wouldn't it probably be more correct to have that default to `false`? Either way will be wrong, but it seems the latter would be more correct, more often. Of course, when you put in a PR for these files on my track I can just `sed -e 's/true/false'`, so it's not a big issue... just curious why you are choosing `true`.
If a track's exercise is based on canonical data from the Problem Specifications repo, the exercise should contain a .meta/tests.toml file.
We are using JSON everywhere, why introduce another config format?
🍆
(In other words: I'm happy with this)
just curious why you are choosing true
Based on the assumption that currently most tracks read directly from the `canonical-data.json`, so they're effectively implicitly working from all tests. So it seemed that `true` would most likely conform to the existing situation, making a regen of the tests per track effectively a no-op.
Wait? Now you add a YAML file to the mix?
Why JSON + YAML + TOML? Can't we agree on one of those?
Preferably JSON, as it probably has the most parsers available across languages?
Why JSON + YAML + TOML? Can't we agree on one of those?
The nature of an evolved design.
Preferably JSON, as it probably has the most parsers available across languages?
I want to [Citation Needed] this, but you're probably right. Even so, if we do choose to standardize, I'd pick TOML over JSON any day: it's just a nicer format to work with.
... Looks like I never actually posted my initial responses. Oops.
- There is no longer any discussion whether a change is a patch, minor or major update.
- We no longer need the versioning of the canonical data.
I very much like the idea of getting rid of the test semver entirely. That's been the cause of a lot of extraneous human effort in the past.
- Exercises must contain tests that cover the public interface of the exercise (also thought of as "application tests").
- Exercises may contain tests that cover the private or lower-level interface of the exercise (sometimes referred to as "library tests").
The sync command can [manipulate] `.meta/tests.toml` ...
I'd suggest that in cases where exercises contain both application and library tests, the tests should be split into distinct test groups. Further, the tooling should be smart enough to give the maintainer the option to approve or reject an entire test group at a time, based on its description. The result of these policies in combination is that maintainers who do not want library tests in their suite can reject an arbitrary number of current and future library tests with a single flag.
The implication is that test groups must also have identifiers (probably UUIDs), and the `tests.toml` specification file must be able to include/exclude/leave unspecified entire groups at a time.
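A hypothetical sketch of what that could look like; the group UUID, the comments, and the very idea of listing a group's UUID alongside case UUIDs are all made up here for illustration:

```toml
# Hypothetical: a single group UUID stands in for every current and future
# test case in that group, so a maintainer can reject the group with one flag.
[canonical-tests]

# group: library tests (rejecting the group rejects all of its cases)
"00000000-0000-4000-8000-00000000000a" = false

# application test: add two positive numbers
"00000000-0000-4000-8000-000000000001" = true
```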
Alternatively, this feels like something which could be handled with scenarios.
I kind of like the new scenarios mechanism, but it feels very redundant with the existing test grouping strategy. I'd suggest that we choose one of two options here:
If we do keep the concept of scenarios, then they should have an identifier, probably UUID, which gets tracked in `tests.toml` so that, bare minimum, a maintainer can reject an entire scenario in a single operation.
Can we just use sequential integers for identifying the test cases rather than UUIDs?
The major problem I see with this is when two edits to the canonical data for a particular exercise come in simultaneously: sequential integers are very likely to cause merge conflicts, UUIDs are not.
@NobbZ Was a typo. I've edited my post.
I want to [Citation Needed] this, but you're probably right. Even so, if we do choose to standardize, I'd pick TOML over JSON any day
I agree. JSON is a horrible format for manual writing/editing. I'd also argue it's less familiar to people than TOML because you frequently encounter ini files, even as a non-programmer, when using a computer and TOML is quite similar to them. JSON is rather web-dev specific. I'd be happy to switch to TOML for the human-edited canonical data and config files, and auto-convert them to JSON for backwards compatibility for the site/generators. Or switch to something like JSON5.
I kind of like the new scenarios mechanism, but it feels very redundant with the existing test grouping strategy.
A made-up counterexample: Say I have a group of tests that test string concat. It would have tests like `"Hello, " * "World!" == "Hello, World!"` and `"foo" * "bar" == "foobar"`. Now a test involving unicode, e.g. `"🐶" * "🐱" == "🐶🐱"`, would belong in that group. It tests the exact same behaviour, concatenation of two strings. But, as a compatibility feature for less-unicode-savvy languages, it could be tagged with the scenario `unicode`. In this example the grouping would be used for the abstract behaviour, while the scenario would be used for highlighting implementation details. There are several exercises that have cases like this.
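A rough sketch of how that might be expressed in the canonical data (the UUIDs, descriptions and exercise shape are made up for illustration):

```json
{
  "description": "concatenation",
  "cases": [
    {
      "uuid": "00000000-0000-4000-8000-000000000001",
      "description": "concatenate two ASCII strings",
      "property": "concat",
      "input": { "left": "foo", "right": "bar" },
      "expected": "foobar"
    },
    {
      "uuid": "00000000-0000-4000-8000-000000000002",
      "description": "concatenate two emoji strings",
      "property": "concat",
      "scenarios": ["unicode"],
      "input": { "left": "🐶", "right": "🐱" },
      "expected": "🐶🐱"
    }
  ]
}
```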
If we do keep the concept of scenarios, then they should have an identifier, probably UUID, which gets tracked in tests.toml so that, bare minimum, a maintainer can reject an entire scenario in a single operation.
What benefit does a UUID have over a string identifier in this case?
Some more questions:
Is the group of a test considered metadata and thus mutable?
How does JSON handle multiple keys with the same name? Since there are now quite a few things that could benefit from or require comments, e.g. `scenarios` and `reimplements`, it would be useful to comment directly above or below them. Can we just add multiple `"comments": ...` entries to the same object?
The scenarios field can use one or more of a predefined set of values, which are defined in a SCENARIOS.txt file.
Will this be a repo-wide file, or an exercise-specific file?
@SaschaMann A made-up counterexample: ...
There's nothing preventing multiple scenarios from being applied to a single test case; in that way, they're a generalization of grouping, which requires a strict tree structure. In your example, I'd expect to see `"scenarios": ["string-concat", "unicode"]`. A unicode-averse track could then remove that test case without any human intervention if they had already banned the "unicode" scenario, even if it's also in the "string-concat" scenario.
As I see it, the main benefit of the current nesting structure as opposed to scenarios is that it's more intuitive; the new structure will want to be somewhat more machine-mediated.
What benefit does a UUID have over a string identifier in this case?
`tests.toml`: all lists are of UUIDs, not sometimes UUIDs and sometimes string identifiers.

As I see it, the main benefit of the current nesting structure as opposed to scenarios is that it's more intuitive; the new structure will want to be somewhat more machine-mediated.

There's still a difference because in many cases, the group structure translates to the nesting of the tests in the implementation of the exercise. With `"scenarios": ["string-concat", "unicode"]` it's unclear how they should be nested.
We are using JSON everywhere, why introduce another config format?
Basically: ease of editing, less verbosity, and the ability to add comments. We considered using JSON, but chose TOML for the aforementioned reasons. @SaschaMann and @coriolinus have described these reasons well.
I very much like the idea of getting rid of the test semver entirely. That's been the cause of a lot of extraneous human effort in the past.
Definitely.
The scenarios field can use one or more of a predefined set of values, which are defined in a SCENARIOS.txt file.
Will this be a repo-wide file, or an exercise-specific file?
It will be repo-wide. We'll rename the existing OPTIONAL-KEYS.txt document for that.
How does JSON handle multiple keys with the same name? Since there are now quite a few things that could benefit from or require comments, e.g. scenarios and reimplements, it would be useful to comment directly above or below it. Can we just add multiple "comments": ... entries to the same object?
In principle, it is allowed, but in practice I've never seen people using it. There is also a considerable risk that existing JSON parsers will break (see https://stackoverflow.com/a/23195243/2071395), so I'd argue against using multiple comments keys and just putting all information in one "comments" field. Side note: did you already know this @SaschaMann and is this a clever ploy that allows you to show the benefits of TOML? 🤣
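To make the hazard concrete: the JSON spec only requires that keys "should" be unique, so a document like the following (made-up UUIDs and values) is syntactically valid, but behaviour varies by parser; Python's json module and JavaScript's JSON.parse, for instance, silently keep only the last "comments" value.

```json
{
  "uuid": "00000000-0000-4000-8000-000000000001",
  "comments": ["Explains the scenarios field"],
  "scenarios": ["unicode"],
  "comments": ["Explains the reimplements field"],
  "reimplements": "00000000-0000-4000-8000-000000000002"
}
```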
In principle, it is allowed, but in practice I've never seen people using it. There is also a considerable risk that existing JSON parsers will break (see https://stackoverflow.com/a/23195243/2071395), so I'd argue against using multiple comments keys and just putting all information in one "comments" field. Side note: did you already know this @SaschaMann and is this a clever ploy that allows you to show the benefits of TOML? 🤣
No I didn't, it was just a nice coincidence to find yet another reason why json is a poorly suited choice :D
so I'd argue against using multiple comments keys and just putting all information in one "comments" field.
That sounds like it can get incredibly messy, especially with the convention of super short lines and splitting the comments up into an array. Maybe as a bodge we could define all fields starting with `_` as comments? Or alternatively, get rid of the aforementioned convention and let editors handle the line wrapping. Then one array entry can correspond to one comment "section".
Some exercises like complex-numbers already have messy comments, this will just make it worse.
Or as I suggested earlier, use a format that supports comments and convert to json for backwards compat with generators :)
That sounds like it can get incredibly messy
Maybe we should see how this pans out first? It is possible that it won't be that messy. And if it will be, we can always change this up later. I'm not saying I don't think your argument is valid, just that I'm not entirely sure we have to take this on right now.
We've just internally discussed the grouping/scenarios functionality. We see them as separate features:
So both have their own place.
I'd suggest that in cases where exercises contain both application and library tests, the tests should be split into distinct test groups.
Instead of having separate test groups, we'll add the `library-test` scenario to library tests to allow for easy excluding/including (note: we won't have an `application-test` scenario, as that is the default and we don't want to add this to virtually every test case). As test groups should group related functionality, I could easily imagine a test group having both application and library tests.
This is a bit of a drive by, as I’m still not able to contribute to the project, but woe be unto thee who tries to reliably convert TOML to JSON... many of the library implementations out there aren’t even using the current published 1.0.0 “standard”, and even if yours is there are some pretty under-specified and ambiguous parts of the grammar. I’d never use it for anything that needs to be parsed consistently across language boundaries, and really would never consider it for anything but a simple and flat configuration file for an app I controlled (and did all the type conversions within).
Nice to hear from you @yawpitch 🙂
Hi @ErikSchierboom, I'm going through the C track to align the `tests.toml` files with the implementations, and I see that some exercises are missing the TOML file.
From the C track I have found the following are missing:
The C track does not implement all exercises, so there may be other exercises that have the same issue on other tracks. Quickly checking another track (pharo-smalltalk) shows that this is not limited to the C track. For the C track I can create the files manually, but across all repos it may be worth running the same (but fixed) script, whichever you used to generate the above PRs.
@wolf99 I was just now creating an issue for this! See https://github.com/exercism/canonical-data-syncer/issues/39.
Note that `linked-list` does not have a `canonical-data.json` (see here), so it's correct that `canonical_data_syncer` doesn't output a `tests.toml` file there. I believe that the issue is just with `grains`.
Snap!
I have made a commit on the C repo adding the `grains` `tests.toml`. I guess the same file should be able to be used across all repos:
https://github.com/exercism/c/pull/488/commits/72438e95569c7246fb3b8f7a9bee0e4bca060dae
(Comments welcome if there are any issues!)
@wolf99 I have edited the issue I created to contain the expected `tests.toml` contents for `grains`. It contains what the `canonical_data_syncer` would have output if we didn't have this bug.
If we think that the test descriptions are bad then I believe we'll have to add new test cases that reimplement them with a better description.
Whoops, totally forgot about the `grains` issue, for which I've created an issue in the Nim repo. We'll fix this and send another PR for the missing `grains` exercise's `tests.toml` file.
TL;DR: @exercism/track-maintainers We are planning on reopening problem specs in the next couple of weeks. The key change we are making is that problem-specs should be thought of as a set of optional test cases. Individual tracks should choose which test cases to implement on a per-test-case basis. We are adding a simple, per-exercise configuration file that keeps track of which test cases the exercise implements. We'll provide a simple mechanism for tracks to keep this file up-to-date with the test cases in the exercise's canonical data. The only change required for track maintainers is to update their test generators to generate tests for just the test cases enabled in the configuration file. All test cases will be immutable; changes can only be introduced by adding new test cases, which can be flagged as re-implementing an existing test case.
A year ago we temporarily put the Problem Specifications repo in bug-fixes only mode (see this issue). This was done as there were systematic problems with the Problem Specifications repo that were causing conflict between maintainers, which we did not have the resources to fix in the immediate term. Now that we're making good progress on Exercism v3, we have been able to dedicate time to designing a solution to the problems in the Problem Specifications repo.
In this issue I will outline the problems that we're trying to solve and the solution we have come up with. We are very excited to re-open the Problem Specifications repo, which has always been one of the most active parts of Exercism and has brought a great deal of joy to many, and to explain how these changes will allow us to do that. Let's get to it! 🎉
Intro to Problem Specifications
The Problem Specifications repo describes a set of language-agnostic exercises, which maintainers can use to build exercises for their tracks. Over time, exercises have grown to not only contain shared titles and descriptions, but also canonical data that describes a set of test cases for the exercise. This setup has been very helpful for maintainers for tasks such as adding new exercises to tracks, updating existing exercises, and bootstrapping entirely new tracks.
As the canonical data is defined as JSON, some tracks have built tooling to automatically generate test suites from it. This has allowed these tracks to quickly scaffold test suites for new exercises, as well as fix bugs in existing (generated) test suites.
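For context, a `canonical-data.json` file is currently shaped roughly like this; the exercise name, version and values below are illustrative rather than copied from a real file:

```json
{
  "exercise": "two-fer",
  "version": "1.2.0",
  "cases": [
    {
      "description": "no name given",
      "property": "twoFer",
      "input": { "name": null },
      "expected": "One for you, one for me."
    }
  ]
}
```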
Unfortunately, as different tracks (and different maintainers) have different desires for how their tests are structured and written, this has caused tension. In this document, we'll describe what has caused this tension and how we aim to resolve it.
Issues with Problem Specifications
Issues with the Problem Specifications repo occur when there is a 1-to-1 correspondence between an exercise defined in Problem Specifications and a track implementation of that exercise. In other words, if changes to the exercise data in Problem Specifications result in changes to a track's implementation, there can be trouble.
README
The `README.md` file for track exercises is generated directly from the Problem Specifications exercise's `description.md` and `metadata.yml` files, using the `configlet` tool.

An example of how this can be problematic is the word "promises", with some languages instead using the word "futures" to describe the same concept.
Canonical data
Most of the exercises in the Problem Specifications repo are implemented in multiple tracks. This means that those tracks implement the same exercise using the same canonical data.
As an example of how this can be problematic, consider one track wanting to add a test case that uses Unicode strings as input, which another track would not want to include due to increased complexity for their students. Another example is when one track wants to only have tests for the public interface of the exercise, whereas another track wants to have tests for the private or lower-level interface of the exercise.
This issue is compounded even further when the track exercise's test suite is generated directly from a Problem Specifications exercise's `canonical-data.json` file, as maintainers are often reluctant to do pre- or post-processing in their generators.

Note that not all types of changes to canonical data are problematic. We can discern the following four types of changes:
As can be seen, not all types of changes are equally problematic. The most problematic changes are when either:
Of course, not all changes are problematic, and many changes are in fact what one could consider "bug fixes", where the expected value does not match what the instructions specify (this could also be due to invalid instructions though). Determining whether something is a mere "bug fix" or something more substantial has proved difficult though, with opinions varying depending on how a track uses a specific test case.
To prevent breaking changes, the canonical data is currently versioned using SemVer, so in theory test generators could use this to do "safe" updates. In practice though, most test generators always use the latest version to generate their exercises.
Redefining Problem Specifications
The first step in solving these issues is defining a clear guiding rule for the Problem Specifications repository:
Problem Specifications exists to cover the "happy path". That means it should work for most tracks most of the time. In situations that aren't in the "happy path", it is down to individual tracks to solve those issues locally.
Application/library tests
- Exercises must contain tests that cover the public interface of the exercise (also thought of as "application tests").
- Exercises may contain tests that cover the private or lower-level interface of the exercise (sometimes referred to as "library tests").
Canonical data
Changing test cases
As test cases will be immutable, one cannot change an existing test case and thus a new test case must be added. This has several nice consequences:
- There is no longer any discussion whether a change is a patch, minor or major update.
- We no longer need the versioning of the canonical data.
Let's look at an example. Suppose we have the following test case:
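For instance (the UUID and values below are made up for illustration, not the actual example):

```json
{
  "uuid": "00000000-0000-4000-8000-000000000001",
  "description": "add two positive numbers",
  "property": "add",
  "input": { "left": 1, "right": 2 },
  "expected": 4
}
```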
We found a mistake in the expected value, but with tests being immutable, we'll have to add a new test case (with a new UUID):
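Continuing the made-up example, the corrected case gets a fresh UUID while the old one stays in place:

```json
{
  "uuid": "00000000-0000-4000-8000-000000000002",
  "description": "add two positive numbers",
  "property": "add",
  "input": { "left": 1, "right": 2 },
  "expected": 3
}
```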
Some additional consequences of this approach:
There are now two test cases with the same description, but different UUIDs, so how does one know which test case to use? For this, we'll use two things:
"reimplements"
field which must contain the UUID of the test case it is re-implementing. Note that we haven't named this field"fixes"
or"supersedes"
or something like that, as re-implemented test cases might actually not make sense for every track. This field can be omitted for test-cases that don't reimplement an old test."comments"
field to explain why a test case was re-implemented (e.g."Expected value is changed to 2"
).While tracks could automatically select the "latest" version of a test case by looking at the "reimplements" hierarchy, we recommend each track to make this a manual action.
With the above suggested changes, the test cases now look like this:
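Sticking with the made-up example from above (UUIDs and values are still illustrative):

```json
[
  {
    "uuid": "00000000-0000-4000-8000-000000000001",
    "description": "add two positive numbers",
    "property": "add",
    "input": { "left": 1, "right": 2 },
    "expected": 4
  },
  {
    "uuid": "00000000-0000-4000-8000-000000000002",
    "reimplements": "00000000-0000-4000-8000-000000000001",
    "comments": ["Expected value is changed to 3"],
    "description": "add two positive numbers",
    "property": "add",
    "input": { "left": 1, "right": 2 },
    "expected": 3
  }
]
```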
Note that the `comments` field can be mutated, as it is merely metadata that won't be reflected in the tests. Changing it thus does not mean adding a new test case.

This change was guided in part by our feeling that canonical data test cases actually don't change that often, so it would be a relatively minor inconvenience.
Scenarios
- The `optional` field will be removed.
- Test cases can instead have a `scenarios` field.
- The `scenarios` field can use one or more of a predefined set of values, which are defined in a `SCENARIOS.txt` file.
- The `scenarios` field can be mutated additively, by adding new scenarios. Existing scenarios must not be changed or removed. Adding new scenarios therefore does not mean adding a new test case.
- Library tests will have the `library-test` scenario added to allow for easy including/excluding of library tests (see the sketch below). Application tests won't have their own scenario, as they must be included and should not be filtered on.
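As a sketch, a library test tagged this way might look like the following (the UUID, description and values are made up):

```json
{
  "uuid": "00000000-0000-4000-8000-000000000004",
  "description": "internal node count",
  "property": "nodeCount",
  "scenarios": ["library-test"],
  "input": {},
  "expected": 0
}
```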
Indicate which test cases have been implemented
If a track's exercise is based on canonical data from the Problem Specifications repo, the exercise should contain a `.meta/tests.toml` file. This file lists the exercise's test cases by UUID, with each one set to either `true` or `false`. The advantages to having this file are:

We can script-prepopulate the initial creation of these `.meta/tests.toml` files across all tracks, initially setting all test cases to `true`.
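Such a file might look roughly like this (the UUIDs and descriptions are made up; the exact layout is defined by the tooling):

```toml
[canonical-tests]

# add two positive numbers
"00000000-0000-4000-8000-000000000002" = true

# concatenate two emoji strings
"00000000-0000-4000-8000-000000000003" = false
```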
Notifications
The `.meta/tests.toml` file allows us to detect which test cases an exercise implements. This gives us the ability to tackle one of the downsides of using the Problem Specifications repository as the source of data: not knowing when canonical data has been changed.

Tracks that implemented an exercise using canonical data as its basis basically had two options to check if canonical data was changed:
Both options are clearly not ideal.
Using the `.meta/tests.toml` files, we'll be building a GitHub action that serves as a notification bot. It will regularly check the track's `.meta/tests.toml` files' contents against the test cases defined in the Problem Specifications repo's `canonical-data.json` files.

Based on this diff, it could automatically do things like post an issue to your track's repository if there are any relevant changes, which could look like this:
We could even create individual issues per exercise, but the most important thing is that we can do this type of automation when tracks use the `.meta/tests.toml` file.

Tooling
We'll add a new `sync` command to configlet to help work with `.meta/tests.toml` files.
- The `sync` command can be used to detect if there are exercises whose `.meta/tests.toml` file is missing test cases that are in its canonical data.
- The `sync` command can scaffold a new `.meta/tests.toml` file for an exercise by interactively giving the maintainer the option to set a test case's status to `true` or `false`.
- The `sync` command can update an existing `.meta/tests.toml` file by interactively giving the maintainer the option to set a test case's status to `true` or `false`.

We'll add a new tool to allow easy formatting of the JSON files in the Problem Specifications repo.
This new tool will be able to verify the correctness of the Problem Specifications repo, replacing the existing bash scripts.
README
`README.md` files contain both the problem's story as well as the instructions on what is expected of the student. In the future, we'd like to replace the `README.md` file with two separate files: `story.md` and `instructions.md`. This is similar to how the v3 Concept Exercises' documentation is split up over multiple files.

Formatting
Where to go from here?
We’re really excited about re-opening Problem Specifications. We feel like these changes should fix the issues that caused the repo to be locked, but not add any meaningful extra burden on maintainers.
If there is general consensus with this approach we will action these changes and re-open Problem Specifications as soon as possible. I have already prototyped much of the tooling but still have to do some work to get it finished. We also need to add documentation to Problem Specs, and make PRs both to this repo for the UUIDs, and the new files in the track repos. My plan is to submit PR’s for these changes over the next two weeks, and then reopen this repository by mid-October.
Once the above changes have been merged, we can start accepting PRs again (although some of them will probably need a little tweaking to correspond to the new format).
Thanks to everyone for being patient with us over this past year! I look forward to hearing your thoughts on all these changes :)