exercism / problem-specifications

Shared metadata for exercism exercises.

Versioning guidelines #938

Closed NobbZ closed 4 years ago

NobbZ commented 7 years ago

Today 3 PRs have been made (perhaps more that slipped my radar), all of which bumped the MAJOR version because of trivial changes.

In all 3 PRs, voices were raised asking for a less drastic version bump:

But the current versioning and bumping guidelines tell us this:

  • MAJOR changes should be expected to break even well-behaved test generators.
  • MINOR changes would never break well-designed test generators, because the test-generation logic remains exactly the same.
  • PATCH changes would never break well-designed test generators, because the test data remains exactly the same.

[enumeration mine, detailed descriptions omitted for now]

After reading these TL;DRs of the various kinds of version bumps, some questions arise.

Even worse, on further reading of the more thorough description of each bump, I found this:

[bump PATCH when] Regrouping/"Renesting" test cases without changing test case ordering.

But how is changing the nesting not a change to the test generator's logic? It can no longer find the tests where it expects them.
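To make the dispute concrete, here is an illustration with a made-up exercise; the key names follow the canonical-data.json layout discussed in this thread, and the values are invented. Before regrouping:

```json
{
  "exercise": "example",
  "version": "1.1.0",
  "cases": [
    { "description": "add zero", "property": "add", "input": 0, "expected": 0 },
    { "description": "add one", "property": "add", "input": 1, "expected": 1 }
  ]
}
```

After regrouping under a group node, with the case ordering unchanged (a PATCH bump per the quoted rule):

```json
{
  "exercise": "example",
  "version": "1.1.1",
  "cases": [
    {
      "description": "addition",
      "cases": [
        { "description": "add zero", "property": "add", "input": 0, "expected": 0 },
        { "description": "add one", "property": "add", "input": 1, "expected": 1 }
      ]
    }
  ]
}
```

A generator that addresses cases by fixed position breaks on this change; one that walks the tree looking for leaf objects does not. Which of the two counts as "well-designed" is exactly what is in question here.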

Also, I feel that adding and removing test cases may result in a completely different way to solve the exercise, which makes the whole test suite incompatible with its previous version, so this should be a MAJOR bump instead of the MINOR one in the document:

[bump MINOR when] Adding or deleting test cases.

Nearly the same reasoning applies, in my eyes, to the other two cases where the document deems a MINOR bump sufficient; I'd do a MAJOR bump, because they drastically change the way to solve the exercise:

  • Changing test case inputs and/or outputs.
  • Changing the test case ordering.

When we introduced the versioning scheme, I understood it as versioning the exercise; today I learnt it's meant to version the generator.

I do not think this is the way to go, but I will not try to change that anymore; I'm simply too late to the party. What I can try, though, is to simplify the current bumping guide and remove ambiguity:

Basically, this would mean that everything which could break existing solutions, examples, or test generators gets a MAJOR bump, while MINOR bumps are for semantic no-ops…

Edit: Fix typos

petertseng commented 7 years ago

When we introduced the versioning scheme, I understood it as versioning the exercise; today I learnt it's meant to version the generator.

That doesn't really make sense, since the generators, if they exist at all, live in individual repositories. But the goal does inform how track maintainers may treat their test generators.

I thought the goal is this: Allow track maintainers to understand whether the canonical-data has changed since it was last used, and if it has changed, the nature of the changes.

Part of it does relate to test generators, but it seems that it's useful for humans too. It's useful even for tracks that don't have test generators, right?

It looks like the intent was mirrored in https://github.com/exercism/problem-specifications/issues/673. The original author has left this organisation, so an attempt to directly ask the original author may not necessarily be effective, but it can be tried.

I thought the intent was good.

I agree that changing nesting would be larger than patch.

I have not judged whether each of the other items is in the correct category given this intent.

NobbZ commented 7 years ago

When we introduced the versioning scheme, I understood it as versioning the exercise; today I learnt it's meant to version the generator.

That doesn't really make sense, since the generators, if they exist at all, live in individual repositories. But the goal does inform how track maintainers may treat their test generators.

Yeah, for lack of better wording, this is what I wrote yesterday, and even though I have had a lot of time to think about it since, I still have trouble finding the right words…

So I'll try it the other way round…

As a user, if I see a major bump in the test data, I expect my solution to break. I expect the resulting test suite to be different than before.

But in the current versioning scheme, if I see a major bump, all it tells me is that the change in the data broke the program that generates the test suite, if there is such a program at all… Even worse, we have proof of changes that require a major bump but result in exactly the same test suite after the generator has been repaired.

This is counter intuitive!

Especially given the fact that in at least one track (Haskell [1]) the version is even user-facing, I find it confusing that a version bump does not even change the test suite…

[1]: Originally I wanted to copy their version scheme, but since v2 is on the doorstep and it will bind a submission to the student's test suite, I dropped that plan.

Insti commented 7 years ago

It's tricky because I think we're trying to track two things at once. Should we add another version key, so that we track both a test-case version and a test-metadata version?

petertseng commented 7 years ago

As a user, if I see a major bump in the test data, I expect my solution to break. I expect the resulting test suite to be different than before.

we have proof of changes that require a major bump but result in exactly the same test suite after the generator has been repaired.

Oh, I see. This looks like a problem to me because it goes against the goal of knowing "has the canonical data changed?". There are major version changes where the canonical data should not be considered changed, only its JSON representation.

Put another way, I originally wrote in https://github.com/exercism/problem-specifications/issues/673#issuecomment-285577332 ...

  • Major: You might need to rewrite your test generator.
  • Minor: You will definitely not need to rewrite your test generator. You probably want to rerun your test generator, though.
  • Patch: Rerunning the test generator is probably not necessary.

However, we may have a "Major" change that indicates you may need to rewrite your test generator... but you don't need to rerun it even after you rewrite it because the generated test suite would be the same. Oops.

It's hard to come up with a scheme that answers both the questions "do I need to rewrite and/or rerun my generators" and "did the tests change?".

One possibility is two monotonically increasing version components that increase independently from one another (the proposal of Independent Components). If canonical data is version X.Y:

Seems too confusing and I'm going to mix up the meanings of the two components all the time. I don't like the proposal of Independent Components.

So I'm going to write this proposal. Is it still useful to know whether I might change a generator, even if it is less important than knowing whether the tests changed? Then the proposal of Tests-Generators-Patch: If canonical data is version X.Y.Z:

I'm re-reading Tests-Generators-Patch and I don't think it's very useful anyway because no matter what, if I see a change in problem-specifications, I'm going to want to follow those changes, and I'll find out then whether the generator needs to be rewritten. It's not very interesting to encode it in the version scheme, and it imposes additional burden since it may incite questions about whether a given change would require a generator rewrite.

OK, I say I support the proposal of just Major/Minor as noted in https://github.com/exercism/problem-specifications/issues/938#issue-264031108 .

HarrisonMc555 commented 7 years ago

At first I thought I supported the simple Major/Minor version as well, but we still have the issue of the JSON representation changing in a way that breaks generators while the test data remains the same. It seems to me that we really are trying to keep track of two different things for two different audiences:

  1. The writers of the tests/test generators
  2. The writers of the solutions/examples

There are situations in which one could change but not the other and situations in which both could change together.

Although I know it's probably not popular or preferred, it seems like the best solution would be to have two different version numbers: one for the test cases themselves (semantics) and one for the representation (syntax).

We could possibly have two different keys.

If version is incremented, then I may need to rewrite my solution/example and re-run my test generator. If jsonVersion is incremented, I may need to rewrite my test generator. If both are incremented, then I need to do both.
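For concreteness, a sketch of what the top of a canonical-data.json could look like under this proposal; jsonVersion is the key name suggested above, and the values are invented:

```json
{
  "exercise": "example",
  "version": "2.1.0",
  "jsonVersion": "1.0.0",
  "cases": []
}
```

A track with hand-written tests would watch only version; a track with a generator would watch both.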

I'm basically saying the same thing as @petertseng, but I'm in favor of having two different keys, because they really are keeping track of two totally different things.

petertseng commented 7 years ago

So, ultimately I would be happy with having both version keys, but I would want that decision to be made only if both keys were useful. I want to think more about whether it is useful to track the structure version.

What would I, as a track maintainer, probably want to do in the three different cases?

So, I need to rerun the generator if and only if the semantics change. Therefore, it is indeed important to let maintainers know about semantics changes.

It does not seem necessary in the same way to let maintainers know about structural changes. Here is how I would decide what to do if the version number only indicated "is there a semantics change?"

I leave it to y'all to decide whether being able to tell ahead-of-time whether I need to rewrite a generator is worth the increased complexity of two separate version numbers. I say it is not.


Fun Fact: I do not currently maintain any track that uses test generators.

HarrisonMc555 commented 7 years ago

I like what @petertseng says. This sounds good to me.


I also do not maintain any tracks, so maybe I shouldn't be part of this discussion :wink:

Vankog commented 7 years ago

Funny to find this issue, because I was about to open a similar one and changed my mind midway through.

In my eyes, some changes are not worth a MINOR bump, but only a PATCH. For example, changing the ordering of cases creates a semantically identical test suite; it only affects the end user's progression.

With the current definition, basically ONLY the MINOR version level has any relevance. The MAJOR will probably never change, and pure PATCHes seem quite rare.

E.g. in #957 I had to bump the MINOR version just because of an ordering bug I had introduced just prior to that.

Insti commented 7 years ago

I leave it to y'all to decide whether being able to tell ahead-of-time whether I need to rewrite a generator is worth the increased complexity of two separate version numbers. I say it is not.

I agree that it is not worth tracking separate version numbers, and that the version key should only track semantic changes that will cause changes to the generated test cases.

If you really care, structural or other changes that don't affect the test output can be detected via the git logs / commit hashes.


I do maintain a track that uses test generators.

And I don't even look at the version numbers, I just periodically regenerate the tests and look at the diffs to see what has changed. If any generators break I'll fix them. This is usually a simple task.

petertseng commented 7 years ago

By suggesting an extreme option, I hope to reveal what we want by listening to reasons why we reject this extreme option.

Let's remove versioning completely. We'll rely on git sha exclusively.

I think my reason for rejecting this is that sometimes the file changes but the tests don't change, and I don't want my up-to-date scripts to tell me merely that the file changed.

I think at this point all I care about is whether the tests will change (regardless of whether I need to rewrite a generator). In fact, I think that means I only need a single-component version, and that version changes iff tests change. I wouldn't oppose a version with more components, but they may mean more opportunities to forget bumping components.

Insti commented 7 years ago

Let's remove versioning completely. We'll rely on git sha exclusively.

I like this suggestion.

sometimes the file changes but the tests don't change, and I don't want my up-to-date scripts to tell me merely that the file changed.

Why can't your up-to-date scripts regenerate the tests and only tell you if they change?

petertseng commented 7 years ago

sometimes the file changes but the tests don't change, and I don't want my up-to-date scripts to tell me merely that the file changed.

Why can't your up-to-date scripts regenerate the tests and only tell you if they change?

Well, I was kind of hoping that the scripts could be run on any repository, and thus would have no language-specific generation functionality.

I suppose I could do some minimal parsing and generate tests in some arbitrary target language (doesn't even have to be a real language!) and thereby determine whether two JSON files represent the same set of tests. I'll chew on the idea and think about whether I find it useful. It's nonzero work, but some of the work has been done already (verify).
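As a sketch of how little parsing that might take, assuming (as rbasso explains later in this thread) that the concrete test cases are the leaf objects carrying a property key, at any depth of nesting:

```python
import json
import sys

def leaf_cases(node):
    """Yield every leaf test case, ignoring grouping and nesting.
    An object with a "property" key is treated as a concrete test case;
    any "cases" list is treated as a group to descend into."""
    if isinstance(node, dict):
        if "property" in node:
            yield node
        for child in node.get("cases", []):
            yield from leaf_cases(child)

def normalized(path):
    """A canonical, order-insensitive fingerprint of a file's test cases."""
    with open(path) as f:
        data = json.load(f)
    return sorted(json.dumps(case, sort_keys=True) for case in leaf_cases(data))

if __name__ == "__main__":
    # Usage: python same_tests.py old/canonical-data.json new/canonical-data.json
    old, new = sys.argv[1], sys.argv[2]
    print("same tests" if normalized(old) == normalized(new) else "tests changed")
```

This is insensitive to reordering and regrouping, but it would still report a pure representation change (such as the strings-to-objects example below) as a change; whether that is the right answer is part of what this thread is deciding.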

stkent commented 7 years ago

Yup, same situation for Java/Kotlin, which have scripts for checking for updates but manually-implemented tests; it would be nice to differentiate between changes that don't impact the suite of tests and those that do.

petertseng commented 7 years ago

sometimes the file changes but the tests don't change, and I don't want my up-to-date scripts to tell me merely that the file changed.

Why can't your up-to-date scripts regenerate the tests and only tell you if they change?

Let us suppose that I arrange for the scripts to do just this. Please note the following observation about the effort associated with this procedure.

Let us suppose that I value the ability to determine, with a minimum of effort, whether the tests have changed.

If we adopt a version scheme that causes some version component to change if and only if the tests have changed, it seems that I can always get an answer simply by comparing version numbers.

If we remove the version completely and instead rely on a generator to make this determination (check whether the generated tests change), then an interesting thing might happen if, for example, a certain JSON file changes its input from strings "(1, 2)" to objects {"x": 1, "y": 2}. My generator might fail to run until I update it, and thus my answer to the question of "Did the tests change?" is "I don't know" until I expend effort to update the generator.

So it seems like the question will be: Is the increased effort (I might have to update a generator before I know whether tests changed) worth the freedom of having no version number?

NobbZ commented 7 years ago

I like the idea of removing the version number altogether.

Since it's the easiest thing to implement and to do in the time I have, the planned process for using my generators is like this anyway:

  1. Bump the git submodule to a more recent commit about once a week or month
  2. Run the generator
  3. Repair generators that weren't able to emit tests because of a changed input format
  4. Bump the emitted bookkeeping number for tests that actually create different output
  5. Repeat steps 3 and 4 until nothing changes anymore
  6. Do separate commits for each generated exercise and the bumped submodule
  7. Create a single PR

I'm even planning to add a check subcommand to my generator that does not emit anything, but prints a list of the JSON files that are not parsable right now or that would result in a different emitted test suite.
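A hypothetical shape for such a check subcommand, assuming a per-exercise emit(data) function that renders the test suite as text, and a track layout invented for the example:

```python
import json
import pathlib

def check(spec_root, track_root, emit):
    """Print the specs that don't parse and the specs whose regenerated
    test suite differs from what is currently on disk."""
    unparsable, would_change = [], []
    for spec in sorted(pathlib.Path(spec_root).glob("exercises/*/canonical-data.json")):
        try:
            data = json.loads(spec.read_text())
        except json.JSONDecodeError:
            unparsable.append(spec.parent.name)
            continue
        # Hypothetical location of the generated test suite in the track repo.
        target = pathlib.Path(track_root, "exercises", spec.parent.name, "tests.txt")
        if not target.exists() or emit(data) != target.read_text():
            would_change.append(spec.parent.name)
    print("not parsable:", ", ".join(unparsable) or "none")
    print("would change:", ", ".join(would_change) or "none")
```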

Even without the versioning-scheme discussion here, I'd probably do it like this, since it's much easier than pulling out the version number, parsing it, and making a programmatic guess as to whether the tests might change, the generator might need to change, or both… Just trying is so much easier…

petertseng commented 7 years ago

I sure agree that if I have a generator, it is much easier to simply run it to see whether the generated tests have changed, and thus always automatically ensure the track is up to date, rather than check a version number which someone may have forgotten to change. This would make any version number unnecessary.

In consideration of those who do not have generators (as just one example, generators for three tracks are not just going to magically write themselves overnight), the other alternative I consider is a single-component version number, increased iff the test semantics changed.

For tracks without generators, this will help them determine whether the tests have changed without having to look through the contents of each commit.

The disadvantage is the extra burden of having to update that version number, and that some people may forget to do so.

If we wish to alleviate that disadvantage, then what we can do is have a two-component version number and enforce in CI that the version number must always change if canonical-data.json changes! Change the major version number if the test semantics changed; otherwise change the minor version number. I believe this is just the original suggestion of https://github.com/exercism/problem-specifications/issues/938#issue-264031108 augmented with CI help.

Obvious disadvantage is more complexity in the CI.
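That complexity seems modest, though. A hedged sketch of such a CI guard, assuming the PR's HEAD can be diffed against a main branch available in the checkout:

```python
import json
import subprocess
import sys

def file_at(ref, path):
    """Return a file's contents at a given ref, or None if absent there."""
    result = subprocess.run(["git", "show", f"{ref}:{path}"],
                            capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None

# Files touched by this PR, relative to the merge base with main.
changed = subprocess.run(
    ["git", "diff", "--name-only", "main...HEAD"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

unbumped = []
for path in changed:
    if not path.endswith("canonical-data.json"):
        continue
    before, after = file_at("main", path), file_at("HEAD", path)
    if before is None or after is None:
        continue  # file was added or deleted; nothing to compare
    if json.loads(before).get("version") == json.loads(after).get("version"):
        unbumped.append(path)

if unbumped:
    print("canonical data changed without a version bump:", *unbumped, sep="\n  ")
    sys.exit(1)
```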

Planning what happens for my tracks, none of which have generators:

I do not think using git SHAs only would be terrible, but it may result in instances where I say "yup, nothing changed that I care about" and therefore have to update .meta/problem-specifications-commit-that-this-exercises-tests-are-up-to-date-with (or a file that serves the same purpose) just so the up-to-date script stops telling me "there are new commits for this exercise, you should totally take a look", and update nothing else.

By the way, this problem is potentially shared by those languages with generators (they may have to update a SHA just to stop the scripts from saying there is an update, even though nothing changed), but the cost for those languages is lower since the generators exist.

Even with the single-component version number, there could be fewer just-satisfy-the-nag commits.

This could depend, of course, on how often we think there will be changes that do not change test semantics. I kind of wish I could write a script to check this, but it's nontrivial with the current version scheme since major version bumps give an indeterminate answer.

My preferences: Single-component version > no versions at all, recommend using the git SHAs only > two-component version with CI enforcement of must change (just because I don't feel like being the one to write the CI enforcement) > current versioning scheme.

stkent commented 7 years ago

Well-put, as always, @petertseng. IMO maintaining a single version number is not too difficult (it's been fairly easy to catch missing version changes in PRs with the current, more complex, scheme) and is significantly more useful than git SHA (imagine what happens if a scripted change touches all canonical data files in a way that (naturally) alters the SHA but doesn't alter the meaning of the tests 😱). My vote would be for that mechanism.

rbasso commented 6 years ago

@petertseng wrote:

It looks like the intent was mirrored in #673. The original author has left this organisation, so an attempt to directly ask the original author may not necessarily be effective, but it can be tried.

It's been a while since I left, but today I got some time to look at some issues. I miss spending some time here. 😄

I'm definitely too late in this discussion and I didn't read most of it, so I'll just try to explain what I unfortunately may have left unwritten when trying to specify the semantic versioning for exercises.

If I remember correctly, it was something like this:

@NobbZ wrote:

What is a “well-behaved” test generator? ... But how is changing the nesting not a change to the test generator's logic? It can no longer find the tests where it expects them.

To define what a breaking change is, it is necessary to assume a few things about how The Reader behaves. These were probably my assumptions back then:

Because every exercise's data follows the same fixed schema, a generator satisfying those criteria would only break if the structure of the object containing the test case changed, and so it would be well-behaved regarding its use of the JSON file.

The rationale was that a test generator shouldn't depend on more than what it needs. There is no need to depend on a fixed tree-like structure, only on the structure of the leaf-like objects containing the test data. The property key allows the test-case type to be unambiguously inferred anywhere in the tree.

I wrote a proof-of-concept Gist with an interpreter of the test data in #336 which isn't sensitive to reordering, regrouping, or re-nesting of test cases. That would be a well-behaved test interpreter.
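Not that Gist, but a minimal sketch in the same spirit, assuming leaves carry property, input, and expected keys: the traversal depends only on the leaf shape, so reordering, regrouping, and re-nesting cannot break it.

```python
def well_behaved_walk(node):
    """Yield one rendered check per leaf test case, at any depth,
    dispatching on the "property" key rather than on tree position."""
    if "property" in node:
        # Hypothetical rendering; a real track would map each property
        # to its own test template.
        yield f'{node["property"]}({node["input"]!r}) == {node["expected"]!r}'
    for child in node.get("cases", []):
        yield from well_behaved_walk(child)
```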

Anyway, I'm just trying to clarify the rationale back then. If the consensus is that generators don't need to be well-behaved in that specific sense, other changes should certainly be considered MAJOR.

@NobbZ wrote:

When we introduced the versioning scheme, I understood it as versioning the exercise; today I learnt it's meant to version the generator.

I think you are partially correct. There was a lot of discussion at the time about how to make generating exercises an easier task.

The MAJOR version is mostly relevant for algorithmic use of the data, but the MINOR version is all about the meaning of the test suite data.

While it could be argued that this is arbitrary, it was based on Semantic Versioning:

Given a version number MAJOR.MINOR.PATCH, increment the:

  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards-compatible manner, and
  • PATCH version when you make backwards-compatible bug fixes.

About the proposal to change/remove versioning, I have just a few comments:

That's a really hard decision. Good luck! 😄

Sorry for the long post!

See ya!

petertseng commented 4 years ago

versions are no more! https://github.com/exercism/problem-specifications/pull/1678

hooray!