exercism / python

Exercism exercises in Python.
https://exercism.org/tracks/python
MIT License
1.73k stars 1.24k forks source link

All exercise test files should reference canonical data version #784

Closed N-Parsons closed 5 years ago

N-Parsons commented 6 years ago

Currently some of the test files reference x-common//canonical-data.json with a version number, and a few test files lack any such a reference. Additionally, x-common was recently renamed to problem-specifications, so references to x-common are inherently outdated and potentially confusing.

Having a consistent version string should help with maintenance, since a script can fairly easily be written to compare the version numbers in the test file and the canonical data, and manual checks for how up to date it is are similarly easy. It also makes it easier for exercism users to find and check the canonical data, and thus to be able to contribute any changes they think are necessary if they identify problems or additional tests that would be useful.

Suggested wording

The currently adopted form for the version string seems to be: # test cases adapted from `x-common//canonical-data.json` @ version: 1.0.1

To conform to PEP8, we should aim to limit the string to 79 characters, so I propose the following wording (75 chars): # Tests adapted from `problem-specifications//canonical-data.json` @ v2.0.0

Does anyone have any alternatives or suggestions for improvements to this wording?

How to proceed

There are a couple of options for how to go about updating the test files:

  1. Only update version strings when the tests are next updated, and update tests when the canonical data change.
  2. Systematically update each test file to match the most recent canonical data version (or verify that it's already up-to-date), and update the version strings in the process.

Option 1 is obviously the less work-intensive one, but it would leave some inconsistency for quite a while. Option 2 is more work-intensive and should get everything up-to-date quicker, but it might burn people out a bit.

ilya-khadykin commented 6 years ago

Thanks for raising the issue, it's an important one. I think the second option is a way to go and it should be automated. Unfortunately, I have a lot of going on right now, so I can't do it right now

The other thing is that some tests can be trak specific, so it can be also stated in the comment giving the reader a sense of the reason for such decision

N-Parsons commented 6 years ago

I had some free time this evening, so I put together a fairly rough script for checking the state of version numbering. I put the results into a gist to save making a very long post here, and I have summarised the results below.

Up-to-date: 33 Out-of-date: 8 Unknown: 31 (no version tag) Unimplemented: 27 No canonical data: 14 No data needed: 5 (maybe more - requires manual checks)

ilya-khadykin commented 6 years ago

@N-Parsons wow, that's awesome! The next step will be a creation of issues for required exercises and linking them with this issue

N-Parsons commented 6 years ago

I realised that some deprecated exercises were included, so I added a check for that, as well as checking whether the version string references x-common.

Script: gist (rev 2) Results: gist (rev 4)

Summary:

Up-to-date: 0 Up-to-date, but x-common: 33 Out-of-date: 8 Unknown: 26 (no version tag) Unimplemented: 28 No canonical data: 8 (could be submitted to problem-specifications) No data needed: 5 Deprecated: 11

ilya-khadykin commented 6 years ago

@N-Parsons how do you think we should proceed, for which exercises we should open issues?

N-Parsons commented 6 years ago

I've written a script to automate the update for the 33 exercises that reference x-common, so I'm ready to submit those PRs whenever you want - do you want issues to exist for them first? The script is set-up to create separate PRs for each, since I presume that this is preferable?

The "Out-of-date" and "Unknown" categories will need a more manual approach, so I think issues should be created for these. I'm happy to create the issues if you want, but I don't think that I can label them? (I'm not sure what labels you would want, either)

ilya-khadykin commented 6 years ago

@N-Parsons that is awesome. I think that you can create a single PR for all 33 exercises that reference x-common. But for "Out-of-date" and "Unknown" seperate issues should be created for people to check them. I think I'll add hacktoberfest and beginner-friendly labels

Will it work for you?

N-Parsons commented 6 years ago

@m-a-ge, that's fine - I'll tweak my script and do the pull request now. Do you want separate commits within the PR? Or just one big commit?

ilya-khadykin commented 6 years ago

@N-Parsons I guess one commit message would be enough, since it solves one issue

N-Parsons commented 6 years ago

@m-a-ge, all of the issues should now be created and correctly labelled :)

ilya-khadykin commented 6 years ago

@N-Parsons, thanks!

sjwarner-bp commented 6 years ago

I implemented the canonical data in problem-specifications for the protein-translation problem recently, and as such the Python test suite is out of date.

I'm about to open an issue that should be consistent with @N-Parsons similar issues. #1103

cmccandless commented 6 years ago

Updated lists:

Do not reference existing canonical-data:

No canonical data:

This list will no longer be kept up to date. Script bin/check_test_version.py can be used to determine which exercises are out-of-date.

cmccandless commented 5 years ago

Closing this issue as all issues for which canonical data exists contain a reference, and a script has been put in use for detecting new canonical data.