exercism / python

Exercism exercises in Python.
https://exercism.org/tracks/python
MIT License
1.96k stars 1.3k forks source link

Change canonical-data reference to improve version checking #1734

Closed cmccandless closed 4 years ago

cmccandless commented 5 years ago

From @AnAccountForReportingBugs in https://github.com/exercism/python/issues/1726#issuecomment-475787036

[referencing these changes] This one took a bit to figure out. May want to loosen the regex or change how the comment looks.

Another idea is to you use a module level doc string or a global variable in the various _test.py files. This would allow direct access to the string using various techniques.

And from @cmccandless in https://github.com/exercism/python/issues/1726#issuecomment-476205579

I've run into this before where the 'v' was missing on the version reference in a test suite. I'd rather add a check somewhere that enforces one of the following than allow too much variation in the reference:

  • Tests based on problem-specifications//canonical-data.json @ v1.2.3`
  • No canonical-data.json for this exercise

I like the docstring idea. It would simplify some things.

AnAccountForReportingBugs commented 5 years ago

If you like the idea of a docstring, then you might want to change the test version checker to something like this:

import ast
def get_referenced_version(exercise):
    with open(get_test_file_path(exercise)) as file:
        contents = file.read()
    try:
        docstring = ast.get_docstring(ast.parse(contents, exercise))
    except SyntaxError:
        docstring = None
    if docstring is not None:
        match = rgx_canonical.search(docstring) # could also be rgx_version.search(docstring)
        if match is not None:
            return match.group(1)
    return '0.0.0'
cmccandless commented 5 years ago

1547 is an effort to implement test suite generators for this track, which would eliminate the variance in canonical-data references. That said, I still like using docstrings instead of comments as it uses a built-in feature of the language instead of just shoving something into place.

cmccandless commented 5 years ago

@yawpitch Something to consider in developing the test generator.

yawpitch commented 5 years ago

After reading the above I'm a little unclear as to when we're having problems parsing the version string (or indeed even attempting to parse it) ... when, exactly, do we do that?

cmccandless commented 5 years ago

when, exactly, do we do that?

There is a maintainer script that checks for out-of-date exercises and (optionally) creates an issue for that exercise to be updated. The above referenced #1726 was the result of the version comment having an "incorrect" format: the script failed to parse it, and assumed it was not referencing canonical data.

yawpitch commented 5 years ago

Ahh ... well, yeah, a generator can eliminate the issue, at least once all exercises are generated. Haven't revisited in some time, but was already generating the correct string, IIRC.

That said if we were doing this all over again or needed to have easy access to the version I'd tend to also go with the global constant approach (ie CANONICAL_VERSION = "x.y.z"), as that's just an import away without the hoops required to parse from a comment or docstring. Should have no side effects, since the tests all have an __name__ guard.

AnAccountForReportingBugs commented 4 years ago

I'm not really sure that is a good approach. It seems to me that would require programmatically importing each of the test files (which is doable but kind of a pain with importlib) and then checking the variable of the returned module. Or using eval() with empty dicts as the locals and globals and checking that. But in either of those cases, it seems easier to just use the ast to (relatively) safely get the docstring.

Also, to be a bit frank, that kind of smells to me as you are mixing concerns between documentation and code. But then again I could be wrong, as most of what I write would probably smell to high heaven to those with more experience, so take that one with a bit of salt.

Additionally, I think all of the exercism files should have some sort of module-level docstring anyway. Not only contain this information, but also to illustrate best practices for people who are new and exploring, to warn away arbitrary changes, and provide other information. For the tests files, something simple like:

"""Test suite for the {exercise} exercise by Exercism.

Do not edit this file directly! This file was automatically generated by the exercism
software and contains the test suite for the {exercism} exercise. © {date} by {owner}
and is distributed under the {license_name}. The tests herein are based on the
problem-specifications//canonical-data.json for the {version_number} revision.
"""
yawpitch commented 4 years ago

I don't dislike the idea of a bit of "Automatically generated" boilerplate on tests, though I'd maybe drop the admonition not to edit the file... we don't actually mind them adding new tests, if it helps them. I'm not sure any of the other languages add licensure or copyright boilerplate to any of the files usually distributed through the exercism CLI ... that would really be a call for @kytrinyx and @iHiD to make. But I'd be happy to get a PR adding something about automatic generation to the config/master_template.j2 or config/generator_macros.j2 files.

That said, that's a separate discussion from this issue's concerns, which I think have been mooted by the move of (virtually all) exercises to automatic generation ... we no longer need to do any sort of work to parse the version for an explicit check, we just need to generate from latest and see if there's a diff, plus if we do need to parse it there's now a consistent string to do that from, since it's generated via the template.

So personally I think we can just close this, since it was created before we did the big push towards generation. @cmccandless ?

cmccandless commented 4 years ago

I don't dislike the idea of a bit of "Automatically generated" boilerplate on tests

Agree.

though I'd maybe drop the admonition not to edit the file... we don't actually mind them adding new tests, if it helps them

The intent here I think is to avoid contributors editing the test files directly; it might be best to point them instead to the Contributing Guide or, more specifically, the Generator docs. We've gotten away without this sort of notice so far because it is recommended that all first-time contributors read the Contributing Guide before beginning work (this is quite normal for Open Source projects), and the Guide does point at these docs, so contributors are already being instructed on the correct method to add/edit tests.

To take it back to the original issue at hand:

So personally I think we can just close this, since it was created before we did the big push towards generation. @cmccandless ?

Yes, I'd say this issue can be closed. If the original offending script is still in use, it is obsolete and can be removed.

yawpitch commented 4 years ago

The intent here I think is to avoid contributors editing the test files directly; it might be best to point them instead to the Contributing Guide or, more specifically, the Generator docs. We've gotten away without this sort of notice so far because it is recommended that all first-time contributors read the Contributing Guide before beginning work (this is quite normal for Open Source projects), and the Guide does point at these docs, so contributors are already being instructed on the correct method to add/edit tests.

Ahh, hadn't considered that this was a contributor-facing bit of boilerplate. I'll rescind that point for any future PR, but maybe if it said "CONTRIBUTORS: this file was automatically generated from {path to .meta/template.j2}: please do not edit this file directly" or something similar?

Either way, closing this one ... from the above I'm not aware of a specific offending script in our repo to remove.