dev-cafe / parselglossy

Generic input parsing library, speaking in tongues.
https://parselglossy.readthedocs.io
MIT License
7 stars 2 forks source link

Refactor validation #42

Closed robertodr closed 5 years ago

robertodr commented 5 years ago

This is a refactoring of the validation to fix, or offer a way to fix, #24, #26, #31, #33, and #34. This is still work-in-progress. I need to do step 7 and make sure that we have a good enough test suite. I am putting it up early because it might be a bit hard to digest "the festival of dict comprehensions". I have tried to document every function I have written in detail. Type hints might be wrong.

The general strategy:

  1. Read the validation template. Here we check that the template is well-formed, that is: a. All keywords have:
    • An allowed type.
    • A non-empty docstring.
    • A default callable that is valid Python, if present.
    • Predicates that are valid Python, if present. Note that the latter two criteria can only be checked later on. b. No sections are nested under keywords. c. All sections have a non-empty docstring. This step might throw.
  2. Extract views from it. A view has as keys the names of sections and keyword and as values the actual values, or the type or the docstring. So basically it's a way of taking a complex dictionary and extracting subsets of its information content. Note that this makes the generation of documentation trivial, as we can get the view-by-docstrings and use it directly to collate a documentation page.
  3. Read in user input. This is basically an incomplete view-by-defaults.
  4. Merge the view-by-defaults of the template ("THEIRS") and the user input ("OURS") with OURS strategy. As we now allow (almost) arbitrary pieces of code to be defaults, this is not the end of the story. This step might throw
  5. Get the actual defaults. This is a bit tricky, but I think I've solved it correctly. Bonus: no need to teach the YAML reader to do complex numbers right :) This point actually fixes #26 and #31. The code that can be executed can only depend on the full input dictionary, nothing more. This step might throw
  6. Do a pass at type checking and type fixation. This step might throw
  7. Finally run the predicates. This step might throw

The fix for #24 is a byproduct of sorts to this restructuring: while traversing recursively if anything throws, we catch it and store how to get to the offending leaf and an appropriate error message. So for example, when the exception finally throws one will have something like this:

Error(s) occurred when fixing defaults:
- At user['scf']['another_number']:
    KeyError 'min_num_iterations' in defaulting closure 'user['scf']['min_num_iterations'] / 2'
- At user['scf']['functional']:
    TypeError unsupported operand type(s) for /: 'str' and 'int' in defaulting closure ''B3LYP' / 2'

A drawback of this refactoring is that there's now a lot more functions and each of them does a recursive descent in the tree. There is also quite some code duplication: we might get rid of that using more higher-order functions.

TODO ~- [ ] Switch to OrderedDict throughout, to avoid annoyances with ordering of elements.~ Not needed

bast commented 5 years ago

Few comments without having read everything:

bast commented 5 years ago

I like the simplification in the template. I was a bit unsure whether docstring is more intuitive than documentation. For Python developers it is.

bast commented 5 years ago

I think the plan looks solid. Are there any portions of the code or any choices you are least sure about and would like input on?

robertodr commented 5 years ago
bast commented 5 years ago

OrderedDict: For "older" Python yes but 3.6 and later we don't need it. In 3.6 it is an implementation detail, in 3.7 it became a feature. EDIT: by "it" I meant insertion order is preserved.

bast commented 5 years ago

About writing the reference dicts by hand - maybe we can write a script that will generate these and break these in controlled ways so that we can generate many dicts in one go for testing?

codecov[bot] commented 5 years ago

Codecov Report

Merging #42 into master will decrease coverage by 5.15%. The diff coverage is 84.39%.

robertodr commented 5 years ago

UPDATE I fixed this :smile: ~Lingering issues:- Default callables that mix types (like "'B3LYP' / 2") raise a SyntaxError rather than a TypeError. This is because the declared type is str and the way I implemented closure execution treats strings differently (see the run_callable function). This is the case for strings and complex numbers (as they are initially read in as strings). I don't know at the moment how to solve this (mild, I would say) inconsistency.~

bast commented 5 years ago

There seems to be some trouble on Python 3.5 - let me know if I should look into that.

robertodr commented 5 years ago

I switched to docstring because I consistently fail to type "documentation" correctly... The Python 3.5 problem should be fixed now. Compatibility is hard!