robertodr commented 5 years ago

This is a refactoring of the validation to fix, or offer a way to fix, #24, #26, #31, #33, and #34. This is still work-in-progress. I need to do step 7 and make sure that we have a good enough test suite. I am putting it up early because it might be a bit hard to digest "the festival of dict comprehensions". I have tried to document every function I have written in detail. Type hints might be wrong.

The general strategy:

Read the validation template. Here we check that the template is well-formed, that is: a. All keywords have:
- An allowed type.
- A non-empty docstring.
- A default callable that is valid Python, if present.
- Predicates that are valid Python, if present. Note that the latter two criteria can only be checked later on. b. No sections are nested under keywords. c. All sections have a non-empty docstring. This step might throw.
Extract views from it. A view has as keys the names of sections and keyword and as values the actual values, or the type or the docstring. So basically it's a way of taking a complex dictionary and extracting subsets of its information content. Note that this makes the generation of documentation trivial, as we can get the view-by-docstrings and use it directly to collate a documentation page.
Read in user input. This is basically an incomplete view-by-defaults.
Merge the view-by-defaults of the template ("THEIRS") and the user input ("OURS") with OURS strategy. As we now allow (almost) arbitrary pieces of code to be defaults, this is not the end of the story. This step might throw
Get the actual defaults. This is a bit tricky, but I think I've solved it correctly. Bonus: no need to teach the YAML reader to do complex numbers right :) This point actually fixes #26 and #31. The code that can be executed can only depend on the full input dictionary, nothing more. This step might throw
Do a pass at type checking and type fixation. This step might throw
Finally run the predicates. This step might throw

The fix for #24 is a byproduct of sorts to this restructuring: while traversing recursively if anything throws, we catch it and store how to get to the offending leaf and an appropriate error message. So for example, when the exception finally throws one will have something like this:

Error(s) occurred when fixing defaults:
- At user['scf']['another_number']:
    KeyError 'min_num_iterations' in defaulting closure 'user['scf']['min_num_iterations'] / 2'
- At user['scf']['functional']:
    TypeError unsupported operand type(s) for /: 'str' and 'int' in defaulting closure ''B3LYP' / 2'

A drawback of this refactoring is that there's now a lot more functions and each of them does a recursive descent in the tree. There is also quite some code duplication: we might get rid of that using more higher-order functions.

TODO ~- [ ] Switch to OrderedDict throughout, to avoid annoyances with ordering of elements.~ Not needed

[x] Run through mypy to check all type hints are correct.
[x] Check documentation is comprehensive enough.
[x] Create a primitive API and hook it up to CLI. Still unstable and up for debate

bast commented 5 years ago

Few comments without having read everything:

I like the dictionary name user.
I saw few functions repeated with the same name in different files which surprised me but I guess this is what you meant with "There is also quite some code duplication".
Optimize for good error reporting and code readability and I think this is still the case.
Let's not do nested comprehensions and let us not sacrifice readability for elegance - but this was not a problem in the code I have browsed - so far it looked readable. But in other words I personally always prefer 4 readable lines over 1 compact line where I have to think for 15 minutes on a white board - but this was not a problem until now.

bast commented 5 years ago

I like the simplification in the template. I was a bit unsure whether docstring is more intuitive than documentation. For Python developers it is.

bast commented 5 years ago

I think the plan looks solid. Are there any portions of the code or any choices you are least sure about and would like input on?

robertodr commented 5 years ago

Do we need OrderedDict? (I think so, but I might be wrong)
Testing is quite ugly and it's actually where most of the code repetition/boilerplate is. I would really like to have one test file per module, but this means writing a lot of reference dictionaries by hand.

bast commented 5 years ago

OrderedDict: For "older" Python yes but 3.6 and later we don't need it. In 3.6 it is an implementation detail, in 3.7 it became a feature. EDIT: by "it" I meant insertion order is preserved.

bast commented 5 years ago

About writing the reference dicts by hand - maybe we can write a script that will generate these and break these in controlled ways so that we can generate many dicts in one go for testing?

codecov[bot] commented 5 years ago

Codecov Report

Merging #42 into master will decrease coverage by 5.15%. The diff coverage is 84.39%.

robertodr commented 5 years ago

UPDATE I fixed this :smile: ~Lingering issues:- Default callables that mix types (like "'B3LYP' / 2") raise a SyntaxError rather than a TypeError. This is because the declared type is str and the way I implemented closure execution treats strings differently (see the run_callable function). This is the case for strings and complex numbers (as they are initially read in as strings). I don't know at the moment how to solve this (mild, I would say) inconsistency.~

bast commented 5 years ago

There seems to be some trouble on Python 3.5 - let me know if I should look into that.

robertodr commented 5 years ago

I switched to docstring because I consistently fail to type "documentation" correctly... The Python 3.5 problem should be fixed now. Compatibility is hard!

dev-cafe / parselglossy

Refactor validation #42

Codecov Report