PCMSolver / pcmsolver

An API for the Polarizable Continuum Model
http://pcmsolver.readthedocs.io/
GNU Lesser General Public License v3.0
32 stars 21 forks source link

[Discussion] Input parsing without Getkw #88

Open robertodr opened 7 years ago

robertodr commented 7 years ago

This thread is a Request For Comments on how to rewrite input parsing. @ilfreddy @arnfinn @bast Please comment on any of the following sections. @loriab I would also appreciate your input on this matter, if you have the time.

Goals to achieve

Current situation

Input can go through two, mutually exclusive, routes:

  1. An extra input file is parsed by a Python script and then read C++ side. This is the Getkw way of doing things. Python-side, options left blank are filled with a default. C++-side there's the Input class that contains all information read in. In the class, structs with input for specific classes (i.e. for the cavity, the Green's functions, the solvers etc.) are generated and passed onto the factory functions for object creation (look at the flow of the code in Meddle.cpp)
  2. The host passes a struct with input parameters set, usually from its own input parser. The input so passed is only a subset of the possible input to PCMSolver. This makes the second route not usable for most (if not all of) the recent functionality introduced.

Drawbacks

Suggested solution

The suggested solution is to remove Getkw altogether:

  1. The Input object would go away for good.
  2. The Python script would go away, solving #40

and move towards a data-passing-based solution, i.e. extending current route number 2 to cover all of the input needed by PCMSolver. This is a top-down perspective:

  1. The pcmsolver_new API function would accept one single parameter, a struct called PCMSolverInput that collects all host program input to the module, in atomic units:
    • Molecular geometry (number of nuclei, charges, coordinates, symmetry generators)
    • Host writer, i.e the function the module will use to write its output to the host program output file.
    • Cavity generator input parameters
    • Green's function input parameters
    • Solver input parameters
  2. The pcmsolver_new function would in turn invoke the constructor of the Meddle object.[^2] Instead of generating a whole Input object from the passed PCMSolverInput struct, it would unpack the struct (pattern matching of sorts) into smaller structs to feed to the factory function generating the cavity, Green's function and so forth. This touches on #52
  3. From this point on, it's business as usual.
  4. To run the module standalone we would need to rewrite a wrapper for reading input from file. This could become difficult, but I think the correct strategy would then be to use a standard format, like YAML #50

Why/How would this automatize documentation of input? Because we could attach Doxygen documentation strings to the various fields on the PCMSolverInput struct.

Challenges

  1. The input management for some host programs would have to be rewritten from scratch. I think the benefits would outweigh the annoyance though.
  2. Getting interoperability of the C struct and the Fortran 90 user defined type right. The struct will need to have strings and array fields.

Notes

[^1] The input to PCMSolver is embedded in the host input and unpacked into a different input file in the scratch directory at run time. The PCMSolver Python script then does its own parsing. [^2] The Meddle class is the one responsible for routing the computation to the correct internal objects. The C++ API, if you wish.

ilfreddy commented 7 years ago

The input file is "touched" when the program is running. @robertodr said is because everything is made uppercase. Still that should happen in the cache and not directly to the file on disk. This is (mildly) annoying when one is performing many tests with small modifications to the same input, because the editor complains the file has been modified. Feature request: make sure the original input file is not touched (an intermediate uppercase version could be created if need be).

robertodr commented 7 years ago

@ilfreddy I've made a separate issue #91 out of your request. Let's try to limit this thread for comments on input parsing :wink:

loriab commented 7 years ago

Would you want to consider PCMSolver only taking JSON/py-dict input, with QC progs responsible for generating it from their native options parsing (perhaps this is "piggypacking"). Then perhaps PCMSolver additionally does a light str --> json or argparse --> json utility so it can be run independently.

Related discussion at https://github.com/MolSSI/QC_JSON_Schema , though admittedly Mol-centered at present

robertodr commented 7 years ago

Thanks @loriab. Admittedly, I'd prefer YAML (less curly braces around :smiley_cat:), but anything that is a standard format and would relieve me from doing most of the work in defining it, is very welcome.

These are the flows of the input I am envisioning:

  1. QC program needs to add a new input section for PCMSolver, with input parameters whose names and types are defined by the given version of the API.
  2. QC program reads input and dumps the PCMSolver section into a prescribed format, let's say JSON. Notice that the QC program does not validate the input, but simply translates it to a standard format. This can happen in one of two ways:
    1. Disk-based PCMSolver adds nhomann/json as dependency and exposes a function pcmsolver_write_input_file() that accepts the (presumably long) list of input parameters (all plain-old data structures - PODs) read by the QC program. This function would validate the input and write it to file in the correct format. A call to pcmsolver_new() would then bear no arguments and look for the already validated input file in the current directory.
    2. Software-based PCMSolver adds nhomann/json as dependency. I wrap it with a C API, in turn wrapped by a Fortran API. QC program calls pcmsolver_generate_input_json() that accepts the (presumably long) list of input parameters read by the QC program. This function would validate the input and return the JSON data structure for the input. Eventually, QC program calls pcmsolver_new(json_input) and gets a handle to a PCM computation.

QC program pseudocode for route i.

bool success = pcmsolver_write_input_file(/* PCM input parameters as PODs*/);
if (success) {
   pcmsolver_context * ctx = pcmsolver_new();
} else {
  // ABORT 
}

QC program pseudocode for route ii.

JSON * json_input = pcmsolver_generate_input_json(/* PCM input parameters as PODs */);
pcmsolver_context * ctx = pcmsolver_new(json_input);

Route ii. looks more elegant, but it will require more work in order to maintain the C and Fortran wrappers to the C++ JSON library.