SUGGESTION: Cohere `qcng.compute()` call stack naming consistencies and annotations to improve clarity

coltonbh commented 2 years ago

Is your feature request related to a problem? Please describe. The qcng.compute() call stack has a number of naming inconsistencies that could be improved (IMHO) to make the code paths easier to understand. This would help developers like myself to more easily create idiomatic harnesses and implement idiomatic usage of the qcengine package in our own software stack.

Specifically:

qcng.compute() function has kwarg local_options. local_options gets merged with input_data.extras['_qcengine_local_config and then passed to get_config() as seen here. get_config() accepts local_options and creates a TaskConfig object with defined attributes. The returned TaskConfig is passed to executor.comptue(input_data, config) as a TaskConfig object. local_options is a dictionary representation of TaskConfig (later referenced in the call stack and harnesses as config). I suggest that the dual naming of this configuration object (local_options and later referencing it as config or task_config makes it hard to understand what parameters are passed into qcng.compute() and consistent naming would be an improvement :)
qcng.compute() function annotation suggests it returns AtomicInput. In reality it may return AtomicInput, FailedOperation, or a dictionary of either object as created by handle_output_metadata). To pass mypy the annotation would need to include all the values returned by handle_output_metadata which would include OptimizationResult objects as well since this same function handles values returned from qcng.compute_procedure too. Ideally, handle_output_metadata should be refactored so that two specific functions return only the objects that qcng.compute() and qcng.compute_procedure() should return.

Describe the solution you'd like

Rename the top-level local_options kwarg for qcng.compute() to task_config. This clearly identifies the object associated with these values and suggests to the end user what values should live in this dictionary (the attributes of the TaskConfig object). Changing the top-level kwarg breaks backwards compatibility so I suggest adding the task_config keyword, a user warning/depreciation warning on the local_options keyword that can remain in place for a number of versions to alert users to the upcoming change, and then merging local_options and task_config into a dictionary before passing it to get_config to create the TaskConfig object. This implementation maintains backwards compatibility (for a time) while creating more clarity about what values are passed in to the top-level qcng.compute() function.
Change the function annotations on qcng.compute() and qcng.compute_procedure() to reflect the actual objects one expects these functions to return. This intel is extremely helpful for developers to know the cases they must handle for the various objects these functions may return :) These annotations would not pass mypy since the handle_output_metadata function may return values for either compute() or compute_procedure(); however, having them correct in the source code would be beneficial.
Ideally someone would refactor handle_output_metadata into two/three functions that share common code but respect the true return values for compute() and compute_procedures().

Additional context I'd be happy to implement 1 and 2 above. I'd pass on refactoring handle_output_metadata as I haven't dug deep into the underlying function calls.

@loriab Here are my thoughts.

coltonbh commented 2 years ago

To take my thinking just one step further, would it be beneficial to expand the TaskConfig object to include an .extra field similar to AtomicInput? As I've looked for places to put configuration variables that are not passed to my QC program but that direct my harness how to behave the suggestion has been to include them in AtomicInput.extras['my:kwargs']. However, the separation of TaskConfig params from the AtomicInput params is sensible and both are passed to compute() separately. What is the intended separation between these two objects? It seems AtomicInput is generally intended for the QC program, TaskConfig creates the context in which qcengine executes a task (like keeping scratch files, or using more cores). Seems TaskConfig might be the right place for config that directs a particular harness to take specific actions? Just brainstorming...

coltonbh commented 2 years ago

Looks like there is some cross-pollination of data from TaskConig into the AtomicInput model here: https://github.com/MolSSI/QCEngine/blob/aeead6fa1327917c7b9fd874479e013114b45b3e/qcengine/programs/psi4.py#L151-L157

This is getting nitty-gritty but just trying to document the use cases I'm finding.

To me this suggests that config for an end program should be moved to the AtomicInput object inside of the Harness.compute() function?

loriab commented 2 years ago

pinging @bennybp and @dotsdl for database and OpenFF implications.

from a quick read:

(1) Agree that local_options is vague to me and the connection with TaskConfig is loose. I'd love to rename it. But, will have to see if the current name ties in closely with QCFractal nomenclature. Another concern is that many users of QCEngine don't use it directly, so a deprecationwarning may not filter back to them. Otoh, the qca stack tends to move in concert and only latest version of qcng active, so maybe a name switch not so disruptive. Ben and David can provide the necessary alarms. For the direct qcng users, I think FutureWarning works.

(2) I'm not up to refactoring to assuage mypy, but I definitely agree on making the function annotations better on the eyes and mind.

(TaskConfig) Extras tend to be a pain to manage (have to get merging vs. clobbering right at certain points), so I'm not immediately keen on adding it to a tightly controlled model like TaskConfig that often has direct counterparts in the controls set on the QC program (e.g., managed_memory control, so local_config memory active). Any chance that the controls you're thinking of adding to TaskConfig.extras are useful across programs? On the cross-pollination, yes, there's going to be a lot w/i harnesses. At the schema level, we try to keep things in their place: molecule info including charges in Molecule, what is run including modelchem in AtomicInput, how its run including memory in TaskConfig. But once it gets into formatting for the QCprog, it mixes. Sometimes molecular charge is in a molecule section of input; sometimes it's a keyword. Sometimes the modelchem is a task; sometimes it's a keyword. Sometimes memory is a keyword; sometimes it's a cmdline option. It may be helpful to see what the candidates are to figure out where they go.

Thanks for all the analysis and perspective!

coltonbh commented 2 years ago

Agree with the above!

After pushing my implementation further I agree that TaskConfig should not be expanded. It is a qcng specific object with duties at that level only. It seems cleanest to extract from it any necessary kwargs that happen to be used by a QC program and then pass those inside of AtomicInput.extras["myprog:config"] rather than passing on the TaskConfig to lower levels of the stack.

Agree the highest value adds are:

local_config -> task_config. I hear you on the warnings perhaps not being seen. Could do DepreciationWarning for devs and UserWarning for users etc. Could maintain the local_config compatibility for a long time, if helpful. At least the source would be easier to understand for devs. Changing default var names is a big deal so I hear the concern. In this case the level of indirection seemed high to me so I'm bringing it up; hesitancy makes total sense.
Changing function annotations without concern for mypy checks just to help devs.

If others sign off on the suggestion I can make a PR. Open to other comments/feedback too.

loriab commented 2 years ago

PR looks good. I talked with @bennybp today, and he's good with the local_options change.

coltonbh commented 2 years ago

Ok I'll add this too!

coltonbh commented 2 years ago

Do we want to do what I proposed and still support local_options with a user/depreciation warning or just make the switch directly?

loriab commented 2 years ago

do the warning, please, unless it gets too messy. here's some phrasing if you'd like to use it: https://github.com/psi4/psi4/blob/master/psi4/driver/p4util/python_helpers.py#L480-L483 . maybe "as soon as version it will stop working" instead of "in version it will stop working".

coltonbh commented 2 years ago

Sounds good. Will get to this next week and finish it off :)

MolSSI / QCEngine

SUGGESTION: Cohere `qcng.compute()` call stack naming consistencies and annotations to improve clarity #348