Closed coltonbh closed 2 years ago
To take my thinking just one step further, would it be beneficial to expand the `TaskConfig` object to include an `.extras` field similar to `AtomicInput`? As I've looked for places to put configuration variables that are not passed to my QC program but that direct my harness how to behave, the suggestion has been to include them in `AtomicInput.extras['my:kwargs']`. However, the separation of `TaskConfig` params from the `AtomicInput` params is sensible, and both are passed to `compute()` separately. What is the intended separation between these two objects? It seems `AtomicInput` is generally intended for the QC program, while `TaskConfig` creates the context in which `qcengine` executes a task (like keeping scratch files, or using more cores). Seems `TaskConfig` might be the right place for config that directs a particular harness to take specific actions? Just brainstorming...
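For concreteness, here is a minimal sketch of the separation being discussed, using plain dicts to stand in for the real `AtomicInput`/`TaskConfig` models; the `myprog:config` extras key and its contents are hypothetical:

```python
# Illustration only (plain dicts, not real qcelemental models):
# AtomicInput describes WHAT the QC program computes; TaskConfig describes
# HOW qcengine runs it; harness-directing options live under a namespaced
# extras key so they don't collide with other consumers of extras.
atomic_input = {
    "molecule": {"symbols": ["He"], "geometry": [0.0, 0.0, 0.0]},
    "model": {"method": "hf", "basis": "sto-3g"},
    "keywords": {},  # passed through to the QC program itself
    "extras": {
        # hypothetical namespaced key for harness-directing options
        "myprog:config": {"keep_wfn_file": True},
    },
}

task_config = {
    "ncores": 4,           # execution context managed by qcengine...
    "memory": 8.0,         # ...not program chemistry
    "scratch_messy": True, # e.g. keep scratch files around
}

# A harness would read its own namespaced extras like this:
harness_opts = atomic_input["extras"].get("myprog:config", {})
```

The namespacing (`"myprog:"` prefix) mirrors the `extras['my:kwargs']` suggestion above, keeping harness directives separate from anything another tool might stash in `extras`.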
Looks like there is some cross-pollination of data from `TaskConfig` into the `AtomicInput` model here: https://github.com/MolSSI/QCEngine/blob/aeead6fa1327917c7b9fd874479e013114b45b3e/qcengine/programs/psi4.py#L151-L157
This is getting nitty-gritty but I'm just trying to document the use cases I'm finding. To me this suggests that config for an end program should be moved onto the `AtomicInput` object inside of the `Harness.compute()` function?
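The cross-pollination pattern might look roughly like this; a hypothetical helper with invented names, not the actual psi4 harness code linked above:

```python
# Hypothetical sketch of TaskConfig values leaking into the data formatted
# for the QC program inside Harness.compute(): run-context settings (cores,
# memory) get rendered as program keywords alongside the user's own keywords.
def build_program_input(input_data: dict, config: dict) -> dict:
    """Merge run-context values from ``config`` into the program input."""
    prog_input = dict(input_data)              # shallow copy; don't mutate caller's dict
    keywords = dict(prog_input.get("keywords", {}))
    # TaskConfig-level settings rendered as (invented) program keywords:
    keywords["nthreads"] = config["ncores"]
    keywords["memory"] = f"{config['memory']} GiB"
    prog_input["keywords"] = keywords
    return prog_input
```

This is exactly the mixing described: the schema keeps "how it's run" in `TaskConfig`, but the program's input format often has no such separation.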
pinging @bennybp and @dotsdl for database and OpenFF implications.
from a quick read:
(1) Agree that `local_options` is vague to me and the connection with `TaskConfig` is loose. I'd love to rename it. But we will have to see if the current name ties in closely with QCFractal nomenclature. Another concern is that many users of QCEngine don't use it directly, so a `DeprecationWarning` may not filter back to them. OTOH, the QCA stack tends to move in concert and only the latest version of qcng is active, so maybe a name switch is not so disruptive. Ben and David can provide the necessary alarms. For the direct qcng users, I think `FutureWarning` works.
(2) I'm not up to refactoring to assuage `mypy`, but I definitely agree on making the function annotations better on the eyes and mind.
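As a sketch of what tighter annotations could look like (the signature and stand-in classes are illustrative, not the real qcengine source):

```python
# Illustrative annotation only: stand-in stub classes replace the real
# QCSchema models, and the signature is trimmed down. The point is that the
# return annotation names every shape a caller must actually handle, including
# the plain-dict form produced when a dict return is requested.
from typing import Any, Dict, Union


class AtomicResult: ...       # stand-in for qcelemental's AtomicResult
class FailedOperation: ...    # stand-in for qcelemental's FailedOperation


def compute(input_data: Any, program: str, return_dict: bool = False,
            **kwargs: Any) -> Union[AtomicResult, FailedOperation, Dict[str, Any]]:
    """Stub: annotated to show every return shape a caller must handle."""
    ...
```

Even if `mypy` can't be fully satisfied without refactoring `handle_output_metadata`, an honest `Union` like this documents the contract for developers reading the source.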
(TaskConfig) Extras tend to be a pain to manage (you have to get merging vs. clobbering right at certain points), so I'm not immediately keen on adding it to a tightly controlled model like `TaskConfig` that often has direct counterparts in the controls set on the QC program (e.g., `managed_memory` control, so `local_config` memory active). Any chance that the controls you're thinking of adding to `TaskConfig.extras` are useful across programs?

On the cross-pollination, yes, there's going to be a lot of it within harnesses. At the schema level, we try to keep things in their place: molecule info, including charges, in `Molecule`; what is run, including modelchem, in `AtomicInput`; how it's run, including memory, in `TaskConfig`. But once it gets into formatting for the QC program, it mixes. Sometimes molecular charge is in a molecule section of input; sometimes it's a keyword. Sometimes the modelchem is a task; sometimes it's a keyword. Sometimes memory is a keyword; sometimes it's a cmdline option. It may be helpful to see what the candidates are to figure out where they go.
Thanks for all the analysis and perspective!
Agree with the above!
After pushing my implementation further, I agree that `TaskConfig` should not be expanded. It is a `qcng`-specific object with duties at that level only. It seems cleanest to extract from it any necessary kwargs that happen to be used by a QC program and then pass those inside of `AtomicInput.extras["myprog:config"]`, rather than passing the `TaskConfig` on to lower levels of the stack.
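That extraction pattern could be sketched like so (plain dicts and an invented helper name, purely illustrative):

```python
# Hypothetical helper: at the qcengine boundary, pull the program-relevant
# values out of the TaskConfig dict and forward only those, under a
# namespaced extras key, instead of handing TaskConfig down the stack.
def inject_program_config(input_data: dict, task_config: dict,
                          program: str, keys: tuple) -> dict:
    """Copy selected ``task_config`` entries into ``extras[f'{program}:config']``."""
    forwarded = {k: task_config[k] for k in keys if k in task_config}
    out = dict(input_data)                     # don't mutate the caller's input
    extras = dict(out.get("extras", {}))
    extras[f"{program}:config"] = forwarded
    out["extras"] = extras
    return out


# Usage: only ncores is forwarded; scratch_messy stays at the qcng level.
inp = inject_program_config(
    {"extras": {}},
    {"ncores": 8, "scratch_messy": True},
    "myprog",
    ("ncores",),
)
```

The explicit `keys` allowlist keeps the qcng-level object from leaking wholesale into lower levels of the stack, which is the separation argued for above.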
Agree the highest value adds are:

1. `local_options` -> `task_config`. I hear you on the warnings perhaps not being seen. Could do `DeprecationWarning` for devs and `UserWarning` for users, etc. Could maintain `local_options` compatibility for a long time, if helpful. At least the source would be easier to understand for devs. Changing default var names is a big deal, so I hear the concern. In this case the level of indirection seemed high to me, so I'm bringing it up; hesitancy makes total sense.
2. `mypy` checks just to help devs.

If others sign off on the suggestion I can make a PR. Open to other comments/feedback too.
PR looks good. I talked with @bennybp today, and he's good with the `local_options` change.
Ok I'll add this too!
Do we want to do what I proposed and still support `local_options` with a user/deprecation warning, or just make the switch directly?
Do the warning, please, unless it gets too messy. Here's some phrasing if you'd like to use it: https://github.com/psi4/psi4/blob/master/psi4/driver/p4util/python_helpers.py#L480-L483 . Maybe "as soon as version it will stop working" instead of "in version it will stop working".
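A minimal sketch of the backwards-compatible rename with a warning (the signature is trimmed down and the warning text paraphrased; this is not the actual implementation):

```python
# Sketch: accept both kwargs for a transition period, warn on the old name,
# and merge before (in real code) building the TaskConfig via get_config().
import warnings
from typing import Any, Dict, Optional


def compute(input_data: Any, program: str,
            task_config: Optional[Dict[str, Any]] = None,
            local_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Support the deprecated ``local_options`` kwarg alongside ``task_config``."""
    if local_options is not None:
        warnings.warn(
            "`local_options` is deprecated and will stop working in a "
            "future release; use `task_config` instead.",
            FutureWarning,  # visible to end users by default, per the thread
        )
    # New kwarg wins if both are given; the merged dict would feed get_config().
    return {**(local_options or {}), **(task_config or {})}
```

`FutureWarning` is used here because, unlike `DeprecationWarning`, Python shows it by default, matching the "direct qcng users" suggestion earlier in the thread.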
Sounds good. Will get to this next week and finish it off :)
**Is your feature request related to a problem? Please describe.**
The `qcng.compute()` call stack has a number of naming inconsistencies that could be improved (IMHO) to make the code paths easier to understand. This would help developers like myself to more easily create idiomatic harnesses and implement idiomatic usage of the `qcengine` package in our own software stacks. Specifically:

1. The `qcng.compute()` function has kwarg `local_options`. `local_options` gets merged with `input_data.extras['_qcengine_local_config']` and then passed to `get_config()` as seen here. `get_config()` accepts `local_options` and creates a `TaskConfig` object with defined attributes. The returned `TaskConfig` is passed to `executor.compute(input_data, config)` as a `TaskConfig` object. So `local_options` is a dictionary representation of `TaskConfig` (later referenced in the call stack and harnesses as `config`). I suggest that the dual naming of this configuration object (`local_options`, later referenced as `config` or `task_config`) makes it hard to understand what parameters are passed into `qcng.compute()`, and consistent naming would be an improvement :)
2. The `qcng.compute()` function annotation suggests it returns `AtomicResult`. In reality it may return `AtomicResult`, `FailedOperation`, or a dictionary of either object as created by `handle_output_metadata`. To pass `mypy` the annotation would need to include all the values returned by `handle_output_metadata`, which would include `OptimizationResult` objects as well, since this same function handles values returned from `qcng.compute_procedure` too. Ideally, `handle_output_metadata` should be refactored so that two specific functions return only the objects that `qcng.compute()` and `qcng.compute_procedure()` should return.

**Describe the solution you'd like**
1. Rename the `local_options` kwarg for `qcng.compute()` to `task_config`. This clearly identifies the object associated with these values and suggests to the end user what values should live in this dictionary (the attributes of the `TaskConfig` object). Changing the top-level kwarg breaks backwards compatibility, so I suggest adding the `task_config` keyword, a user warning/deprecation warning on the `local_options` keyword that can remain in place for a number of versions to alert users to the upcoming change, and then merging `local_options` and `task_config` into a dictionary before passing it to `get_config` to create the `TaskConfig` object. This implementation maintains backwards compatibility (for a time) while creating more clarity about what values are passed in to the top-level `qcng.compute()` function.
2. Update the annotations of `qcng.compute()` and `qcng.compute_procedure()` to reflect the actual objects one expects these functions to return. This intel is extremely helpful for developers to know the cases they must handle for the various objects these functions may return :) These annotations would not pass `mypy` since the `handle_output_metadata` function may return values for either `compute()` or `compute_procedure()`; however, having them correct in the source code would be beneficial.
3. Refactor `handle_output_metadata` into two/three functions that share common code but respect the true return values for `compute()` and `compute_procedure()`.

**Additional context**
I'd be happy to implement 1 and 2 above. I'd pass on refactoring `handle_output_metadata` as I haven't dug deep into the underlying function calls.

@loriab Here are my thoughts.