Open zulissimeta opened 1 week ago
Can one of the admins verify this patch?
Thanks! Let me first address one of your points/questions before proceeding:
I don't quite understand the logic in change_settings_wrap, so there's probably some overlap / considerations there if it was separately specified for a function (and which should take priority)
The function is below.
The important bits are lines 591-594. There, we are basically turning
def add(a, b):
return a + b
into
from quacc import change_settings
def add(a, b):
with change_settings(changes):
return a + b
This is necessary because with change_settings()
is modifying the thread-based, in-memory global settings. Since this is a modification to a variable in-memory, it needs to be "shipped" to the remote machine and therefore included within the function definition itself. The other lines in the function definition can be ignored, as they are specific to getting it to work with Parsl.
So, the with change_settings()
will create a newly modified function, but where is this called? That is done in the definition of the job
function below:
In short, if a user passes @job(settings_swap=changes)
where changes: dict[str, Any]
(e.g. changes={"RESULTS_DIR": "~/test"}
), this will:
@task
for Prefect).with change_settings(settings):
context manager within the original function definition.@task
for Prefect).Now we have successfully changed the settings within the @task
-decorated function itself.
the _changed
attribute was a nuance that probably isn't worth thinking much about here tbh. An external contributor wanted to make sure that double-application of change_settings took the outer one as priority.
with change_settings({"RESULTS_DIR": "~/test2"}):
with change_settings({"RESULTS_DIR": "~/test1"}):
static_job(atoms)
this came up in practice because they redecorated the same function twice without renaming it:
static_job = redecorate(static_job, job(settings_swap={"RESULTS_DIR": "~/test1"}))
static_job(atoms)
static_job = redecorate(static_job, job(settings_swap={"RESULTS_DIR": "~/test2"}))
static_job(atoms)
this _changed
business was to get it to work correctly, with Parsl specifically I believe.
but this is an edge case and a scenario of "the user shouldn't technically do that", so I told them if it caused issues it may be removed. in any case, not much to worry about at the moment.
I'll take care of other workflow engines as needed. if tests pass, then we're fine. I added a lot of tests for the settings handling.
Attention: Patch coverage is 92.50000%
with 3 lines
in your changes missing coverage. Please review.
Project coverage is 98.98%. Comparing base (
9d91773
) to head (75d249c
).
Files | Patch % | Lines |
---|---|---|
src/quacc/wflow_tools/decorators.py | 88.46% | 3 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@zulissimeta: Thanks for this PR! I will have to try it out, but looking at the code and your writeup, this seems logical to me. I do not have major objections (at this time).
As for this PR, next steps would be:
- Figure out why the tests are failing and update the code accordingly. We have three tests failing in
tests/prefect/test_customizers.py
.- Address the (minor) comments in my review.
- Add tests for this newly introduced behavior so we can ensure we don't break it.
- Extend this to other workflow engines (this is for me to do).
Done! (except for 4.)
As for your "downsides":
Passing around wrapped functions increases complexity
It is true that wrapping functions increases complexity. That said, we already are wrapping functions once if
change_settings_wrap
is applied. Additionally, even without thechange_settings
business, we were already wrapping in a few places depending on the workflow engine (e.g. for Prefect when we havePREFECT_AUTO_SUBMIT
asTrue
). So, here we have either single or double wrapping depending on the workflow engine. That should hopefully be okay because we are already doing that in the test suite as-is. You haven't really increased the complexity from a wrapping standpoint in this PR.
After thinking some more, I made it so the change_settings_wrap keeps a reference to the changes, both so that the nested directory wrapper knows when not to set the new directory (if it was manually specified), and so that it can just be added to the other changes.
You can only define the task/flow/etc at runtime in the flow, which could lead to some downstream weird logic issues
I don't think I understand this point. Could you clarify?
If you don't have this, you can definitely schedule a bunch of future jobs (say a chain of dependent jobs). With this change, the function actually has to be called with each input in the calling flow.
I'm not aware of a concrete problem here, but making it so that the wrapped function 100% has to be called to get it to add the right context manager might lead to problems later. For example, in prefect there's a job.map() that might not explicitly call functions, and instead just schedule them directly on the server.
Prefect throws some warnings that different functions with the same name are being used (it still works)
Is this a new warning, or was it always present? Shouldn't the name of the function not change if we use
@wraps
? I'm curious what happened here to cause that metadata change.
This is a new warning that will happen for any wrapped function (not just in my PR, but with change_settings too). Wrapped functions are different functions, but have the same name, which prefect is warning about.
This seems logical. I did some refactoring to reduce code duplication.
Before this can be merged, the following will need to be addressed:
NESTED_RESULTS_DIR
is True
by default, the no-wrapped cases are not being tested anymore.I can work on this soon but might be a bit slower than usual, as I'm getting ready to move across the country this weekend. ✈️ Although, certainly feel free to chip in if you have time and are interested. Either way, hopefully, I can get this merged next week. I quite like the idea. If there is any weirdness we stumble upon, we can set it to False
by default for a trial period without much concern.
Summary of Changes
This is a draft PR to organize the RESULTS_DIR as a nested directory based on the DAG of the calling functions. For example, running the EMT example in the docs:
with the new setting
QUACC_NESTED_RESULTS_DIR=True
will lead to a RESULTS_DIR with layout:This is only a demo to inspire discussion, and is only implemented for prefect right now (though others would be easy to add!).
A couple of potential downsides: