Modifying the decorator of `ph_recover_job` in Espresso's `grid_phonon_flow` does not work

Quantum-Accelerators / quacc

quacc is a flexible platform for computational materials science and quantum chemistry that is built for the big data era.

https://quantum-accelerators.github.io/quacc/

BSD 3-Clause "New" or "Revised" License

167 stars 45 forks source link

Modifying the decorator of `ph_recover_job` in Espresso's `grid_phonon_flow` does not work #1852

Closed tomdemeyere closed 5 months ago

tomdemeyere commented 6 months ago

What would you like to report?

I find myself in the impossibility to run the quacc.recipes.espresso.phonons.grid_phonon_flow with custom decorators by doing:


grid_phonon_decorators = {
    "relax_job": job(executors=["surface"]),
    "ph_job": job(executors=["phonon"]),
}

grid_phonon_results = grid_phonon_flow(
     ...
     job_decorators=grid_phonon_decorators,
)

Parsl will throw: parsl.errors.NoDataFlowKernelError: Must first load config, other basic jobs that I redecorate manually with redecorate work fine, any thoughts here?

Andrew-S-Rosen commented 6 months ago

Thank you for the report, @tomdemeyere! I haven't seen this behavior before, and it's not immediately obvious to me yet what the root cause is.

I tried a minimal example as a sanity check:

import parsl
from ase.build import bulk
from quacc.recipes.emt.slabs import bulk_to_slabs_flow
from quacc import job

parsl.load()

atoms = bulk("Cu")
future = bulk_to_slabs_flow(atoms,job_decorators={"relax_job": job(), "static_job": job()})
future.result()

This, thankfully, works as expected. I can probably help debug further with a minimal (local) example, but at the moment the cause of the issue evades me.

I have not tried the multi-executor behavior in Parsl before, so that might be an avenue to double-check.

tomdemeyere commented 6 months ago

Ah! your example works on my machine as well.

Although, changing bulk_to_slabs_flow to the grid_phonon_flow still fails for me (same error as before), can you try to reproduce it?

Andrew-S-Rosen commented 6 months ago

If you provide a minimal reproducible example, I'd be happy to run it and try to sort out the root cause. 👍

tomdemeyere commented 6 months ago


import parsl
from ase.build import bulk
from quacc import job
from quacc.recipes.espresso.phonons import grid_phonon_flow

parsl.load()

atoms = bulk("Cu")
future = grid_phonon_flow(
    atoms,
    job_params={
        "relax_job": {"occupations": "smearing", "degauss": 0.01, "smearing": "cold"}
    },
    job_decorators={"relax_job": job(), "ph_init_job": job()},
)
future.result()

Should throw something like: TypeError: 'AppFuture' object is not callable

(Also it seems that there is a wrong type hint in _grid_phonon_subflow?)


    @subflow
    def _grid_phonon_subflow(
        ph_input_data: dict | None,
        ph_init_job_results: str | Path, # <<< that is probably not correct?
        ph_job: Job,
        nblocks: int = 1,
    ) -> list[RunSchema]:

Andrew-S-Rosen commented 6 months ago

@tomdemeyere --- I think I fixed it? Could you please try again with main and let me know how it goes?

I am going to close this for now since I think it's good. If not, please re-open and ping me!

tomdemeyere commented 6 months ago

Sorry for the delay I have plenty to do these days

TypeError: 'AppFuture' object is not callable seems to be fixed, although I still get parsl.errors.NoDataFlowKernelError: Must first load config having a quick look it seems that some jobs are not launching on the correct executor, parsl says

Task 50 will be sent to executor pw
Task 50 submitted for App _ph_recover_job

But I did define the executors

grid_phonon_decorators = {
    "relax_job": job(executors=["pw"]),
    "ph_init_job": job(executors=["ph_init"]),
    "ph_job": job(executors=["ph"]),
    "ph_recover_job": job(executors=["ph_recover"]),
}

I will find time to send a minimal example, although this is simply the previous one that I sent but with specified executors

Andrew-S-Rosen commented 6 months ago

Thanks for the follow-up! I'll reopen this given that comment. I will try to look at this when I have time as well.

Andrew-S-Rosen commented 5 months ago

@tomdemeyere: Regarding #1182, I would like to ensure that it is indeed the ideal way to fix this issue, and I am not yet convinced it is.

If you can prepare a complete, minimal reproducible example that I can run locally with Parsl that would be ideal. The example you shared requires setting some executors, and I don't quite know how to do that offhand. I would like to investigate this issue further. It's clear it's related to the fact that you have modified the phonon_job's decorator instead of the _ph_recover_job's decorator, but the only way I can be sure that I am going down the right path is to have as minimal an example as possible to reproduce things on my local Linux machine. Thanks!

To be clear, I don't think this is really a Parsl decorator issue or a customize_funcs issue. If you tried to modify the _ph_recover_job's parameters (if it had ones to modify), you'd likely run into a similar error because you are not modifying the proper function. So, I want to make sure we fix the root cause rather than a bandaid solution. The @subflow approach you took "works" but in a very roundabout way.

tomdemeyere commented 5 months ago

Using ThreadPoolExecutor it seems that Parsl does not print which executor is used for which task, you might be able to see it by placing smart print statement inside Quacc?

import parsl
from ase.build import bulk
from parsl.config import Config
from parsl.executors import ThreadPoolExecutor
from quacc import job
from quacc.recipes.espresso.phonons import grid_phonon_flow

CONFIG = Config(
    executors=[
        ThreadPoolExecutor(
            label="pw",
            max_threads=1,
        ),
        ThreadPoolExecutor(
            label="ph_init",
            max_threads=1,
        ),
        ThreadPoolExecutor(
            label="ph",
            max_threads=1,
        ),
        ThreadPoolExecutor(
            label="ph_recover",
            max_threads=1,
        ),
    ],
)

pw_input_data = {
    "control": {"forc_conv_thr": 0.0001},
    "system": {
        "occupations": "smearing",
        "degauss": 0.01,
        "smearing": "cold",
    },
    "electrons": {"conv_thr": 1e-08, "mixing_mode": "TF", "mixing_beta": 0.7},
    "ions": {},
    "cell": {"press_conv_thr": 0.1},
}

ph_input_data = {
    "inputph": {
        "tr2_ph": 1.0e-8,
        "alpha_mix(1)": 0.1,
        "nmix_ph": 8,
        "ldisp": True,
        "nq1": 1,
        "nq2": 1,
        "nq3": 1,
    },
}

grid_phonon_params = {
    "relax_job": {
        "input_data": pw_input_data,
        "kspacing": 1.0,
        "relax_cell": False,
    },
    "ph_job": {
        "input_data": ph_input_data,
    },
}

parsl.load(CONFIG)

grid_phonon_decorators = {
    "relax_job": job(executors=["pw"]),
    "ph_init_job": job(executors=["ph_init"]),
    "ph_job": job(executors=["ph"]),
    "ph_recover_job": job(executors=["ph_recover"]),
}

future = grid_phonon_flow(bulk("Si", cubic=True), job_params=grid_phonon_params, job_decorators=grid_phonon_decorators)

future.result()

Andrew-S-Rosen commented 5 months ago

Thanks, @tomdemeyere. It is still a bit difficult for me to debug with this because running the code as-is yields a successful output. I'm happy to debug this if a minimal example is provided that clearly raises an error that would be resolved with a patch.

tomdemeyere commented 5 months ago

Strictly speaking there is no error anymore, Parsl just use the wrong executor... let me think about it