Programatically access CalcJob dry_run folder

chrisjsewell commented 5 years ago

As discussed in #2768, running a calculation with the metadata.dry_run=True option should, in some way, provide a programatic link to the created submission folder, via the returned CalcJobNode. This is particularly useful for use in unit tests, e.g. to check that particular files are being created.

As @sphuber suggests, a simple way to achieve this would be to provide a node attribute, e.g.

results, node = launch.run.get_node(ProcessClass, a=Int(1), metadata={'dry_run': True})
node.dry_run_folder
>>  `aiida.common.folders.Folder object`

sphuber commented 5 years ago

If this is solely for the purposes of unit testing, I might have a better solution even for you. I recently added some infrastructure in the aiida-quantumespresso plugin to run unit tests for calculation job implementations using the FixtureManager and pytest. It mocks a sandbox folder and actually runs the CalcJob.prepare_for_submission of a calculation job class. The unit tests provides this sandbox folder and so can inspect its contents after. This accomplishes what you want without having to go through the dry_run way. I have a similar setup for the parsers and so I can test the full calculation job process with basic unit tests. N.B. the actual remote copying and symlinking is of course not tested here.

chrisjsewell commented 5 years ago

Yeh nice, that's what I was looking for ta.

As I've mentioned before, you might want to consider moving some of these more generic pytest fixtures into a module in aiida-core, so that plugins can load them, rather than copy/pasting code

sphuber commented 5 years ago

As I've mentioned before, you might want to consider moving some of these more generic pytest fixtures into a module in aiida-core, so that plugins can load them, rather than copy/pasting code

Absolutely, but since this is quite experimental, I would like to hold off a bit until we see that they have become useful for a number of plugins. Once we have introduced them, it will be difficult to change them without breaking backwards compatibility. After the breaking changes from 0.12 to 1.0.0 from now on, we would really like to keep those to an absolute minimum

zhubonan commented 5 years ago

@sphuber My CASTEP plugin also uses this functionality, but not just for unittesting. The idea is to actually to a 'dryrun' of the executable (by passing arguments -dryrun) on the local computer using the generated inputs. Hence a programmatically returned path to the dryrun folder would be needed.

This dryrun returns some useful information such as number of kpoints and estimated memory usage which then can be used to determine what resources should calculation use.

I wonder if the method you suggested above for testing is also suitable for running production environment?

zhubonan commented 5 years ago

I tested following the method in the QE test, which seems to work. The only downside is that that fixture leaves a process in the CREATED state. If I let dry_run=True then no process is created . Of course, I also have to set store_provenace=False. Below is the code I used. I wonder if there is any downside for this approach?

def submit_test(process_class, inputs=None):
    """This essentially test the submition"""
    from aiida.engine.utils import instantiate_process
    from aiida.manage.manager import get_manager
    from aiida.common.folders import SandboxFolder

    if isinstance(process_class, ProcessBuilder):
        inputs = dict(process_class)
        process_class = process_class._process_class

    upd = {'store_provenance': False, 'dry_run': True}
    # Create copy so the original ones are not affected
    inputs = dict(inputs)
    if 'metadata' in inputs:
        inputs['metadata'] = dict(inputs['metadata'])
        inputs['metadata'].update(upd)
    else:
        inputs['metadata'] = upd
    # In any case, we do not want to store the provenance

    folder = SandboxFolder(sandbox_in_repo=False)
    manager = get_manager()
    runner = manager.get_runner()
    process = instantiate_process(runner, process_class, **inputs)

    calc_info = process.prepare_for_submission(folder)
    return calc_info, folder, process

sphuber commented 5 years ago

So if I understand correctly, your CastepCalculation has some logic in the prepare_for_submission that based on the inputs will determine how many kpoints should be used and what resources. Before launching a real calculation, you first call it with a dry run and then use that to create the real inputs and launch a real calculation. To me that sound like a valid concept but just the wrong approach. Using the concept of a dry-run seems like the wrong way just in principle, regardless how it is implemented in AiiDA. Wouldn't it be possible and make more sense to abstract the code that determines the resources in separate utility functions? The prepare_for_submission can then still call it if needed and you can then call it isolated to determine the inputs for a calculation. This way you avoid going through all the complex machinery of launching a CalcJob "but not really, through a dry-run". But maybe I am missing something that wouldn't make this possible, in which case I would have to look at your code

zhubonan commented 5 years ago

Sorry maybe I did not explain well. This 'dryrun' is just an entirely optional check in addition to the submit_test and in the 0.12 version I can just call the the submit_test and take the returned folder and run the CASTEP executable using the generated input files. It is not called by prepare_for_submission. User can then set the resources according to the kpoints and memory usage if they want to.

I think I figured out the way to get around it in 1.0 now by following the test example you mentioned in QE. I am happy to have the code stay this way it is.

On 30 May 2019 21:43:06 Sebastiaan Huber notifications@github.com wrote:

So if I understand correctly, your CastepCalculation has some logic in the prepare_for_submission that based on the inputs will determine how many kpoints should be used and what resources. Before launching a real calculation, you first call it with a dry run and then use that to create the real inputs and launch a real calculation. To me that sound like a valid concept but just the wrong approach. Using the concept of a dry-run seems like the wrong way just in principle, regardless how it is implemented in AiiDA. Wouldn't it be possible and make more sense to abstract the code that determines the resources in separate utility functions? The prepare_for_submission can then still call it if needed and you can then call it isolated to determine the inputs for a calculation. This way you avoid going through all the complex machinery of launching a CalcJob "but not really, through a dry-run". But maybe I am missing something that wouldn't make this possible, in which case I would have to look at your code

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/aiidateam/aiida_core/issues/2884?email_source=notifications&email_token=AIBAYF4CLYF42RT3MAYXWJTPYA35NA5CNFSM4HNTZJV2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWTOZXY#issuecomment-497478879, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AIBAYF3MUQWJ4AUQCBCCUODPYA35NANCNFSM4HNTZJVQ.

sphuber commented 5 years ago

OK perfect, thanks. So if you don't mind, I will close this for now

giovannipizzi commented 5 years ago

I need this for a different reason (just printing to the user where the file has been created). I'm going to create a PR following the lines of the first post here.

aiidateam / aiida-core

Programatically access CalcJob dry_run folder #2884