`VaspCalculation` defaults to not parse anything, nor to save any files

zhubonan commented 4 years ago

Please check these boxes before submitting your issue, thank you!

[x] aiida-vasp and aiida-core are installed in a virtualenv that is used for nothing else
[x] if you are installing from the repo: pull the latest changes aiida-core and aiida-vasp
[x] remove all *.pyc files and restart the daemon, then retry

Environment

Operating system: WSL - Ubuntu 64
aiida-core version/branch+commit: 1.1.0
aiida-vasp version/branch+commit: dev, d089ed72ef

Steps that lead to the error (not an error)

Run an explicit VaspCalculation

What should happen

Plugins run the calculation, results get retrieved and parsed, and the provenance is preserved.

What happened instead

It appears the default behaviour of the parser (not setting any parser_settings) is to not parse anything. Since all the files are in the temporary retrieve list, they are deleted locally. So the CalcJobNode does not have any meaningful output. I think it would be much nicer if the default is to parse the basic properties, and the essential output files are retrieved and save locally. These can always be overridden in scenarios space-saving is more important.

zhubonan commented 4 years ago

It appears the default is to parse the energy and maximum force, but somehow no nodes are created on my end. This may be a local issue on my machine. I am investing why.

espenfl commented 4 years ago

@zhubonan Thanks for the report. In fact this is intentional. The plan was that the main entrypoint to VASP should never be the VaspCalculation but the VaspWorkChain (or after we have the workchain stack more complete a higher lying one). There, the nodes are created and other sensible defaults set. This design choice is of course up for discussion. This was also done before the calculation became a process etc. Back then it made total sense to do it like this. Now, with the concept of a unified process this difference is not that important and we could attach the nodes also on the calculation. However, if we envision users to have direct access to a fully functional calculation we need to make a few other changes as well. For instance for some of the metadata.

zhubonan commented 4 years ago

Thanks for your reply.

I have located the problem for not having any output nodes. It is not because of the default parser settings (which does parse the energies e.t.c). In fact, it is because the VaspCaculation does not properly set its default parser, which is only done at the WorkChain level. Not sure if it is intentional though. The BaseCalcultion does set a _default_parser='vasp.vasp', which I recall is the pre-1.0 way. The 1.0 way is to use `self.input('metadata.options.parser_name', default='vasp.vasp').

I understood that now the preferred way is to launch WorkChain, but it would still be good to have the basic CalcJobs runnable out-of-the-box.

espenfl commented 4 years ago

Another reason why we prefer to use the workchain is the problem when using expose_inputs etc. and setting default metadata, which gives problems at the workchain level. Setting that at the workchain that calls the calculation avoids these issues. There are ways around this but it seems a bit dirty.

We will soon make a new release that is more production ready. In that process I will have a look at this again.

zhubonan commented 4 years ago

Ahh, I see. I recall getting a similar issue when developing the CASTEP WorChains, which I got around by exposing the inputs of lower-level processes into a different namespace so their metadata can be set separately. It is indeed a bit messy, not sure if there is a better way now as I did not follow the issue.

Perhaps it is worth letting the CalcJob check if there is a parser_name in its options while preparing the inputs. If there is isn't one, it should raise an exception for warning... It is a bit pointless to run the calculation and not parse any results, plus the default is to have all files in the temporary retrieve list, hence nothing is saved.

BTW is there any reason to have all files in the temporary retrieve list by default? It could be common to use many other existing VASP postprocessing tools for analysing the results and plotting, especially when one just started using AiiDA. The files from AiiDA repo can always be deleted/moved if the disk space is running low.

espenfl commented 4 years ago

Yes, and that is the main reason why I have not prioritized fixing the VaspCalculation so that it works as a stand alone entity. I am happy to do something like this, but would still like us to word the documentation such that we recommend users to use the workchains as entry points. For most use cases I believe this is what we want to do. At some point I will also include the stuff I have for the verifications and error handlers etc. You really need this for all calculations (even the most basic VASP executions) and it make sense to put this in a workchain and not in the calculation.

With respect to the temporary list, I have no strong opinion on the default. Either is fine with me. However, the principle of not storing data twice should be followed as much as possible. E.g. when one only wants the band structure and make AiiDA pull that out and still keep vasprun.xml, EIGENVAL and/or OUTCAR it makes things too redundant. But for some users, especially in a transition period until we have a better toolset for post processing in AiIDA it might be useful to break this principle.

So to summarize, I agree, let us check parse_name and let the default be that the files are put in the retrieve list. However, let us make a flag for the workchain, that makes it easy to move all the files to temporary without having to go down low in the code. I can fix these latter aspects later today. Thanks for PR #321 on the parse_name etc.

zhubonan commented 4 years ago

Thanks for the comments. I will also push changes to the PR accordingly!

espenfl commented 4 years ago

@zhubonan Great. Thanks. Looks like I will not make it today, but will certainly have a look tomorrow.

zhubonan commented 4 years ago

@espenfl No worries, take your time 👍

espenfl commented 4 years ago

@zhubonan Done. Maybe you can rebase your PR on top of #322. That should now fix the storage of the parsed files by default.

aiida-vasp / aiida-vasp