aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io

Interactive usage #4974

Closed louisponet closed 2 years ago

louisponet commented 3 years ago

As suggested by @giovannipizzi I've taken a part of the demo that I gave in my show and tell that can be used to see how this basic interactive usage would be translated into AiiDA. Since my code revolves mostly around QE I'm not sure if this issue belongs here or rather in the aiida-QE repository. In any case, the features shown here are probably also useful for interactive use of other plugins.

@mbercx @ramirezfranciscof

using DFControl
using Downloads

cif_file = Downloads.download("http://www.crystallography.net/cod/9011998.cif", "Si.cif")

str = Structure(cif_file, name="Si")
set_pseudos!(str, :pbesol)

pw_execs = [Exec("mpirun", "", :np => 4), Exec("pw.x", "/opt/qe/bin/", :nk => 4)]

scf_input = DFInput{QE}("scf", pw_execs, :calculation => "scf")
set_kpoints!(scf_input, (6,6,6,1,1,1))

job = DFJob("Si", str, [scf_input],
            :ecutwfc => 20, # these flags will be set on all inputs that are passed to the job
            :verbosity => "high",
            :conv_thr => 1e-6,
            header=["export OMP_NUM_THREADS=1"])

set_localdir!(job, "Si_NSOC")
submit(job)

out = outputdata(job)
out = outputdata(job["scf"]) # or outputdata(job)["scf"]

push!(job, gencalc_bands(job["scf"], high_symmetry_kpath(str, 20)))

job["scf"].run = false
#or
set_flow!(job, "" => false, "bands"=>true)
submit(job)

set_localdir!(job, "Si_SOC")
job[:noncolin] = true
job[:lspinorb] = true
job["bands"][:nbnd] = 30
set_pseudos!(job, :pbesolrel)
set_flow!(job, "" => true)
submit(job)

job_nsoc = DFJob("Si_NSOC")

using Plots

plot(plot(job, -5, 5), plot(job_nsoc, -5, 5))

fermi_nsoc = readfermi(job_nsoc)
bands_nsoc = readbands(job_nsoc)

fermi_soc = readfermi(job)
bands_soc = readbands(job)

plot(bands_nsoc, fermi=fermi_nsoc, ylims=[-5,5], linewidth=1, color=:red) 
plot!(bands_soc, fermi=fermi_soc, ylims=[-5,5], linewidth=1, color=:red) 

In order to test this on your local machine, you can first run:

using Pkg
Pkg.add(name="DFControl", rev="master")
Pkg.add(["Plots", "Downloads"])

using DFControl
setdefault_pseudodir(:pbesol, "/path/to/pbesol/pseudos")
setdefault_pseudodir(:pbesolrel, "/path/to/pbesolrel/pseudos")
configuredefault_pseudos()

Let me know if you need any help! Cheers, Louis

mbercx commented 3 years ago

Thanks @louisponet! Since the goal here is mainly to see if there are any user-friendly aspects we can integrate into AiiDA (i.e. steal 😁), I think it's fine to keep the issue here.

If I remember correctly though, I think the idea was that we try and replicate the above script with AiiDA to see what is "missing" that could be valuable. In this context, there were two aspects we discussed that I remember:

Very happy to get started with Julia at some point and try out your code. Let's touch base on this when we see each other at the office!

zhubonan commented 3 years ago

This is interesting, and somehow I got mentioned. I think the corresponding AiiDA code can be achieved using mainly the get_builder_restart method.

However, it is less trivial to override parameters, as the corresponding Dict node might have been stored already and hence is not modifiable. I frequently find myself cloning them and converting them back and forth to plain Python dictionaries within an interactive shell. Perhaps this could be solved by some helper class?
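Such a helper could be quite small. The sketch below is illustrative only (not an existing AiiDA API; `with_overrides` is a made-up name): it deep-merges changes into a plain-dict copy of a stored node's contents, which could then be wrapped in a fresh, unstored Dict node.

```python
def with_overrides(stored, overrides):
    """Deep-merge `overrides` into a copy of a read-only nested mapping.

    The original mapping is never mutated; the result is a plain dict
    that could be wrapped in a new, unstored Dict node.
    """
    merged = dict(stored)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged

# Example: tweak one flag without touching the original parameters.
params = {'SYSTEM': {'ecutwfc': 30.0, 'nbnd': 10}}
updated = with_overrides(params, {'SYSTEM': {'nbnd': 30}})
```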

giovannipizzi commented 3 years ago

Thanks to all!

Indeed @mbercx it would be good to try to convert this to an AiiDA Python script, to see if the same goal can be achieved with the same level of coding complexity (no need to rerun in Julia I think; @louisponet can help tell us what the code is doing). We can thus identify the points where the AiiDA logic can be simplified.

I think a few important improvements could be (not sure how they would work though):

If we manage to get a prototype syntax of how this would work, we can then see how to implement it and try it out in various real-life scenarios (this would make sense only if it makes running these workflows very simple, of course! - and not for all usecases, but at least for a subset of very common usecases).

Pinging @dev-zero @greschd @sphuber @muhrin @chrisjsewell @ramirezfranciscof in case someone has some clever idea for a first proof of concept

chrisjsewell commented 3 years ago

a very simple way in 1 line to setup a simple linear workflow

Well, I would suggest first working backwards from what we already have, rather than trying to come up with what appears to be an extremely complex implementation from scratch. For example, it shouldn't be too difficult to generate a workflow from a set of calcfunctions:

from aiida import orm
from aiida.engine import calcfunction

@calcfunction
def step1(x: orm.Int, y: orm.Float) -> orm.Dict:
    ...
    return result

@calcfunction
def step2(z: orm.Dict) -> orm.Dict:
    ...
    return result

create_workchain([step1, step2])

Here you could check that each step is consistent in its output -> inputs, and introspect the workchain specification.
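Stripped of AiiDA specifics, the chaining and the output-to-input consistency check could look something like this (plain Python; `compose_linear` and `check_consistency` are hypothetical names, not an existing API):

```python
import typing

def compose_linear(steps):
    """Chain steps so that each step's output becomes the next step's input."""
    def workflow(value):
        for step in steps:
            value = step(value)
        return value
    return workflow

def check_consistency(steps):
    """Raise if a step's return annotation doesn't match the next step's
    first parameter annotation (when both are annotated)."""
    for current, following in zip(steps, steps[1:]):
        returned = typing.get_type_hints(current).get('return')
        params = [v for k, v in typing.get_type_hints(following).items()
                  if k != 'return']
        if returned is not None and params and returned is not params[0]:
            raise TypeError(f'{current.__name__} returns {returned}, '
                            f'but {following.__name__} expects {params[0]}')

# Two toy "steps" standing in for calcfunctions.
def double(x: int) -> int:
    return 2 * x

def increment(x: int) -> int:
    return x + 1

check_consistency([double, increment])  # passes: int -> int
pipeline = compose_linear([double, increment])
```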

zhubonan commented 3 years ago

@giovannipizzi Sure. Perhaps I can just drop in on one of the weekly meetings? I can probably do it next week or the week after. The documentation I wrote was not comprehensive at all and it assumes the reader knows how fireworks operates. Setting up the latter requires additional work, as fireworks itself requires a standalone "bridge" MongoDB server (accessible to both the user and the remote) to be up and running as well.

zhubonan commented 3 years ago

I recall that aiida-optimize has implemented some protocol for dynamically "concatenating" work chains together. It may make things easier for interactive usage, say, concatenating SCF and NSCF calculations. I have never tried it though.

mbercx commented 3 years ago

@giovannipizzi Sure. Perhaps I can just drop in on one of the weekly meetings? I can probably do it next week or the week after. The documentation I wrote was not comprehensive at all and it assumes the reader knows how fireworks operates. Setting up the latter requires additional work, as fireworks itself requires a standalone "bridge" MongoDB server (accessible to both the user and the remote) to be up and running as well.

Sounds great! I have some experience with Fireworks as well, so I would be happy to help in providing documentation for this and making the barrier to using it as low as possible. I used to love taking over a cluster together with my students by flooding the queue with qlaunch jobs of which I had no clue what they were going to do yet. 😁

greschd commented 3 years ago

I recall that aiida-optimize has implemented some protocol for dynamically "concatenating" work chains together. It may make things easier for interactive usage, say, concatenating SCF and NSCF calculations. I have never tried it though.

Indeed, a short description is here: https://github.com/aiidateam/aiida-core/discussions/4780

The basic idea there is that the workflow steps are passed as inputs, instead of being hard-coded in the code. It does work decently well in interactive use - but one drawback is that there isn't a clear path from this form of workflow to a "normal" workchain.

The benefit of this approach is that it works without any changes to aiida-core.

a very simple way in 1 line to setup a simple linear workflow

It doesn't fulfill the one-liner requirement, but other than that it does match the use case you describe there.

louisponet commented 3 years ago

While I think the quick linear workflow thing is useful, the main important question for users is how easy or hard it is to accomplish the basic tasks outlined in the demo above. It would be nice to come up with a "translated" version for AiiDA. I'd be happy to sit down or have a quick Zoom call with one of the devs to go through it and do that.

sphuber commented 3 years ago

I am not sure what sparked this discussion and what gives the impression that AiiDA cannot be used interactively in a user-friendly manner, but I had a quick stab at converting the example script. Here is how the script in the OP can be replicated with aiida-quantumespresso. Note that I haven't included the plotting since I am not too familiar with how that works; someone else could maybe add that part. Of course it also assumes that the Computer and Code have been set up. The required pseudos are installed with a single command, aiida-pseudo install sssp. That is all the setup that is required to be able to run this.

import io
import urllib.request

from aiida import engine
from aiida import orm
from aiida import plugins

CifData = plugins.DataFactory('cif')
PwBandsWorkChain = plugins.WorkflowFactory('quantumespresso.pw.bands')

stream = io.BytesIO(urllib.request.urlopen('http://www.crystallography.net/cod/9011998.cif').read())
cif = CifData(file=stream)
structure = cif.get_structure()
code = orm.load_code('bash@localhost')

options = {
    'custom_scheduler_commands': 'export OMP_NUM_THREADS=1',
    'resources': {
        'num_machines': 1,
        'num_mpiprocs_per_machine': 4,
    }
}
overrides = {
    'scf': {
        'pw': {
            'metadata': {'options': options},
            'parallelization': {'npool': 4}
        },
    },
    'bands': {
        'pw': {
            'metadata': {'options': options},
            'parallelization': {'npool': 4}
        },
    }
}
builder = PwBandsWorkChain.get_builder_from_protocol(code, structure, overrides=overrides)
results_nsoc, node_nsoc = engine.run_get_node(builder)

# `noncolin` and `lspinorb` switch on spin-orbit coupling for the second run
parameters_soc = {
    'SYSTEM': {
        'noncolin': True,
        'lspinorb': True,
        'nbnd': 30
    }
}
overrides['scf']['pw']['parameters'] = parameters_soc
overrides['bands']['pw']['parameters'] = parameters_soc
builder = PwBandsWorkChain.get_builder_from_protocol(code, structure, overrides=overrides)
results_soc, node_soc = engine.run_get_node(builder)

fermi_soc = results_soc['band_parameters']['fermi_energy']
fermi_nsoc = results_nsoc['band_parameters']['fermi_energy']
bands_soc = results_soc['band_structure']
bands_nsoc = results_nsoc['band_structure']

# Now plot something with the results above
louisponet commented 3 years ago

This is fantastic thank you!

what gives the impression that AiiDA cannot be used interactively in a user friendly manner

During my show and tell this was the basic premise and not a lot of disagreement was put forth; moreover, it seems that not many people are using AiiDA in their day-to-day work save for the core developers or active workflow developers. The main point I'm trying to make is that AiiDA doesn't do a lot to improve the tedious tasks that one has to do to run (DFT) calculations. Another point is that it's hard to understand what went wrong, as it inevitably always does: e.g. make a mistake in the mpirun command when setting up a computer and you're lost.

To come back to the above script, from a user friendliness point of view:

If this is indeed the intended way to do things, it's quite clear that novice users don't get to this stage easily, either because the documentation is missing or because tutorials do things differently.

Additionally, one of the difficulties is that things work relatively fine if there's a workchain implemented that does what you want it to do; if not, it suddenly requires an incredible increase in AiiDA knowledge to start playing around with adding things to workchains etc.

e.g. all that's needed to add and run an nscf and projwfc calculation to the "workflow" in the OP is

push!(job, gencalc_nscf(job["scf"], (6,6,6)))
push!(job, gencalc_projwfc(job["nscf"], -20, 10, 0.1))
# one could now choose to not run scf and bands again by
set_flow!(job, "" => false, "nscf" => true, "projwfc" => true)
submit(job)

Quickly adding additional random inputs/calcjobs and/or using previous results as the starting point for those is something I don't think is easy in AiiDA. It would mean first adding an additional code through some verdi commands, loading that code, creating a new workchain (I think), copy-pasting part of the code for the PwBands workchain and adding some additional steps in there. Then reentry scan, restart the daemon, etc. That's not very user friendly.

Additionally, the exact same plot(job, -5, 5) command will now discover the projwfc result, generate the DOS and color the bands accordingly; these kinds of relatively simple things help users tremendously.

Just to make sure I'm not misunderstood: I'm just trying to be critical and highlight issues that I see myself having when trying to use it for my actual physics projects, and what I believe is why people generally resort to the old-school script-plus-inputs way of running codes. I think pulling that audience into using AiiDA would be a big step forward.

louisponet commented 3 years ago

By the way, I don't think the AiiDA script will actually run, because no pseudos are assigned. Does aiida-pseudo have pseudos that can be run with SOC calculations? That's another thing, but probably best not to clutter the discussion here; installing a pseudo set that is not in the aiida-pseudo database is not particularly trivial either, if I remember correctly.

sphuber commented 3 years ago

To start with your last comment:

Just to make sure I'm not misunderstood: I'm just trying to be critical and highlight issues that I see myself having when trying to use it for my actual physics projects, and what I believe is why people generally resort to the old-school script-plus-inputs way of running codes. I think pulling that audience into using AiiDA would be a big step forward.

I realize that and I appreciate the feedback. If there are ways we can make AiiDA easier to use, without hurting its generic strengths and usability, I am all for it. I don't think anyone is looking to intentionally make AiiDA less user friendly or more complicated to use than necessary. There are two important general points that I want to raise here though, before addressing the rest of your comments:

  1. AiiDA is a domain-agnostic code, and so a lot of the shortcuts that you are providing in your scripts, which are QE-specific, cannot be made by AiiDA. Instead those discussions would concern the plugin, aiida-quantumespresso. It is of course fine to discuss making the plugin easier to use, but I think this is an important point to make. AiiDA is not a framework to make it easy to work with QE (or any other DFT code) but any code.
  2. AiiDA is not built just to make it easier to launch jobs on computing resources. It does do that, but another huge part of it is the provenance. This feature requires a lot more infrastructure and also imposes some restrictions on the interface, so at best we can make the interface as unrestricted as a framework that doesn't capture any provenance whatsoever. Again, I am not saying that we shouldn't try to remove as many restrictions as possible, but comparing the two frameworks without keeping the added value of provenance in mind is not a fair comparison, I find. The problem of provenance is complicated and, as far as I am aware, AiiDA is making the most involved effort to try and address it, with all the complexity that comes with it. Other frameworks don't even bother.

During my show and tell this was the basic premise and not a lot of disagreement was put forth; moreover, it seems that not many people are using AiiDA in their day-to-day work save for the core developers or active workflow developers. The main point I'm trying to make is that AiiDA doesn't do a lot to improve the tedious tasks that one has to do to run (DFT) calculations. Another point is that it's hard to understand what went wrong, as it inevitably always does: e.g. make a mistake in the mpirun command when setting up a computer and you're lost.

Was your show and tell recorded by any chance (I unfortunately couldn't attend)? It would be good to see how your code makes it easy to introspect any errors.

Then on to your other points.

I would argue that not only are the lines of code in the OP much less, but also less complicated (the fact that you understood what was going on and translated it into AiiDA without first running it attests to that)

I think the number of lines is very similar. It may seem longer, but that is mostly due to the dictionary definition, which I formatted to have each key on a separate line, causing there to be a lot of white space. I could easily condense them onto single lines (as is done in your OP) to reduce the number of lines. If you count the number of statements, I think the lengths of both are quite comparable. As to the complexity I cannot really speak. I am obviously familiar with AiiDA and to me it seems easy to ascertain what is happening from reading the code. The fact I could make out what was happening in yours is probably just because I am both familiar with QE and have a lot of coding experience. It would be interesting to ask non-experienced coders to read both samples and give an analysis of complexity.

heavy use of dicts that are not easily documented inline as classes leads to documentation either not being there or hard to find e.g. the options and metadata dicts are very opaque.

At best this is a problem of aiida-quantumespresso, as AiiDA cannot hardcode any code-specific parameters. Regarding the options, I see your point. The fact that they are defined in a dict is because of the fundamental design of how AiiDA processes are created: the inputs are passed in indirectly and are not set directly through a method, thus they cannot appear as arguments. I don't think it is feasible that this will be changed. Of course, all options are fully documented in the documentation.

I don't really understand what a protocol is (if there is documentation I didn't find it)

This is a rather new concept (not part of aiida-core, mind you) that makes the input generation for complex workchains automatic and therefore easier to use. With just a single line you can get the inputs for the entire workchain, with all relevant parameters set to sensible defaults based on a desired level of efficiency and precision. Since it is not part of aiida-core there is no documentation for it there; we are working on updating the aiida-quantumespresso docs, which will explain this.

I need to change the parameters for each calculation separately, which is pretty much the same as just manually changing the input files

I am not sure what you mean here? I only changed the inputs for the non-SOC calculation, but your script does exactly the same. Or is that not what you are referring to?

engine.run_get_node(builder) also something I didn't encounter yet while reading through documentation and tutorials/howtos/topics (may have just missed it)

Here is a detailed explanation in the documentation. The latest tutorial also shows how to launch a calculation in one of the first chapters.

If this is indeed the intended way to do things, it's quite clear that novice users don't get to this stage easily either because the documentation is missing, or tutorials do things differently.

I agree that the documentation for aiida-quantumespresso is far from what it should be and @mbercx is organizing to get this fixed soon. In the last few years we have mainly focused on adding the functionality and haven't had the time to do the documentation. As far as the documentation of AiiDA is concerned, we already have done a lot of work. Of course there are always things to be improved, and it seems that maybe the findability is the problem here (as most of the things you pointed out are actually documented but you didn't manage to find them), so this is something we should look into more closely. Although one of the inherent problems might of course be that there is a lot of functionality and therefore a lot of documentation.

Additionally one of the difficulties is that things work relatively fine if there's a workchain implemented and does what you want it to do, if not it suddenly requires an incredible increase in AiiDA knowledge to start playing around with adding things to workchains etc.

Here I fully agree. If you want to write a new workchain, it is not trivial and takes quite a bit of work. It would be great if we could simplify this somewhat, if only for early prototyping. That being said, the comparison that is being made is again not fully fair. The WorkChain is a heavy duty solution that gives you:

I might be mistaken, but I think your script, or typical bash, ASE, what-have-you scripts for that matter, don't provide any of this. I think comparing the complexity directly while ignoring the additional features is not very useful. If you want a fair comparison, you can directly replicate the script without using a workchain. Remember, you don't need a workchain for quick prototyping or running multiple workflows in sequence. You can always write a simple Python script and sequentially launch whatever process you want. This gives you exact feature parity with other existing solutions, and I don't really see how that is more complicated or worse in AiiDA. I think we have shown that with the translated script.

e.g. all that's needed to add and run an nscf and projwfc calculation to the "workflow" in the OP is

push!(job, gencalc_nscf(job["scf"], (6,6,6)))
push!(job, gencalc_projwfc(job["nscf"], -20, 10, 0.1))
# one could now choose to not run scf and bands again by
set_flow!(job, "" => false, "nscf" => true, "projwfc" => true)
submit(job)

That is only because you have implemented the logic for when set_flow receives projwfc = True, correct? If you hadn't implemented this, a user of your code would have had to do it themselves. This is the exact same situation that you described AiiDA as suffering from:

Additionally one of the difficulties is that things work relatively fine if there's a workchain implemented and does what you want it to do

Regarding aiida-pseudo (although if this becomes more in-depth we should maybe splice it off into a dedicated issue on that repo) and installing custom families: you can install any family using a single command:

aiida-pseudo install family -P pseudo.upf path/to/archive/or/directory SSSP/1.1/PBE/efficiency/SOC

As a final note, I want to say that I do appreciate you taking the time to make this post and analyse the user-friendliness, and I hope my response does not come off as me being non-receptive to criticism. I think you make some good points about how we can improve documentation discoverability and the documentation of the aiida-quantumespresso plugin, and about making prototyping of new workchains easier. But I do want to make it clear that we should be careful to make fair comparisons, to make sure these discussions are useful and correctly identify where each problem should be directed: aiida-core, aiida-quantumespresso or any other plugin.

louisponet commented 3 years ago

which are QE specific,

Not the case; only on the level of file parsers and validation of flags is there QE-specific code. Granted, since I only use QE extensively, many of the "niceties" that got added through the years are targeted at QE. But the general structure also works with ABINIT and ELK (rudimentary support, parsers etc. are actually implemented; I just don't actively develop things that I don't use).

provenance

Yes, this is one of the main benefits of AiiDA, but with some minor recent additions to my code I actually have working versioning that to some extent also includes the idea of provenance, i.e. all inputs and outputs and also the job flow are saved. I don't think this should lead to user-facing changes or add difficulties at all.

Now of course it is absolutely not as extensive as AiiDA, but I think much of the additional complexity of working with fully featured provenance could be handled behind the scenes, not explicitly by users. If this is not possible, then I'm afraid AiiDA just can't target users that don't just want to run the workchains that are provided, or simple day-to-day non-high-throughput calculations.

Was your show and tell recorded by any chance (I unfortunately couldn't attend)? It would be good to see how your code makes it easy to introspect any errors.

The fact that all input and output files are stored in a single job folder, specified as job.local_dir, in which any file is accessible through joinpath(job, "xyz"), allows users to easily find and understand what happened. On the side of errors while preparing the jobs, I tried to add a lot of info, warning, and error messages to capture often-made mistakes (of course, mostly the ones made by me :P).

It would be good to see how your code makes it easy to introspect any errors.

You can always just run the script and play around with it a bit :).

Of course, all options are fully documented in the documentation. Here is a detailed explanation in the documentation. The latest tutorial also shows how to launch a calculation in one of the first chapters.

Thank you! I missed the options documentation and I guess got a bit lost with the different options for submitting (I was basically always just using the submit one).

and it seems that maybe the findability is the problem here.

I fully agree; the documentation seems quite extensive. Indeed, I'm most often wondering where to look: Topics, How To, and Cookbook all have some required info; I wonder if they might not be merged somehow (I could be wrong). But requiring users to go find everything in the documentation, because it's often not clear what to do, is also not ideal for beginners; more expressive code can be useful here (which was another point I mentioned).

That is only because you have implemented the logic when set_flow receives projwfc = True, correct?

Not quite: gencalc_projwfc takes an existing DFInput and in this case generates the projwfc one based on it, i.e. it will return a DFInput which then gets added to the ones inside the job. Each input in the job has a .run attribute that tells whether a calculation should actually be performed. set_flow! is just a method that, through fuzzy matching, sets inputs with matching names to run or not. Adding random other codes to your job should just work, given that there are i/o parsers etc. for that code (same as for AiiDA, I guess).

fully automated provenance

Because I was interested to see whether this would indeed be very hard to do in a rudimentary way, I implemented something for my code yesterday. My way of doing things basically provides provenance by saving previous versions in the .versions directory of the main job.local_dir. So switching to previous versions is possible, and provenance is kept because the job script itself is saved too (just a slurm/bash script with non-running calcs commented out). I'm not trying to say at all that it's as powerful as AiiDA, obviously it's not, but on the basic level of running codes, understanding which ran with which inputs and for which structure, this basically does the same thing.
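The versioning mechanism described above can be illustrated with a small sketch (illustrative Python, not the actual DFControl implementation; `snapshot` is a made-up name):

```python
import pathlib
import shutil

def snapshot(job_dir):
    """Copy the current contents of a job directory into .versions/<n>,
    so earlier inputs, outputs and job scripts can be restored later."""
    job_dir = pathlib.Path(job_dir)
    versions = job_dir / '.versions'
    versions.mkdir(exist_ok=True)
    version_id = str(len(list(versions.iterdir())))  # 0, 1, 2, ...
    target = versions / version_id
    # Exclude .versions itself so snapshots don't nest recursively.
    shutil.copytree(job_dir, target, ignore=shutil.ignore_patterns('.versions'))
    return target
```

Restoring an earlier version would then just be copying `.versions/<n>` back over the job directory.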

scalability

With asynchronous tasks in Julia this is completely identical to submitting to a daemon (which could be a Julia instance running behind the scenes); here is how I do it:

t = @async begin
    submit(job)
    while slurm_isrunning(job)
        sleep(10)
        yield()
    end
    # process job results
end

This allows you to submit many jobs at the same time and steal the main Julia thread to do some work when one finishes (@task could be used if you want it to use all the threads to achieve the same thing). Moreover, I find it hard to talk about real scalability (millions of jobs) when everything is implemented in Python, but that is not of importance at this stage.
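For comparison, the same submit-poll-process pattern looks very similar in Python's asyncio (a toy sketch: `slurm_is_running` is a hypothetical stand-in for a real scheduler query, wired here to pretend the job finishes after two polls):

```python
import asyncio

async def slurm_is_running(job_id, _polls={'n': 0}):
    # Stand-in for a real scheduler query (e.g. parsing `squeue` output);
    # the shared counter makes the fake job "finish" after two polls.
    _polls['n'] += 1
    return _polls['n'] < 3

async def watch_job(job_id, poll_interval=0.01):
    # submit(job) would happen here; then poll until the job leaves the queue.
    while await slurm_is_running(job_id):
        await asyncio.sleep(poll_interval)
    return f'job {job_id} finished'

# Many jobs can be watched concurrently with asyncio.gather(watch_job(1), ...).
```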

Robustness

This is true; I didn't implement it in my code yet because I didn't really need it, but again I would argue it can be completely hidden from users and shouldn't impact the user-facing experience.

On the fair-comparison point, I agree to some extent. As you can see from the above, it's not that incredibly difficult for me to get basics similar to the strengths of AiiDA going in my code. While the niceties part of my code is obviously more developed for QE, the basic idea and underlying structure are code-agnostic. The niceties are something that should be handled in plugins; working with the underlying structure should be something for aiida-core.

I guess my main question is: are things too complicated for little added real value, real in the sense that it actually changes the user's life/are required for what people want to do?

Some examples of my wonderings (again, I am ignorant here, so all of these probably have very good reasons that I just don't know):

On these last points, maybe it makes more sense if I have a discussion with @sphuber or another core dev to not clutter this issue further.

sphuber commented 3 years ago

Thanks for the additional details and comments @louisponet .

On these last points, maybe it makes more sense if I have a discussion with @sphuber or another core dev to not clutter this issue further.

I'd happily schedule a meeting with you (and others that are interested) to discuss these things. But at the same time having it written down in an issue makes it useful for reference and posterity, so I think it is also fine to keep discussing here.

Before I respond to individual points in more detail, I wanted to give a general impression that I get from your suggestions. I appreciate that for certain things your particular code may be more straightforward to use and AiiDA may seem more cumbersome, but I think that in certain cases this is to be expected and justified, so the direct comparison is not fully fair; we are discussing a false equivalency. To use an analogy (exaggerating the example a bit for clarity): we seem to be discussing the differences in complexity and user friendliness between a bicycle and a space rocket.

Person A: "Why is the space rocket so complex and hard to use? I can use the bicycle just fine to ride to my work without any problems."
Person B: "Well sure, but does the bike allow you to go to space?"
Person A: "Well no, but I don't need to go to space."
Person B: "Maybe the space rocket is not the tool for your use case then."

This feeds nicely into what you seemed to conclude elsewhere:

I'm afraid AiiDA just can't target users that don't just want to run the workchains that are provided, or simple day-to-day non-high-throughput calculations.

I don't necessarily agree with your conclusion, as I think I have shown that running day-to-day non-high-throughput calculations in a script is easy. Even so, if there are users that really do not care about provenance whatsoever, or about any of the other more advanced features that AiiDA provides, and simply want to run DFT jobs as quickly as possible (and I am sure there are such users), they may very well decide that AiiDA is not for them, and that is perfectly fine. There will never be a one-size-fits-all solution for any of this. Again, this is not to say that we should not try to simplify the interface where possible, but in principle a space rocket will never be as easy to ride as a bicycle! In this sense, part of the job is also to make people realize the advantages of AiiDA over other solutions and the importance and usefulness of provenance. Unfortunately, humans are typically bad at investing in the long term and instead go for short-term gain, and provenance unfortunately does not fall into the latter category.

You seem to be dismissing my complaint about the false equivalency by saying that you could easily implement the features of AiiDA that your code currently doesn't support. For example, you claim that provenance, scalability and robustness would be easy to add. I would say this is easier said than done. You might very well launch multiple jobs in parallel with Julia, but what happens if the connection to the remote server fails? What happens when the machine running Julia needs to stop or gets accidentally killed: do you lose all the progress made in your workflows? I would be interested to see how you would easily solve these challenges.

Same thing with respect to provenance. You say you can simply store the script of a job together with its inputs and outputs in the run directory. But that is the easy part of provenance. The goal is not to store individual jobs, but to record the provenance of the data they produce and how they are reused in subsequent jobs. This requires a solution that is a lot more complex than just storing some files in the run directory. Then of course, once that is stored, you need a way to query all that provenance. I'd be keen to hear your thoughts on how you can extend your code to do this in a simple manner.
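
To illustrate what "recording and querying provenance" means beyond saving files, here is a toy sketch. This is emphatically not AiiDA's data model; it is a hypothetical minimal graph where data and calculations become nodes and directed links record what produced or consumed what, so lineage can be traversed later.

```python
class Graph:
    """Toy provenance store: nodes plus directed, labelled links."""

    def __init__(self):
        self.nodes, self.links = {}, []  # links: (source, target, label)

    def add(self, pk, payload):
        self.nodes[pk] = payload

    def link(self, src, dst, label):
        self.links.append((src, dst, label))

    def ancestors(self, pk):
        """All nodes upstream of `pk`, i.e. its full provenance."""
        found = set()
        frontier = [pk]
        while frontier:
            cur = frontier.pop()
            for src, dst, _ in self.links:
                if dst == cur and src not in found:
                    found.add(src)
                    frontier.append(src)
        return found

# A chain resembling the QE example above: CIF -> SCF -> charge density -> bands.
g = Graph()
g.add("cif", "Si.cif"); g.add("scf", "pw.x scf"); g.add("rho", "charge density")
g.add("bands", "pw.x bands"); g.add("bs", "band structure")
g.link("cif", "scf", "input"); g.link("scf", "rho", "create")
g.link("rho", "bands", "input"); g.link("bands", "bs", "create")
print(sorted(g.ancestors("bs")))  # → ['bands', 'cif', 'rho', 'scf']
```

Even this toy shows the point: once outputs of one job feed into another, the lineage is a graph, not a set of directories, and answering "where did this band structure come from" requires traversing explicit links.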

Then to address your concrete questions and suggestions:

Computers and codes: not entirely clear to me why the setup is so extensive and complicated, and moreover: finally, if I made a mistake in my mpirun command for a given computer, I need to manually delete the calcs I tried to run with it, all the codes that I set up with it, and finally the computer node, to then redo all the setup etc. This is insane, why not just provide the computer and exec as a simple input while submitting

The reason for having to predefine Computers, and to a lesser extent Codes, is that AiiDA needs to know how to connect to and interact with them. For example, you need to define which scheduler is running on the Computer so that AiiDA knows how to talk to it and submit jobs. Of course, if your package only supports SLURM (as I think DFControl does) you can hard-code this and life is a lot easier; it is also a lot less flexible. The same goes for connecting to the computer over SSH: AiiDA needs to know how to connect, and it provides a lot of flexibility through its transports. I do agree that the process of having misconfigured a Computer can be troublesome. Again, for reasons of provenance we cannot allow a computer that has been used in calculations to be deleted or modified, because that would invalidate the provenance (if your code doesn't care about provenance, you once again do not have to deal with this complexity, and of course your solution is going to be simpler). As somewhat of a help with this problem, we created verdi computer duplicate, which allows you to easily make a clone and simply correct the wrong property. I am not sure what else we could do to simplify it (within the constraints of keeping provenance).

reentry scan, why does this not run automatically when loading aiida/when the warning is printed that maybe you forgot to run reentry scan

This is to allow third-party packages to easily and dynamically install plugins that extend AiiDA's functionality. Again, this is not a problem for your code, since it does not allow its functionality to be extended dynamically at all. The use of reentry was a necessary evil to keep the command line responsive and make tab-completion usable, and it has the downside you mention, which we have not yet been able to fix. Running reentry scan automatically every time AiiDA is loaded would be an expensive overhead and would exacerbate the very problem reentry was trying to fix. There is an error message that says to run reentry scan, but it can occur in many places and the cause cannot always be attributed to reentry with 100% certainty. The problem is not as easy as you might think. Anyway, a fix is on the way: the Python standard library has improved as of v3.8 in a way that makes reentry obsolete, thereby solving this problem.
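
For context, the standard-library improvement referred to here is importlib.metadata (new in Python 3.8), which can enumerate the entry points of installed packages without a separate cache like reentry. A plugin host could discover its plugins roughly like this; the group name "aiida.calculations" is used only as an illustration.

```python
from importlib.metadata import entry_points

def discover_plugins(group="aiida.calculations"):  # group name is illustrative
    """Return a mapping of entry-point name -> entry point for one group."""
    eps = entry_points()
    # The API shifted across Python versions: 3.10+ uses .select(),
    # while 3.8/3.9 return a plain dict of lists.
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}

# "console_scripts" is a group that exists in virtually every environment.
plugins = discover_plugins("console_scripts")
print(type(plugins))  # → <class 'dict'>
```

Each entry point can then be materialized with ep.load(), which is how a framework can import plugin classes lazily and only when actually needed.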

daemon restarts

This is simply a limitation of Python and most interpreted languages: a running instance cannot "hot-load" new code that has already been imported. If you find a way around this, please let us know. Of course you can always decide not to use the daemon and run everything in a script as in your example and then there is no problem whatsoever.

verdi, command line stuff: while I think it's marginally useful sometimes, it's quite weird that this seems like the main way of interacting with certain parts of aiida

I don't think this is necessarily the main way of interacting with AiiDA. It simply provides a CLI for the API that can be used from a shell or script. Can I ask why you would say that verdi is the main way of interacting with AiiDA? Is that an impression you get from the documentation or tutorials?

Nodes: why can't a Node just have a field of what it contains and the links it has, why does every class need a separate node class? This would allow users to just insert the data with AiiDA converting it to the provenance compliant forms behind the scenes.

Are you asking why there are different Data node types, e.g., Int, SinglefileData and StructureData? If that is the case, then there might be some misunderstanding here that we might need to explain better in the documentation. You are not forced to create new data types at all. If you want, you can simply take the base Data class and store any information you want in the database or in its file repository. The data plugins simply provide additional functionality and flexibility for implementing custom code on top of that. For example, StructureData implements additional useful methods that do not exist on Data. If you want, you can store your CIF file in a plain Data just fine. Your choice.
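
The relationship being described can be sketched with two toy classes. These are not AiiDA's actual classes, just an illustration of the design: a generic Data node stores arbitrary attributes, and a StructureData-like subclass merely layers convenience methods on top of the same storage, so using the base class always remains possible.

```python
class Data:
    """Generic node: stores any serializable attributes, no assumptions."""

    def __init__(self, **attributes):
        self.attributes = attributes

class Structure(Data):
    """Same storage as Data, plus structure-specific helper methods."""

    @property
    def num_sites(self):
        return len(self.attributes["sites"])

    def formula(self):
        # naive formula builder, enough for the sketch
        counts = {}
        for site in self.attributes["sites"]:
            counts[site["kind"]] = counts.get(site["kind"], 0) + 1
        return "".join(f"{k}{v if v > 1 else ''}" for k, v in sorted(counts.items()))

raw = Data(sites=[{"kind": "Si"}, {"kind": "Si"}])  # works, just no helpers
print(raw.attributes["sites"][0]["kind"])           # → Si
struct = Structure(sites=[{"kind": "Si"}, {"kind": "Si"}])
print(struct.formula(), struct.num_sites)           # → Si2 2
```

The subclass buys ergonomics and type-specific validation, not a different storage mechanism.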

random example wondering: why does get_object_list not return me a list of objects but rather just a list of filenames and I need to do get_object with the filename?

I am not sure what methods you are referring to because I don't think they exist. Do you mean Node.list_object_names and Node.list_objects? The former returns a list of filenames and the second a list of file objects. Seems perfectly sensible to me.

I hope that this, if anything at all, clarifies at least a bit why certain things are the way they are. If after these explanations and descriptions of the conditions that we have to work with, you have concrete ideas how despite those restrictions we can still simplify things, that would be great. As a final conclusion and summary:

I guess my main question is: are things too complicated for little added real value, real in the sense that it actually changes the user's life/are required for what people want to do?

For some use cases that may very well be the case, and one should then by all means switch to ASE, DFControl.jl or any other tool that serves that use case more easily.

louisponet commented 3 years ago

Thanks again for your insights, it's very valuable for me to understand things better from a design/conceptual point of view.

To some degree I agree with your space rocket analogy, although I will push back on it partly by saying that what I'm looking for is a spaceship that can fly without Neil Armstrong having to open and close valves and pumps manually and carefully control every component of his rocket. He is still a very, very skilled pilot, but the rocket takes care of 90% of the functionality behind the scenes without him having to spend mental capacity on that. BTW, I'd bring it down to a car vs an airplane, because I hope AiiDA doesn't seek to only attract the 0.0000001% most skilled operators :).

But I think you do hit the nail on the head: it makes sense to clarify what the target audience for AiiDA is. If it is people that want to do high-throughput and that's it, they are probably willing to spend more mental effort mastering AiiDA, since it has clear benefits for them. If the goal is to "bring provenance to the masses", that is a different question, and brings us back to the main point of this discussion.

You seem to be dismissing my complaint about the false equivalency by saying that you could easily implement the features of AiiDA that your code currently doesn't support. For example, you claim that adding provenance, scalability and robustness would be easy to add.

No, that's not what I was trying to say. I was trying to highlight that with very little complexity (i.e. I did it in a night; I'm not claiming that I'm so amazing that I rival AiiDA's capabilities with that little effort), rudimentary versioning, provenance and scalability are implemented, which would already be useful to some people compared to doing it manually after each DFT run.

You might very well launch multiple jobs in parallel with Julia, but what happens if the connection to the remote server fails? What happens when the machine running Julia needs to stop or gets accidentally killed, do you lose all the progress made in your workflows? I would be interested to see how you would easily solve these challenges.

Well, since all inputs and outputs are saved in the directory and the job can be reloaded with DFJob("dir"), there's no real difficulty in storing inside the directory what the last running step was, together with a file that the daemon Julia process loads to check in which directories to find the previously running jobs. Moreover, since the longest jobs will be running under SLURM or another scheduler, those last steps are not even needed that often. As with other points, supporting schedulers other than SLURM only requires implementing the specific interface to them; the rest of the code does not depend on it, as it shouldn't.

My code is not meant to rival AiiDA (I developed it sporadically during my PhD, which didn't even include that much DFT), and I'm not trying to say the AiiDA system is valueless at all. I think the idea of AiiDA is extremely valuable; I just wonder if a subset of that idea cannot be used to capture and help more people (even if it's just to book-keep their millions of random scattered jobs and results, e.g. mine). What I'm trying to understand is whether that is achievable.

I will ask one more critical question, running the risk of losing my credibility: when writing a high-throughput paper with DFT, what more of the provenance do you actually need than the structure, input files, output files (possibly carrying the final data too) and the job flow as specified by a log? Can you point me to actual concrete real-world cases where the full power of AiiDA's provenance was of vital importance? I.e. the idea is amazing and I get that, but does it make sense to develop it when there are no actual real-world use cases to test it with?

Back to the concrete points:

AiiDA needs to know how to connect to and interact with them.

Understandable, but can't that be done at submission time? How many different kinds of transport are actually used in the field? Can't those simply be specified by some version of submit(xyz, computer = Computer("cscs"))? Is it really that limiting in real-world use cases? Storing a Code node with some relatively complicated setup beforehand, versus just checking whether the code node already exists when submitting and adding a link, seems relatively identical from an implementation point of view but is quite different from a user's point of view. This is because at submission time you can actually verify and complain to the user that they are trying to use codes or computers that don't exist (i.e. verify that the paths exist, and little checks like that). In the current way, you submit, wait, spam verdi process list -a in the terminal and hope that it all went through.
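
The proposed interface can be sketched as follows. This is purely hypothetical pseudocode for the suggestion above, not AiiDA's API; the point it illustrates is eager validation, where a misconfigured computer fails loudly at submission time rather than after the fact.

```python
from dataclasses import dataclass

# Hypothetical registry of preconfigured machines (illustrative contents).
KNOWN_COMPUTERS = {"cscs": {"scheduler": "slurm", "transport": "ssh"}}

@dataclass
class Computer:
    label: str

    def __post_init__(self):
        # Validate eagerly: complain the moment the user names the computer.
        if self.label not in KNOWN_COMPUTERS:
            raise ValueError(f"unknown computer '{self.label}'")

def submit(job, computer):
    cfg = KNOWN_COMPUTERS[computer.label]
    return f"submitted {job} to {computer.label} via {cfg['scheduler']}"

print(submit("scf", Computer("cscs")))  # → submitted scf to cscs via slurm
try:
    submit("scf", Computer("daint"))    # typo caught before anything is queued
except ValueError as exc:
    print(exc)                          # → unknown computer 'daint'
```

Whether such submit-time binding is compatible with immutable provenance records is exactly the tension discussed in this thread.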

This is to allow third-party packages to easily and dynamically install plugins that extend AiiDA's functionality.

What exactly do you mean by dynamically? I will still have to restart the daemon and run reentry scan. I don't fully grasp why plugins can't just extend AiiDA through subclassing and submit those classes to AiiDA; I presume this is a gap in my knowledge and I would like to understand better :) (might be something to save for our discussion).

Anyway, a fix is on the way as the Python standard library has improved as of v3.8 that would make reentry obsolete, therewith solving this problem.

That sounds fantastic.

Again this is not a problem for your code since it does not allow to dynamically extend its functionality at all.

There is absolutely no reason why a package couldn't come along, implement a new DFInput type/package/whatever, and submit it using all the scaffolding that is in my code. It is really unclear to me what you mean by this. I think it may make sense to at least play around with it a minimal amount in order to understand what my code can and cannot do.

This is simply a limitation of Python and most interpreted languages

Julia: Revise.jl. A Google search for Python leads me to (may or may not be usable, but I don't see an obvious reason why not): Reloadr and importlib.reload. Observing file changes in the packages known by the daemon, combined with the latter, might do the trick?
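
For reference, importlib.reload does work in the narrow sense, as this self-contained demonstration shows, but it also illustrates the daemon's actual problem: objects created before the reload keep referencing the old code, which is why reloading alone is not a complete hot-reload solution.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # skip .pyc caching so the reload recompiles source

tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
path = os.path.join(tmp, "hotmod.py")

# Write version 1 of a throwaway module and import it.
with open(path, "w") as fh:
    fh.write("VERSION = 1\n")
import hotmod
before = hotmod.VERSION

# Edit the module on disk, then reload: the module object is re-executed.
with open(path, "w") as fh:
    fh.write("VERSION = 2\n")
importlib.reload(hotmod)
print(before, hotmod.VERSION)  # → 1 2
```

Instances of classes defined in the old module body are untouched by the reload, so a long-running daemon would still need to restart or migrate its in-flight processes.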

I don't think this is necessarily the main way of interacting with AiiDA. It simply provides a CLI for the API that can be used from a shell or script. Can I ask why would you say that verdi is the main way of interacting with AiiDA? Is that an impression you get from the documentation or tutorials?

Yea, I mean pretty much everywhere it's mentioned to use verdi; I don't even really know how to get output similar to verdi process list -a inside IPython (except of course using ! and the shell). I'm sure there is a way, but either I missed it again or at least it's not very prominently mentioned. The simple command aiida-pseudo install family -P pseudo.upf path/to/archive/or/directory SSSP/1.1/PBE/efficiency/SOC that you mentioned is pretty much another demonstration of what I mean.

You are not forced to create new data types at all.

I mean that I'm forced to convert my data types to AiiDA Node versions before they are allowed as inputs. If I were allowed to use normal Python data during submission/creation, with AiiDA internally wrapping it in a Node, that would take away another step of complexity.

The data plugins simply provide additional functionality and flexibility to implement custom code on top of that. For example the StructureData implements additional useful methods that do not exist for Data. If you want, you can store your CIF file in a plain Data just fine. Your choice.

Can you give some examples of additional functionality provided by, for example, Dict(), Int(), etc., that would not be present if I did something like Node.data.xyz(), where data would be a dict or int and xyz() whatever method I know those classes have? What does StructureData offer over using methods of Node.data, with data being any ASE structure or something?
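
The Node.data.xyz() idea in this question can be made concrete with a hypothetical sketch (again, not AiiDA's design): one generic node type wraps any Python object and delegates attribute access to it, instead of requiring one node subclass per data type.

```python
class Node:
    """Generic wrapper node: one class for all payload types."""

    def __init__(self, data):
        self.data = data  # plain Python object: dict, int, ASE Atoms, ...

    def __getattr__(self, name):
        # Called only when normal lookup fails: fall through to the payload,
        # so the wrapped object's own methods are usable directly.
        return getattr(self.data, name)

n = Node({"ecutwfc": 20, "conv_thr": 1e-6})
print(sorted(n.keys()))   # delegated dict method → ['conv_thr', 'ecutwfc']
print(n.data["ecutwfc"])  # → 20
```

The trade-off this sketch exposes: delegation gives convenience for free, but a framework loses the typed hook where per-type validation, serialization and querying logic would otherwise live, which is presumably part of the answer to the question above.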

I am not sure what methods you are referring to because I don't think they exist. Do you mean Node.list_object_names and Node.list_objects? The former returns a list of filenames and the second a list of file objects. Seems perfectly sensible to me.

Sorry, I could have been more accurate here. I guess my question is why they are named objects when they mostly seem to return files. If they do not always return files, having a get_files method would be useful to allow users to extract additional info from the stored output files that might not have been parsed by plugins. This seems like a minor, trivial thing, but it is a perfect example of something that is maybe not particularly intuitive from the user's point of view.

I hope that this, if anything at all, clarifies at least a bit why certain things are the way they are.

It is certainly helping me at least a lot.

If after these explanations and descriptions of the conditions that we have to work with, you have concrete ideas how despite those restrictions we can still simplify things, that would be great

I think it's perfectly possible and would be happy to brainstorm about this to possibly come up with some actual points to do. Also, while it may seem like I'm saying my code is the be all end all and think AiiDA should do things that way, this is entirely not true and not the point. It's merely to be used as an example of my thoughts and ways of working, which I feel are not limited to me. I might be wrong here.

sphuber commented 3 years ago

I will ask one more critical question, running the risk of losing my credibility: when writing a high-throughput paper with DFT, what more of the provenance do you actually need than the structure, input files, output files (possibly carrying the final data too) and the job flow as specified by a log? Can you point me to actual concrete real-world cases where the full power of AiiDA's provenance was of vital importance? I.e. the idea is amazing and I get that, but does it make sense to develop it when there are no actual real-world use cases to test it with?

I think we have touched on the crux of the issue here. You are not yet convinced of the usefulness, let alone the necessity, of provenance. If that is indeed your conviction, then I understand why you think AiiDA is overly complex. I personally don't agree, though: I think provenance is very important and above all very useful, because it allows me to query my data.

To give you some concrete examples. The 3DCD database that I (and recently @mbercx ) have been building consists of the optimized structures of tens of thousands of crystal structures. The overall workflow to get to the optimized structure is not even that complex: import the CIF file, clean and parse the CIF to get the structure, run an initial SCF to determine metallic and electronic character, run a vc-relax calculation, and finally a last SCF to get the charge density of the optimized structure. Despite this "simple" workflow, the resulting provenance is already complex, and I dare say it would be impossible to trace without explicit linking. Not to mention the complexity of trying to find data: the possibility of querying your data in an efficient manner is of huge value in AiiDA. Just ask @mbercx, who recently had to find some specific data in this huge database.

But it is not just important and useful for high-throughput. Take this recent paper that we wrote, for example. It shows how, using AiiDA, we compute the equation of state of various crystal structures and the dissociation curve of a diatomic molecule. Still not a hugely complex task, but if you look at two examples of provenance graphs of single workflows (see Fig. 5, showing two examples of a relaxation workflow without any restarts, so these are the simplest form, not the typical ones), you can see how the actual provenance already quickly becomes complex and interlinked. Having used AiiDA, however, I could easily query the database to export all the results and make them available in an archive online. Not only that: the curated data that form the figures are 100% reproducible via a simple script that I included in the archive.

Again, these are simple but real-life examples that, to my mind, demonstrate not only the importance but also the usefulness of storing complete provenance data (with data interlinking).

sphuber commented 2 years ago

Since this discussion, I think there will be an AEP to provide a way to dynamically build workchains. Therefore, I will close this issue for now.