Add possibility to run any LAMMPS script

sphuber commented 1 year ago

The current CalcJob plugins are quite opinionated in that they all require the structure, potentials and input parameters to be defined through a StructureData, EmpiricalPotential and Dict node. Although this makes sense when comparing to other plugins in the materials science domain, it is kind of restricting. For example, if I wanted to run this benchmark script (which comes from the official LAMMPS benchmark suite), I am not sure that can be done. The "structure" is a 32'000 atom LJ-fluid. In this case, it doesn't really make sense to force a user to convert that to a StructureData since the script instructs LAMMPS to generate the fluid on the fly in the actual calculation.

I wonder if it would make sense to have a really "base" calculation plugin that simply takes a SinglefileData that contains the script to run. I see two options:

Create a new dedicated CalcJob implementation
Update the BaseLammpsCalculation to add the optional input_file input. When specified, the structure, potential and parameters inputs are ignored and no longer required.

I'd be fine with either option, but I think the first might be clearest for the user. I would propose simply naming it LammpsCalculation and giving it the entry point lammps. The other plugins would then be more specific as the already are currently.

One advantage of reusing BaseLammpsCalculation is that you could write a single BaseWorkChain for it, instead of having to write two. But then again, there are already a bunch of plugins for particular tasks, so maybe that ship has sailed anyway.

JPchico commented 1 year ago

Hi @sphuber

I think that the best would be to go for option 2, as the other CalcJobs are quite restrictive. We wanted to have the possibility of running any single LAMMPS stage, i.e. anything that has to do with loops and so on would have to be dealt with at the workflow level.

The option of setting the SinglefileData as an input was something that we discussed. And it could be added, the only thing I wonder is about querying and finding information, since if the only input is the input_file as you suggest, I wonder how one is able to find information by querying.

I think the plan is to replace the force, md and relax CalcJobs with WorkChains based on a BaseWorkChain as you point out (this is actually being written).

Any thoughts @chrisjsewell ?

sphuber commented 1 year ago

The option of setting the SinglefileData as an input was something that we discussed. And it could be added, the only thing I wonder is about querying and finding information, since if the only input is the input_file as you suggest, I wonder how one is able to find information by querying.

Indeed, this is the tradeoff, you can submit any LAMMPS script you want, but you lose the queryability provided by the more explicit and separate input types. I think there are certain use cases where this makes sense though and one would happily take that trade. The example of wanting to run the benchmark scripts is a good one I think. I already have this implementation lying around, since I used it for an internal demo, so would be happy to open a PR for it.

I think the plan is to replace the force, md and relax CalcJobs with WorkChains based on a BaseWorkChain as you point out (this is actually being written).

Any thoughts @chrisjsewell ?

I am not too familiar with LAMMPS yet, but this sounds like it could make a lot of sense.

JPchico commented 1 year ago

Feel free to open the PR for the change, at least I think it makes sense.

I'm also not super familiar with LAMMPS, but my colleagues that use it seem to think that this is a good idea :)

One thing about the input_file, is that in theory this would only work with potentials which are written into the file itself right? Unless one adds the potential in the inputs also, and make sure that the input_file has the correct file name for the potential.

Maybe this is something that one could check somehow?

JPchico commented 1 year ago

Now that I think a bit more about this, I have one question, and it is about the parsing. I think that the current parser is clever enough to read the logfile and the trajectory file without knowing what was asked for in the input. But we need to check that this will work also if one pass just any file in the input_file.

Also I made an issue on the restart file here #58 which might be of importance for this issue also.

sphuber commented 1 year ago

Now that I think a bit more about this, I have one question, and it is about the parsing. I think that the current parser is clever enough to read the logfile and the trajectory file without knowing what was asked for in the input. But we need to check that this will work also if one pass just any file in the input_file.

Indeed, we will have to make sure the parser doesn't assume that the structure, potentials and parameters input are always going to be there.

JPchico commented 1 year ago

Closed via #59

aiidaplugins / aiida-lammps

Add possibility to run any LAMMPS script #55