Modification of structure to work for Lammps

JPchico commented 2 years ago

Lammps seems to require that the structure be written in a specific way as to function. Right now this is taken care in the code by the generate_lammps_structure function. Which means that the structure that is actually passed to the run in not exactly the same than the one in the input, it is transformed.

While the transformation is such that the structure is the same by symmetry this can be confusing to the user as the input and output structure would look quite different.

The question is then, is this transformation always needed? Should one transform the output structure so that it matches what one expected from the input (by using the inverse transformation)?

This is tricky since this happens at the Calculation level and hence we cannot call a calcfunction to keep track of what is happening to the structure itself.

JPchico commented 1 year ago

This can be solved by crating a calcfunction that is called at the workflow level, The callcfunction will be available for the users that want to use it to keep the provenance at the calculation level, but it is called still in the calculation but as a normal function, thus avoiding possible "non-lammps" friendly structures from being run. If they are already transformed nothing should happen.

JPchico commented 1 year ago

This is being handled in the workchain, where the generation of the structure is a calcfunction. The only thing this calcfunction does is to generate the LAMPPS compatible structure format.

Of this way the provenance on how the structure is modified is kept.

The modification is run once more internally as a function (as it is a part of the calculation), however, this should do nothing, as the transformation matrix in this case should be the identity matrix.

JPchico commented 1 year ago

I was looking at this again, and I have added the calcfunction as it it discussed here. However, I do wonder if it wise to have it always on, i.e. at the calculation level.

We are going to be adding a workchain that is what the users are meant to use anyhow, and in there adding the calcfunction makes sense since we can keep track of the provenance and so on.

However, if someone decides to use the low-level calculation maybe it is fair to assume that they should keep track on whether or not one is running with a LAMMPS friendly structure.

What do you think @chrisjsewell , @sphuber? Any strong opinions on the matter?

sphuber commented 1 year ago

If I understand correctly, the generate_lammps_structure is merely writing the representation of the structure as defined by the StructureData input, into a different representation, one that LAMMPS understands. Is it actually functionally changing the structure, or is it merely a different representation? If it is the latter, I think we don't need a calcfunction at all, and having the CalcJob plugin do this internally is perfectly line. That is exactly what the calcjob plugin is supposed to be doing: taking input represented by Data nodes and writing it in a representation that the target code understands.

While the transformation is such that the structure is the same by symmetry this can be confusing to the user as the input and output structure would look quite different.

Is there always an output structure? Even for calculations where the structure is not modified, such as a single point calculation of total energy? In that case, I guess it would be a bit weird to see a different output structure compared to the input; but I would say that an output structure is not even needed in this case. In any other calculations that are meant to change the structure, such as in optimizations or dynamics, it makes perfect sense to me that the output structure will be different.

JPchico commented 1 year ago

As far as I understand it should just be a different representation, that is correct. In that case the structure is not changing itself, it is just being represented in a different way.

Is there always an output structure? Even for calculations where the structure is not modified, such as a single point calculation of total energy? In that case, I guess it would be a bit weird to see a different output structure compared to the input; but I would say that an output structure is not even needed in this case. In any other calculations that are meant to change the structure, such as in optimizations or dynamics, it makes perfect sense to me that the output structure will be different.

I think that by default we now have always an output structure, I had not thought about the single point energy case, that is a good point.

I agree with your point that the structure before and after the calculation should be different (unless nothing happened of course), my concern was mostly about the user wondering about why suddenly the cell could be in a different representation. But perhaps it is just simpler to instead say this in the documentation, as the structure is not really different, just written in a different way.

sphuber commented 1 year ago

But perhaps it is just simpler to instead say this in the documentation, as the structure is not really different, just written in a different way.

I think this is fair. It prevents additional complexity and always having a bunch of extra nodes in the provenance graph that are perhaps "just" bloat.

JPchico commented 1 year ago

I agree, so unless there are any objections I will just add a section in the documentation about this and leave as it is (I made some small changes to not rely in the ase structure if not just use the base node instead)

JPchico commented 1 year ago

I noticed that I already had included a note about this in the documentation.

LAMMPS requires the simulation cell to be in the format of a lower triangular matrix (right-handed basis). Therefore the cell and positions may require rotation and inversion. This is automatically done to every structure at the calculation level, so it might be that the cell that is provided is modified so that it follows this convention.

So unless there are more reasons to be explicit with this, I think that I'll push the PR with the small changes that I made so that one does not need to depend on the ASE abstraction of the structure and then we can consider this closed. Any objections @sphuber and @chrisjsewell ?

JPchico commented 1 year ago

Something that I noticed in the structure of the code when checking this is that the common folder is actually a set of parser utilities (mostly related to the input generation). I think that for clarity I'll move these functions to the parser folder, as that is where they are used and I think that is where they should be.

JPchico commented 1 year ago

Closed via #70

aiidaplugins / aiida-lammps

Modification of structure to work for Lammps #41