ChiCheng45 commented 4 months ago

Description of work Updated the pair distribution function job so that it creates a supercell for large r values. The previous version of the pair distribution function used PBC however it only added the closest distance for a given pair of atoms. I've left the default of this setting to false since it can make the calculation expensive for large values of r. With this update, you now have the option to let MDANSE generate a supercell so that it adds the distances between a given pair of atoms including its periodic images out to the cutoff radius.

Here are some comparisons. I used a trajectory that has a cubic unit cell of ~1.8 nm. The sharp peaks are due to the atoms and the periodic images of itself. In the example below we have a peak at ~1.8 and ~sqrt(2) * 1.8.

PDF Without supercell

With supercell

RDF Without supercell

With supercell

TCF Without supercell

With supercell

Fixes Fixes #431

To test You will need to rebuild MDANSE when testing since some cython code was updated.

Try running the pair distribution function job with and without the supercell setting. Both should finish successfully without issues. Without supercell, results should be the same as the previous version.

With supercell, results should be the same except for longer ranges. Check that the correct long-range behaviors are seen e.g. PDF goes to 1 at long distances.

NOTE: with very large r values (r > 3) it will take quite some time since there will be a lot of distance calculations.

gonzalezma commented 3 months ago

I do not think that this is a very good idea. The real information is contained inside the simulation box, and creating a supercell to be able to calculate the pdf beyond R/2 is going to give a nicer figure (except for the peaks of the atom images), but no more information and even bad ideas if any user tries to use that range. Normally the user should know that if he has a cubic box of size 2 nm, he should use a maximum R=1 nm when calculating the pdf. So in case the drop at larger R bothers you, I would propose as an alternative solution to fill the Rmax box with the value of the radius of the largest sphere that can be accommodated inside the simulation box and show a warning if one tries to use a larger value.

MBartkowiakSTFC commented 3 months ago

It looks like we are going back to the ongoing discussion of where we draw the line between our responsibility and the user's responsibility. So far the philosophy of MDANSE has been that we do not tell the users how to run their simulations, but instead we just tell them what physical observables their trajectory corresponds to, leaving to them the responsibility for running the right simulation for the kind of information they want to get.

From my point of view, having the possibility of calculating the curve that actually matches the theoretical prediction is a good way of increasing the user's confidence in the validity of the results. And as for the fact that it does not contain any new information, you are right. But at the same time, the result calculated using the supercell is a representation of what the trajectory actually contains if periodic boundary conditions are used in the simulation.

I agree with you that the user should be aware of the limitations of the simulation technique they have chosen, and should realise that some information can only be obtained by increasing the size of the simulation box. But since, ultimately, they need to know what they are doing, I don't see a reason not to allow them to calculate the complete PDF curve of their trajectory, even if only the initial part is useful.

Overall, it seems that this could make an interesting case for a tutorial! Analyse several periodic simulations with different box sizes and see how the PDF changes.

gonzalezma commented 3 months ago

I am not sure that I understand you. As I see it, beyond L/2 you will be showing artificial correlations due to the PBC. What's the point in calculating something that can increase considerably the computation time and be misleading? How is this going to increase the user confidence in the validity of the results? I may see a potential interest when simulating a crystal and trying to calculate S(Q) showing narrow Bragg peaks, but for disordered systems I think that it is a little bit dangerous. So if you go ahead with this, I think that it would be very important to add a clear warning about the potential issues associated with using the supercell option.

MBartkowiakSTFC commented 3 months ago

OK, let us do more testing and try different use cases. We will get back to this discussion once we have checked if there are scenarios where this option would be useful. Unless it turns out that there are some cases for which there is an obvious benefit of having this option enabled, we can follow your advice and remove it.

ChiCheng45 commented 3 months ago

I had a think and I think @gonzalezma is correct. With the supercell, the results appear correct beyond half the box length but there will be artifacts in this region even if you ignore the peak. In the original method, the results beyond half the box length are clearly nonsensical.

@MBartkowiakSTFC let's just close this. We could update MDANSE to restrict the maximum length of this job. I think it will probably need to be something like half the shortest cell length and other restrictions when using NPT etc.

MBartkowiakSTFC commented 3 months ago

@ChiCheng45 OK, go ahead. @gonzalezma Thank you for your input on this pull request.

ChiCheng45 commented 3 months ago

462 Created to set the maximum r value.

ISISNeutronMuon / MDANSE

Pair distribution function with supercell setting #453

462 Created to set the maximum r value.