GEMDAT-repos / GEMDAT

Python toolkit for molecular dynamics analysis
https://gemdat.readthedocs.io
Apache License 2.0
21 stars 3 forks source link

accelerate MSD calc. #285

Closed tfamprikis closed 3 months ago

tfamprikis commented 3 months ago

The current (new) MSD calculator (trajectory.mean_squared_displacement() based on FFT) is a bit too slow. Idk if it can be accelerated (?)

Most often we do not need all that resolution, we can simply calculate directly (without FT) enough points (100 or 1000) rather than every single timestep. Also this can be parallelized easily.

for example this is implemented in MDANSE: https://github.com/ISISNeutronMuon/MDANSE/blob/protos/MDANSE/Src/MDANSE/MolecularDynamics/Analysis.py mean_square_displacement()

perhaps trajectory.mean_squared_displacement() can take some optional arguments to control the way of calculation, but it should probably not default to the slow FT method.

ditto for the rotation autocorrelation (/MSAD)

also, should update MSD plot x-axis to plot versus time (ps) instead of timestep # . can use trajectory.time_step*1e12 to get timestep length in picosecond.

SCiarella commented 3 months ago

Thanks @tfamprikis for all the comments that you put in the last batch of issues!

About this one: they are actually using FFT in MDANSE when they do Sab = 2.0 * correlation(coords, axis=0, sumOverAxis=1). Furthermore, they are using the FFT algorithm from Kneller 1995, while we are currently using Kneller 2011 which is the most recent version from the same research group.

For the performance issue, I found that the bottleneck was actually the function _unwrap_pbc which I forgot to vectorize. Now the FFT MSD is ~100 times faster.

I am also implementing a way to select the algorithm for the MSD, as you can see in this notebook. According to my results, the FFT algorithm is significantly faster than the standard approach, so I don't see the reason for not using the FFT. Can you test this and let me know if you agree?

tfamprikis commented 3 months ago

thanks @SCiarella !

I can test eventually, but it will be a second, i'm mega busy atm. the tests you report in your notebook are quite convincing. Maybe you could just add the comparison between n_starts=-1 (FFT) and n_starts=len(trajectory) (multi-start) to show that the resulting MSDs converge despite the huge difference in speed. (currently there is still a curious mismatch between n_starts=-1 and n_starts=2000)

SCiarella commented 3 months ago

No problem! I just wanted to convince you to leave the FFT method as default, now that it has been fixed :smile:

I have updated the notebook including n_starts=len(trajectory)=5000 to show that the MSD converges towards the FFT calculation.

tfamprikis commented 3 months ago

I am convinced... you are a wizard!