add details for RDF calculation

orbeckst commented 5 years ago

Add details to Methods

How many water molecules are in the simulation?
Did we use capped_distances?
- If yes, what was the cut-off?
How many bins?

Can we comment on memory requirements?

VOD555 commented 5 years ago

All RDF calculations are based on capped_distances, and we used a cut-off of 5 Å. The number of bins is the default, 75.
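For reference, the binning implied by these settings (75 bins over 0 to 5 Å) can be sketched in plain Python. This is only an illustration of how distances fall into bins, not the pmda implementation; `bin_edges` and `histogram_distances` are hypothetical helpers and the sample distances are made up.

```python
def bin_edges(r_min, r_max, nbins):
    """Return the nbins + 1 equally spaced bin edges between r_min and r_max."""
    width = (r_max - r_min) / nbins
    return [r_min + i * width for i in range(nbins + 1)]


def histogram_distances(distances, r_min=0.0, r_max=5.0, nbins=75):
    """Count distances per bin; distances outside [r_min, r_max) are ignored."""
    width = (r_max - r_min) / nbins
    counts = [0] * nbins
    for d in distances:
        if r_min <= d < r_max:
            # Clamp the index to guard against floating-point round-up at the edge.
            index = min(int((d - r_min) / width), nbins - 1)
            counts[index] += 1
    return counts


edges = bin_edges(0.0, 5.0, 75)
# Made-up pair distances in Å; the last one lies beyond the cut-off and is dropped.
counts = histogram_distances([0.5, 2.75, 2.8, 4.99, 7.2])
print(f"bin width: {edges[1] - edges[0]:.4f} Å")
print(f"pairs counted: {sum(counts)}")
```

With a fixed number of bins, a larger cut-off widens each bin rather than adding bins, so the histogram itself should stay small regardless of the cut-off.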

orbeckst commented 5 years ago

What is the memory consumption for the 5 Å and 15 Å cut-offs?

Use memory_profiler; see e.g. https://github.com/MDAnalysis/mdanalysis/wiki/memory-profiling

Run for a few frames and just single-threaded. Something like

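import MDAnalysis as mda
import pmda.rdf

@profile
def run_rdf(top, traj, selection):
    u = mda.Universe(top, traj)
    # InterRDF takes two AtomGroups; here both come from the same selection
    g1 = u.select_atoms(selection)
    g2 = u.select_atoms(selection)
    rdf = pmda.rdf.InterRDF(g1, g2).run(n_jobs=1)
    return rdf

if __name__ == "__main__":
    # set top, traj, and selection for your system before running
    rdf = run_rdf(top, traj, selection)
    print(rdf)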

Run with

python -m memory_profiler bench.py

and look at output on stdout.
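As a complement to the line-by-line report, the peak resident memory of the whole process can be read with the standard library. Line-by-line increments show memory after each line has finished, so allocations that are made and freed inside a single call may not show up there. The following is a minimal sketch using `resource` (Unix only); `peak_rss_mib` is a hypothetical helper, and the list allocation is just a stand-in for the real RDF run.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / 1024**2
    return peak / 1024


before = peak_rss_mib()
data = [0.0] * 5_000_000  # stand-in workload; replace with the call to be measured
after = peak_rss_mib()
print(f"peak RSS before: {before:.1f} MiB, after: {after:.1f} MiB")
```

Because the value is a high-water mark, it never decreases during the run, so it is best read once at the end of a script that does only the calculation of interest.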

orbeckst commented 5 years ago

@VOD555 please add the information to the issue. I can integrate it into the paper draft.

VOD555 commented 5 years ago

Some data for the RDF calculation. When using step=50, the RDF calculation with a 15 Å cut-off completes. However, when using step=1, it fails with the error message Segmentation fault (core dumped). I'm not sure why this happens.

Line #    Mem usage    Increment   Line Contents
================================================
     4  101.410 MiB  101.410 MiB   @profile
     5                             def run_rdf(top, traj):
     6  167.445 MiB   66.035 MiB       u = mda.Universe(top, traj)
     7  167.445 MiB    0.000 MiB       g1 = u.select_atoms('name OH2')
     8  167.445 MiB    0.000 MiB       g2 = u.select_atoms('name OH2')
     9  196.348 MiB   28.902 MiB       rdf = pmda.rdf.InterRDF(g1, g2, range=(0,5) ).run(n_jobs=1, n_blocks=1)
    10  196.348 MiB    0.000 MiB       return rdf

Line #    Mem usage    Increment   Line Contents
================================================
     4  101.840 MiB  101.840 MiB   @profile
     5                             def run_rdf(top, traj):
     6  167.691 MiB   65.852 MiB       u = mda.Universe(top, traj)
     7  167.691 MiB    0.000 MiB       g1 = u.select_atoms('name OH2')
     8  167.691 MiB    0.000 MiB       g2 = u.select_atoms('name OH2')
     9  228.027 MiB   60.336 MiB       rdf = pmda.rdf.InterRDF(g1, g2, range=(0,15)).run(n_jobs=1, n_blocks=1, step=100)
    10  228.027 MiB    0.000 MiB       return rdf

Line #    Mem usage    Increment   Line Contents
================================================
     4  101.906 MiB  101.906 MiB   @profile
     5                             def run_rdf(top, traj):
     6  167.902 MiB   65.996 MiB       u = mda.Universe(top, traj)
     7  167.902 MiB    0.000 MiB       g1 = u.select_atoms('name OH2')
     8  167.902 MiB    0.000 MiB       g2 = u.select_atoms('name OH2')
     9  227.875 MiB   59.973 MiB       rdf = pmda.rdf.InterRDF(g1, g2, range=(0,15)).run(n_jobs=1, n_blocks=1, step=50)
    10  227.875 MiB    0.000 MiB       return rdf

orbeckst commented 5 years ago

I am not really sure what causes the seg fault. I also don't quite understand why the 15 Å calculations do not fit into the memory of one node: 24 x 230 MB = 5520 MB = ~5.5 GB, and each node has 128 GB of RAM. We don't have much space in the draft and I am not sure what to write about the RAM consumption, so I'll just list the basics. However, we should look into the memory issues.
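The back-of-the-envelope budget can be written out as a short script. It uses only the numbers quoted in this thread (about 230 MB per process, 24 processes per node, 128 GB of RAM per node) and treats 1 GB as 1000 MB, as in the estimate; the variable names are illustrative.

```python
MEM_PER_PROCESS_MB = 230   # approximate usage per process from the profiles
PROCESSES_PER_NODE = 24    # one process per core on a node
NODE_RAM_GB = 128          # RAM available on each node

total_mb = MEM_PER_PROCESS_MB * PROCESSES_PER_NODE
total_gb = total_mb / 1000
fraction = total_gb / NODE_RAM_GB

print(f"total: {total_mb} MB = {total_gb:.2f} GB")
print(f"fraction of node RAM: {fraction:.1%}")
```

On these numbers the steady-state footprint is a small fraction of the node, which suggests that any memory problem comes from something the line-by-line profile does not capture.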

orbeckst commented 5 years ago

>>> import MDAnalysis as mda
>>> u = mda.Universe("YiiP_system.pdb")
>>> u.select_atoms("name OH2")
<AtomGroup with 24239 atoms>
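With the atom count known, the number of pairs that a capped distance search returns per frame can be roughly estimated. The sketch below assumes the oxygens sit at bulk water density (about 0.0334 per Å^3) and that each pair costs 24 bytes (two 8-byte indices plus one 8-byte distance). Both values are assumptions, not measurements from this system, and `estimate_pairs` is a hypothetical helper.

```python
import math

N_ATOMS = 24239        # from the selection above
DENSITY = 0.0334       # assumption: bulk water number density, per Å^3
BYTES_PER_PAIR = 24    # assumption: two 8-byte indices + one 8-byte distance


def estimate_pairs(cutoff, n_atoms=N_ATOMS, density=DENSITY):
    """Estimate the number of ordered pairs within `cutoff` (in Å)."""
    # Average number of neighbours inside a sphere of radius `cutoff`.
    neighbours = density * 4.0 / 3.0 * math.pi * cutoff**3
    return n_atoms * neighbours


for cutoff in (5.0, 15.0):
    pairs = estimate_pairs(cutoff)
    mib = pairs * BYTES_PER_PAIR / 1024**2
    print(f"{cutoff:4.0f} Å: ~{pairs:.2e} pairs, ~{mib:.0f} MiB per frame")

ratio = estimate_pairs(15.0) / estimate_pairs(5.0)
print(f"pair count ratio 15 Å / 5 Å: {ratio:.0f}")
```

Whatever the exact density, the pair count scales with the cube of the cut-off, so going from 5 Å to 15 Å multiplies the transient pair arrays by 27.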

Becksteinlab / scipy_proceedings

add details for RDF calculation #30