MDAnalysis / scipy_proceedings

SciPy conference proceedings: MDAnalysis paper

need benchmarks for topology section #17

Closed. orbeckst closed this issue 8 years ago.

orbeckst commented 8 years ago

In #9 the new topology system was introduced, but we left "BENCHMARK HERE" placeholders in the text.

Please add figures asap.

kain88-de commented 8 years ago

would it be enough to copy the raw numbers and have a table?

orbeckst commented 8 years ago

Probably the best way to represent it. And in any case, it’s a starting point. Anything is better than nothing!

kain88-de commented 8 years ago

@dotsdl how did you measure the RAM consumption in your benchmark (https://gist.github.com/dotsdl/0e0fbd409e3e102d0458)?

richardjgowers commented 8 years ago

Just looking at the amount of RAM the Python process was using after loading a Universe.

richardjgowers commented 8 years ago

I think we also ran a manual garbage collect beforehand, too.
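
A minimal sketch of that kind of measurement (this is not the exact script from the gist; the psutil dependency and the file name are placeholders):

```python
# Minimal sketch of the procedure described above (not the exact script
# from the gist): force a garbage collection, note the resident set size
# of the Python process, load a Universe, and report the difference.
# The psutil dependency and the "system.gro" file name are placeholders.
import gc
import os

import MDAnalysis as mda
import psutil


def rss_mib():
    """Resident set size of the current process in MiB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2


gc.collect()                      # manual garbage collect first
before = rss_mib()
u = mda.Universe("system.gro")    # placeholder topology file
after = rss_mib()
print(f"Universe memory footprint: {after - before:.1f} MiB")
```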

kain88-de commented 8 years ago

Hm, I don't have the topology files, so @dotsdl is currently the only one who can repeat these benchmarks.

richardjgowers commented 8 years ago

I think I've got them somewhere.

kain88-de commented 8 years ago

I've updated the text now with tables. @richardjgowers, you can just update the numbers once you have rerun the benchmark. I'll upload my branch shortly.

dotsdl commented 8 years ago

@richardjgowers, @kain88-de I'm finally finished moving (which is why I've been silent the last few days); I can get these numbers easily. I'll make this my section to focus on.

orbeckst commented 8 years ago

Are the numbers in the tables not correct yet? One table contains the same numbers in every row, and the other one is commented out (see below).

Also, the "1.5M" atom system actually has 1.75 M atoms (see Table 1 in Kenney 2015). Sorry that these files are still not publicly available (Becksteinlab/vesicle_library#1); I could throw them on Dropbox or our own server for the time being.

@dotsdl / @ianmkenney – please let me know by email where the files for the 1.75M, 3.5M and 10M atom test systems are on our lab NFS and I'll put them somewhere.

.. table:: Performance comparison of the new AtomGroup data structures with the old Atom classes. Times are given in seconds; the test systems are vesicles using repeats from the `vesicle library`_ :cite:`Kenney:2015aa`. :label:`tab:performance-accessing-gro`

      +----------+----------+----------+
      | # atoms  | Old impl | New impl |
      +==========+==========+==========+
      | 1.5 M    | 0.018    | 0.0005   |
      +----------+----------+----------+
      | 3.5 M    | 0.018    | 0.0005   |
      +----------+----------+----------+
      | 10  M    | 0.018    | 0.0005   |
      +----------+----------+----------+

..
   .. table:: Performance comparison of loading a topology file with 1.5 to 10 million atoms. Loading times are given in seconds; the test systems are vesicles using repeats from the `vesicle library`_. :label:`tab:performance-loading-gro`

      +----------+----------------+----------+
      | # atoms  | Old impl       | New impl |
      +==========+================+==========+
      | 1.5 M    | 17             | 5        |
      +----------+----------------+----------+
      | 3.5 M    | 35             | 10       |
      +----------+----------------+----------+
      | 10  M    | 105            | 28       |
      +----------+----------------+----------+

dotsdl commented 8 years ago

I'm seeing substantial slowdowns in the issue-363 branch compared to what we used to get for AtomGroup attribute access. Not sure what changed here; perhaps something in the merge with develop (@richardjgowers?). Investigating now.

We went from an 8x speedup across the board (for all system sizes) to a 6x speedup for 1.75M atoms and a 2x speedup for 10M atoms. Something is slowing down attribute access, and it gets worse with system size, approaching the old implementation's performance as the number of atoms increases. Is there a list comprehension hiding somewhere where there wasn't one before?
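
For what it's worth, a rough sketch of how such an attribute-access timing can be taken, assuming a local GRO file from the vesicle library (the file name is a placeholder):

```python
# Rough sketch of an attribute-access timing, not the original benchmark
# script; the GRO file name is a placeholder. In the new topology system,
# AtomGroup attributes such as .names are served from NumPy arrays rather
# than per-Atom Python objects, which is where the speedup comes from.
import timeit

import MDAnalysis as mda

u = mda.Universe("vesicle_10M.gro")     # placeholder file name
ag = u.atoms

# average time to fetch all atom names for the whole AtomGroup
t = timeit.timeit(lambda: ag.names, number=100) / 100
print(f"attribute access: {t * 1e3:.3f} ms per call")
```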

orbeckst commented 8 years ago

Can you get the good performance back if you go back to before the merge?

orbeckst commented 8 years ago

Btw, the GRO files for the 1.75M, 3.5M and 10M particle systems are now on figshare under doi:10.6084/m9.figshare.3406708. The compressed tar file is ~150 MB.
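
With those files available, a minimal sketch of the loading-time measurement behind the draft tables could look like the following (the file name is a placeholder, not the actual name inside the figshare archive):

```python
# Minimal sketch of a topology-loading timing; the file name is a
# placeholder for one of the vesicle GRO files from the figshare archive.
import time

import MDAnalysis as mda

start = time.perf_counter()
u = mda.Universe("vesicle_1_75M.gro")   # placeholder file name
elapsed = time.perf_counter() - start
print(f"loaded {len(u.atoms)} atoms in {elapsed:.1f} s")
```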

dotsdl commented 8 years ago

Nah, it looks fine. It turns out that during my tests I wasn't paying attention to the fact that swap was being consumed almost exclusively. When I run these benchmarks on my laptop, I have to be very careful about what else is running. The results don't look as good as the previous numbers, but they're within the fluctuations of what we saw before, I think. Not worried; I'm submitting a PR with fresh numbers.

orbeckst commented 8 years ago

<breathes sigh of relief/>

orbeckst commented 8 years ago

Maybe run it on one of our bigger workstations? Or rather, if you provide me with the script, I can try it – having initial values + text is already great. You have plenty of other things to do (datreant paper, I'm looking at you ;-) ).