GoogleCodeExporter closed this issue 9 years ago.
Now that's a cool ticket!
Were you thinking of following that particular implementation (i.e. C-based,
exploiting the hardware details)? Anything I could write in Python using the
multiprocessing module would probably give a boost, but it would be nowhere near
the low level described in the paper.
(I have to confess I follow the decomposition procedure only up to the stage of
the minor diagonal ;))
Original comment by jan...@gmail.com
on 9 Oct 2011 at 11:11
I haven't really thought very hard about the implementation, apart from the fact
that it should be doable and that there is already optimised code out there that
does something similar. If one can take the code by Wirawan et al., customize it,
and add it in, that might be sufficient.
> And besides: is the main difference between distance_array() and
> self_distance_array() in the shape of the returned array?

self_distance_array() only computes and stores N*(N-1)/2 distances instead of
N*M. See
http://mdanalysis.googlecode.com/svn/trunk/doc/html/documentation_pages/analysis/distances.html#MDAnalysis.analysis.distances.self_distance_array
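In NumPy terms the difference can be sketched like this (illustrative stand-ins, not the MDAnalysis implementations):

```python
import numpy as np

def distance_array(a, b):
    # full N x M matrix of all pairwise distances between a and b
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff * diff).sum(axis=-1))

def self_distance_array(a):
    # only the N*(N-1)/2 unique pair distances, as a flat array
    i, j = np.triu_indices(len(a), k=1)
    return np.sqrt(((a[i] - a[j]) ** 2).sum(axis=-1))

a = np.random.rand(5, 3)
b = np.random.rand(4, 3)
print(distance_array(a, b).shape)    # (5, 4)
print(self_distance_array(a).shape)  # (10,) == 5*4/2
```

The flat self-distance array stores the strict upper triangle of the square matrix, which is why it is roughly half the memory for the self-comparison case.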
Original comment by orbeckst
on 10 Oct 2011 at 2:12
Original comment by orbeckst
on 15 Nov 2011 at 2:13
Does that mean the ticket is up for grabs?
Original comment by jan...@gmail.com
on 15 Nov 2011 at 2:19
Yes, I realised that the system automatically assigns tickets when a developer
files one, and that is not always an accurate reflection of who is actually
taking care of things :-). If you came up with something faster for
distance_array() then that would be much appreciated.
Original comment by orbeckst
on 15 Nov 2011 at 5:36
What sizes of coordinate arrays are we aiming for, in most cases?
Original comment by jan...@gmail.com
on 16 Nov 2011 at 2:35
I've coded up a possible solution, but for systems between 1e2 and 1e4
coordinates it is definitely slower than what we have now (serial C). This
could be either because the multiprocessing is slow or because I've messed up
(guess which one is more likely).
My approach to parallelization is as simple as it gets. For two coordinate
arrays
coordA (100,3)
coordB (100,3)
I take the number of cores available and slice coordA into pieces, such that
coordA gets divided almost equally between the cores; I took care to share the
memory occupied by the slices, not copy it. Then a scipy.weave (C-like,
precompiled) method is called to calculate the distances between a slice of
coordA and all of coordB.
The code and an example performance check are attached.
Maybe dividing the calculation in some other way (e.g. equal-size squares) could
help?
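The slicing scheme described above can be sketched in plain NumPy (serial here for clarity; `np.array_split` along axis 0 returns views, so the slices share memory with coordA rather than copying it):

```python
import numpy as np

def chunked_distance_array(coordA, coordB, n_chunks=4):
    # divide coordA into near-equal chunks, one per core; each chunk is a
    # view into coordA (shared memory, no copy)
    parts = []
    for a in np.array_split(coordA, n_chunks):
        # all-pairs distances between this chunk of coordA and all of coordB;
        # in the parallel version each chunk would go to its own worker
        diff = a[:, None, :] - coordB[None, :, :]
        parts.append(np.sqrt((diff * diff).sum(axis=-1)))
    # stack the partial results back into the full N x M distance matrix
    return np.vstack(parts)
```

Replacing the loop body with a worker pool reproduces the parallel layout; the row-wise split means no two workers ever write to the same output rows.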
Original comment by jan...@gmail.com
on 16 Nov 2011 at 3:18
Attachments:
Maybe you can profile the code (import cProfile ...) and see where it spends its
time. Perhaps there is substantial setup/communication overhead involved with
multiprocessing and that kills it. Then I'd look at writing it in C using
threads.
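A minimal profiling harness along those lines (the workload function here is a stand-in; you would profile the distance_array call instead):

```python
import cProfile
import io
import pstats

def hot_loop(n=100000):
    # stand-in workload; replace with the distance_array call under test
    return sum(i * i for i in range(n))

pr = cProfile.Profile()
pr.enable()
hot_loop()
pr.disable()

# sort by cumulative time to see where the wall clock actually goes,
# e.g. whether multiprocessing setup dominates the actual computation
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(3)
report = buf.getvalue()
print(report)
```

If most of the cumulative time lands in pool setup or pickling rather than the distance kernel, that points at multiprocessing overhead rather than the algorithm.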
Original comment by orbeckst
on 16 Nov 2011 at 2:00
(Jan, I set the status to 'Started' and made Jan the issue owner; feel free to
change it back to None if you get bored of it.)
Original comment by orbeckst
on 16 Nov 2011 at 2:02
I'm afraid I won't have time to look into this any more, sorry.
Original comment by jan...@gmail.com
on 17 Dec 2011 at 2:10
(up for grabs, glory awaits)
Original comment by orbeckst
on 6 Feb 2012 at 5:09
If anybody is still interested or looking for a starting point, the
cython.parallel module looks pretty simple and easy to use:
http://docs.cython.org/src/userguide/parallelism.html
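For reference, a minimal sketch in the spirit of that guide (not the code that ended up in the repository; it is a .pyx file that must be compiled with OpenMP flags, e.g. `-fopenmp`, so it is shown here as an untested illustration):

```cython
# distance_sketch.pyx -- parallel all-pairs distances via cython.parallel
import numpy as np
cimport numpy as np
from cython.parallel import prange
from libc.math cimport sqrt

def distance_array(float[:, ::1] a, float[:, ::1] b):
    cdef Py_ssize_t i, j
    cdef Py_ssize_t n = a.shape[0]
    cdef Py_ssize_t m = b.shape[0]
    cdef float dx, dy, dz
    out = np.empty((n, m), dtype=np.float32)
    cdef float[:, ::1] d = out
    # prange releases the GIL and splits the outer loop across threads;
    # dx/dy/dz are assigned inside the loop, so they are thread-private
    for i in prange(n, nogil=True):
        for j in range(m):
            dx = a[i, 0] - b[j, 0]
            dy = a[i, 1] - b[j, 1]
            dz = a[i, 2] - b[j, 2]
            d[i, j] = sqrt(dx * dx + dy * dy + dz * dz)
    return out
```

Parallelizing the outer loop keeps each thread writing to its own rows of the output, so no locking is needed.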
Original comment by jan...@gmail.com
on 13 Jul 2012 at 3:46
Putatively resolved in
http://code.google.com/p/mdanalysis/source/detail?r=390b6e838ebf87967bc8f2a78574e73c39c24e0f&name=develop
Benchmarking timeit script attached. The parallel Cython implementation is the
fastest and the serial Cython implementation is the slowest; the parallel
implementation gets faster relative to the others as the matrix dimensions
increase (explained below).
A. For coord((1000,3), float32), ~2x improvement:
   serial pure C distance_array:      1.55131959915
   Cython distance_array (serial):    2.40466713905
   Cython distance_array (parallel):  0.63991060257
B. For coord((10000,3), float32), ~4x improvement:
   serial pure C distance_array:      11.014444828
   Cython distance_array (serial):    13.3849320412
   Cython distance_array (parallel):  2.80142903328
The performance is improved as the matrix dimensions increase.
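A timeit harness in this spirit can be sketched as follows (a pure NumPy stand-in for the implementations being compared; this is not the attached script):

```python
import timeit
import numpy as np

def numpy_distance_array(a, b):
    # NumPy stand-in for the C/Cython distance_array implementations
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff * diff).sum(axis=-1))

# same setup as case A above: 1000 float32 coordinates
coord = np.random.rand(1000, 3).astype(np.float32)

elapsed = timeit.timeit(lambda: numpy_distance_array(coord, coord), number=3)
print("numpy distance_array, 3 runs: %.3f s" % elapsed)
```

Each candidate implementation would be timed with the same `number=` and the same input arrays so the wall-clock figures are directly comparable.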
Original comment by jan...@gmail.com
on 15 Jul 2012 at 3:30
Attachments:
Hi Jan – looking good.
I moved it elsewhere (MDAnalysis.core.parallel.distances) instead of
MDAnalysis.analysis.cython.distances. Other accelerated/parallel algorithms
could be put under parallel as well. In the future we could use core.flags to
switch between serial and parallel versions. I also made a separate UnitTest
file test_parallel.py (which passes – I really appreciated your nicely
written test!)
We don't need a separate "cython" directory (after all, that's just one way of
accelerating code). I also moved the pxd file to src/numtools (all non-py code
lives under src/) and integrated it fully with the cython logic in setup.py (so
that the C-code is used when cython is not available).
Your parallel.distances.distance_array() does not yet support the same
arguments as the serial version (box for PBC and a result array to be filled) –
if you add these then we could offer the parallel version as a drop-in
replacement for the serial core.distances.distance_array().
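The semantics of those two extra arguments can be sketched in pure NumPy (an illustration only, assuming an orthorhombic box with minimum-image wrapping; this is not the actual C/Cython implementation):

```python
import numpy as np

def distance_array(ref, conf, box=None, result=None):
    # full matrix of pairwise separation vectors
    diff = ref[:, None, :] - conf[None, :, :]
    if box is not None:
        # minimum-image convention for an orthorhombic box: wrap each
        # component back into [-L/2, L/2]
        box = np.asarray(box, dtype=diff.dtype)[:3]
        diff -= box * np.round(diff / box)
    d = np.sqrt((diff * diff).sum(axis=-1))
    if result is not None:
        # fill the caller-provided array in place and return it,
        # avoiding a fresh allocation on every call
        result[:] = d
        return result
    return d
```

Matching this signature is what makes the parallel version a drop-in replacement: callers that pass `box` or reuse a preallocated `result` array keep working unchanged.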
Good stuff!
Thanks,
Oliver
Original comment by orbeckst
on 17 Jul 2012 at 11:11
Hi Oli,
Actually I started with the package structure you proposed, but then I
thought it might make things look too committed ;p
Thanks for all the kind words - it's a pleasure. Not often does one find a
group putting emphasis on testing.
The flag idea sounds great; additionally, I'll conform to the serial
distance_array() 'signature'.
Thanks!
Jan
Original comment by jan...@gmail.com
on 3 Aug 2012 at 3:17
Jan,
once your code supports the same call signature as the serial code and passes
the tests, please close this ticket. Open another one for implementing a flag
that switches between the serial and parallel code. This sounds like an
interesting direction for MDAnalysis 0.8.
Thanks,
Oli
Original comment by orbeckst
on 3 Aug 2012 at 6:46
Oli, quick question: calc_distances uses doubles for positions and floats
for box/half_box. Is this a convention I should adhere to, or not?
Original comment by jan...@gmail.com
on 4 Aug 2012 at 4:09
> calc_distances is using doubles for positions and floats for box/half_box.
> Is this a convention I should adhere to, or not?

I can't remember if there was a good reason. Keep it the same for the time
being; at least then we know that the interface is exactly the same. If we
decide to change it, then we change it everywhere at once.
Original comment by orbeckst
on 4 Aug 2012 at 5:21
Cancel that: core.distances.distance_array() takes float32 coords and box -
otherwise it throws an error. I must have looked at an unused/wrong bit of code.
Anyhow, my code now also expects float32 coordinates and box.
Added the tests and corrected the signatures; this can be closed now.
Original comment by jan...@gmail.com
on 7 Aug 2012 at 1:43
Original comment by orbeckst
on 8 Aug 2012 at 5:11
Original issue reported on code.google.com by
orbeckst
on 1 Oct 2011 at 9:15