boscoh / pdbremix

analyse PDB files, run molecular-dynamics & analyse trajectories
MIT License

Andrey tokarev ASA optimization #4

Closed Andrey-Tokarev closed 9 years ago

Andrey-Tokarev commented 9 years ago

Hi Bosco,

I finally managed to finish and submit my article :). As I wrote in the comments (http://boscoh.com/protein/calculating-the-solvent-accessible-surface-area-asa.html), in that work I used your code for calculating ASA (which I referenced, along with the Shrake-Rupley paper). In the process I slightly optimized it: the function "calculate_asa_opt" in asa.py is linear in the number of atoms, following your suggestion in the post to "pre-partition the atoms into disjoint cubes to speed up the search for neighboring atoms". Specifically, it runs in O(n * m + n * N * m) time, compared with O(n * n + n * N * m) for the original "calculate_asa", where n is the number of atoms (len(atoms)), m is the average number of neighbors per atom (the mean of len(neighbor_indices)), and N is the number of generated sphere points (n_sphere_point).
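For readers who want to see the cube-partitioning idea concretely, here is a minimal sketch. It is not the code from this pull request: the names `build_cell_grid`, `find_neighbors_grid` and `find_neighbors_brute` are hypothetical, atoms are plain `(x, y, z)` tuples, and a single fixed `cutoff` is assumed for all pairs (an actual ASA calculation would typically derive the cutoff from the atomic radii and the probe radius).

```python
import math
from collections import defaultdict
from itertools import product


def build_cell_grid(points, cell_size):
    """Map integer cell coordinates to the indices of the points in that cell."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        key = (
            math.floor(x / cell_size),
            math.floor(y / cell_size),
            math.floor(z / cell_size),
        )
        grid[key].append(i)
    return grid


def find_neighbors_grid(points, cutoff):
    """For each point, list the indices of other points closer than cutoff.

    Cells have edge length `cutoff`, so any pair closer than `cutoff`
    must sit in the same cell or in one of the 26 adjacent cells.
    """
    grid = build_cell_grid(points, cutoff)
    cutoff_sq = cutoff * cutoff
    offsets = list(product((-1, 0, 1), repeat=3))
    neighbors = [[] for _ in points]
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in offsets:
            # .get() does not create missing keys, so the grid is not
            # modified while we iterate over it.
            others = grid.get((cx + dx, cy + dy, cz + dz))
            if not others:
                continue
            for i in members:
                xi, yi, zi = points[i]
                for j in others:
                    if i == j:
                        continue
                    xj, yj, zj = points[j]
                    d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                    if d2 < cutoff_sq:
                        neighbors[i].append(j)
    return neighbors


def find_neighbors_brute(points, cutoff):
    """Quadratic reference version: compare every pair of points."""
    cutoff_sq = cutoff * cutoff
    result = []
    for i, (xi, yi, zi) in enumerate(points):
        row = []
        for j, (xj, yj, zj) in enumerate(points):
            if i == j:
                continue
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 < cutoff_sq:
                row.append(j)
        result.append(row)
    return result
```

With roughly uniform atom density, each atom is only compared against the atoms in 27 cells, so the neighbor search costs about n * m comparisons rather than n * n.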

Another optimization is to use the squared distance instead of the distance in two places, "calculate_asa_opt" and "find_neighbor_indices_mod", which avoids an expensive square-root operation (this simple change alone saved up to 20% of the time!). For this purpose, a new function "mag2" (the square of the vector magnitude) was added to two files: v3numpy.py and v3array.py.
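The squared-distance trick can be sketched as follows. This is an illustration only: the `mag2` here works on plain 3-tuples and is not the version added to v3numpy.py or v3array.py, and `sub`, `mag` and `is_within` are hypothetical helpers. The comparison is safe because both the distance and the cutoff are non-negative, so squaring both sides preserves the ordering.

```python
import math


def sub(a, b):
    """Component-wise difference of two 3-vectors given as tuples."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def mag2(v):
    """Squared magnitude of a 3-vector; no square root needed."""
    return v[0] * v[0] + v[1] * v[1] + v[2] * v[2]


def mag(v):
    """Magnitude of a 3-vector, shown only for comparison."""
    return math.sqrt(mag2(v))


def is_within(a, b, cutoff):
    """True if points a and b are closer than cutoff (cutoff >= 0).

    Compares squared quantities, so math.sqrt is never called.
    """
    return mag2(sub(a, b)) < cutoff * cutoff
```

In an inner loop that runs once per atom pair and sphere point, the squared cutoff can also be computed once outside the loop.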

I tested "calculate_asa_opt" on the protein structure "1be9.pdb" using all atoms (1045 atoms). Overall speed-up: for N = 960 sphere points it was ~1.8 times faster; for N = 96 sphere points it was ~2.7 times faster. As the complexity formula above shows, the larger n is (and the smaller N is), the greater the speed-up.
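The trend can be made explicit by treating the two complexity expressions given earlier as rough operation counts with equal constant factors. That is an assumption for illustration only, since big-O bounds hide the constants, so this is not expected to reproduce the measured numbers:

```latex
\[
S(n, N, m) \approx \frac{n \cdot n + n \cdot N \cdot m}{n \cdot m + n \cdot N \cdot m}
= \frac{n + N m}{m\,(1 + N)}
\approx 1 + \frac{n}{N m} \quad \text{for } N \gg 1 .
\]
```

Under this model the speed-up grows with n and shrinks as N grows, which matches the direction of the two measurements.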

As you can see, I did not delete or modify any of your code; everything new was added alongside it. So you can either replace your functions with mine or leave it as it is, with two functions: one quadratic and one linear.

Kind regards, Andrey Tokarev

P.S. I am a complete novice to GitHub. This is actually my first action here, so I am sorry for any possible missteps.

boscoh commented 9 years ago

Hi Andrey,

Just tested your changes, and they're fantastic. Example with the structure 1FNF.

I added a flag -o to turn optimization on/off:

> time pypy `which pdbasa` -o 1fnf.pdb 
real    0m40.573s

And without:

> time pypy `which pdbasa` 1fnf.pdb 
real    1m4.901s

A solid reduction of about 37% in run time. Great!

Before I merge the commit, could you harmonize your coding style with mine?

Namely:

Once this is done, I'll merge it and then add the -o flag in pdbasa to use your fantastic optimization.

Great work!

Bosco

Andrey-Tokarev commented 9 years ago

Hi Bosco,

Thanks for the positive reply! I will gladly make the changes you suggested.

Andrey

P.S. Just a note on your test: as you know, the larger the file (the more atoms) you use for the ASA calculation, the greater the reduction. (I come from physics/chemistry, so I have no idea how big proteins can be. If they are limited in length then, of course, the benefit of the optimization is limited too.)

boscoh commented 9 years ago

Looking forward to it. Bosco.

(In reply by email to Andrey-Tokarev's comment of Wed, Feb 18, 2015 at 10:30 PM.)

boscoh commented 9 years ago

Also, an average protein is around 300 amino acids long.

Andrey-Tokarev commented 9 years ago

Bosco, thank you for your answers! I made corrections in the new pull request.

boscoh commented 9 years ago

Andrey, I can't see the new Pull Request yet. Have you put it up yet?

Andrey-Tokarev commented 9 years ago

Sorry about that, I somehow created the pull request against the master branch of my own fork instead of yours. It looks like I have fixed that now.