Closed by Andrey-Tokarev 9 years ago
Hi Andrey,
Just tested your changes, and they're fantastic. Example with the structure 1FNF.
I added a flag -o to turn optimization on/off:
> time pypy `which pdbasa` -o 1fnf.pdb
real 0m40.573s
And without:
> time pypy `which pdbasa` 1fnf.pdb
real 1m4.901s
A solid reduction of roughly 37% in run time. Great!
Before I merge the commit, could you harmonize your coding style with mine?
Namely:
Once this is done, I'll merge it and then add the -o flag in pdbasa to use your fantastic optimization.
Great work!
Bosco
Hi Bosco,
Thanks for the positive reply! I will gladly make the changes you suggested.
Andrey
P.S. Just a note on your test: as you know, the larger the file (the more atoms) you use for the ASA calculation, the greater the reduction. (My background is in physics and chemistry, so I have no idea how big proteins can be. If their length is limited, then of course the benefit of the optimization is limited too.)
Looking forward to it. Bosco.
Also, an average protein is around 300 amino acids.
Bosco, thank you for your answers! I made corrections in the new pull request.
Andrey, I can't see the new Pull Request yet. Have you put it up yet?
Sorry about that. I somehow created the pull request against the master branch of my own fork instead of yours. It looks like I have fixed that now.
Hi Bosco,
I finally managed to finish and submit my article :). As I wrote in the comments (http://boscoh.com/protein/calculating-the-solvent-accessible-surface-area-asa.html), I used your code for calculating the ASA in that work (I cited it, along with the Shrake-Rupley paper). In the process I slightly optimized it: the function "calculate_asa_opt" in asa.py is linear in the number of atoms, because it follows your suggestion in the post to "pre-partition the atoms into disjoint cubes to speed up the search for neighboring atoms". Specifically, it runs in O(n * m + n * N * m) time, compared with O(n * n + n * N * m) for the original "calculate_asa", where n is the number of atoms (len(atoms)), m is the average number of neighbors per atom (the mean of len(neighbor_indices)), and N is the number of generated sphere points (n_sphere_point).
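To illustrate the cube-partitioning idea, here is a minimal sketch, not the actual code of "calculate_asa_opt". The function name find_neighbors_grid, the plain-tuple inputs, and the default probe radius of 1.4 are assumptions made for this example. It also assumes that two atoms count as neighbors when the distance between them is less than the sum of their radii plus twice the probe radius.

```python
import math
from collections import defaultdict
from itertools import product


def find_neighbors_grid(centers, radii, probe=1.4):
    """Return, for each atom i, the sorted indices of its neighbors.

    Hypothetical helper: atoms i and j are treated as neighbors when
    dist(i, j) < radii[i] + radii[j] + 2 * probe.
    """
    # The cube edge equals the largest possible cutoff, so every neighbor
    # of an atom lies in the atom's own cube or in one of the 26 adjacent ones.
    cell = 2.0 * (max(radii) + probe)

    grid = defaultdict(list)  # cube index -> list of atom indices
    keys = []
    for i, (x, y, z) in enumerate(centers):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        keys.append(key)
        grid[key].append(i)

    offsets = list(product((-1, 0, 1), repeat=3))
    neighbors = []
    for i, (cx, cy, cz) in enumerate(keys):
        xi, yi, zi = centers[i]
        found = []
        for dx, dy, dz in offsets:
            # .get() avoids creating empty cubes in the defaultdict
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if j == i:
                    continue
                xj, yj, zj = centers[j]
                cutoff = radii[i] + radii[j] + 2.0 * probe
                d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                if d2 < cutoff * cutoff:  # compare squares, no sqrt needed
                    found.append(j)
        neighbors.append(sorted(found))
    return neighbors
```

Each atom is only compared against atoms in 27 cubes, so for a roughly constant packing density the search cost per atom does not grow with n.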
Another optimization is to use the squared distance instead of the distance in two places, "calculate_asa_opt" and "find_neighbor_indices_mod". This avoids an expensive square-root operation (this simple change alone saved up to 20% of the time!). For this purpose, a new function "mag2" (the square of a vector's magnitude) was added to two files: v3numpy.py and v3array.py.
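The squared-distance trick can be sketched as follows. The name "mag2" comes from the description above, but this body is only an illustration on plain tuples and may differ from the versions in v3numpy.py and v3array.py; the helper is_within is hypothetical.

```python
def mag2(v):
    """Square of the magnitude of a 3-vector (illustrative version)."""
    return v[0] * v[0] + v[1] * v[1] + v[2] * v[2]


def is_within(a, b, cutoff):
    """True if points a and b are closer than cutoff (hypothetical helper).

    For non-negative values, d < cutoff is equivalent to d*d < cutoff*cutoff,
    so the square root can be skipped entirely.
    """
    diff = (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    return mag2(diff) < cutoff * cutoff
```

The comparison gives the same answer as comparing true distances, but the inner loop no longer calls a square-root function for every atom pair or sphere point.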
I tested "calculate_asa_opt" on the "1be9.pdb" protein structure using all atoms (1045 atoms). Overall speedup: ~1.8 times faster for N = 960 sphere points, and ~2.7 times faster for N = 96 sphere points. As the formula above shows, the larger n is (and the smaller N is), the greater the speedup.
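The trend can be read off the ratio of the two operation counts. This is only a rough model: it ignores constant factors, so it does not predict the measured timings, and it treats m as independent of n, which is an assumption.

```latex
S(n, N, m) = \frac{n^2 + nNm}{nm + nNm}
           = \frac{n + Nm}{m\,(N + 1)}
           = \frac{n}{m\,(N + 1)} + \frac{N}{N + 1}
```

For fixed N and m the ratio grows linearly with n. Its derivative with respect to N is (1 - n/m) / (N + 1)^2, which is negative whenever n > m, so the ratio shrinks as N grows.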
As you can see, I did not delete or modify any of your code; everything new was added alongside it. So you can either replace your functions with mine or leave it as it is, with two functions: one quadratic and one linear.
Kind regards, Andrey Tokarev
P.S. I am a complete novice on GitHub. This is actually my first contribution here, so I apologize for any missteps.