Closed GoogleCodeExporter closed 9 years ago
Hi Richard,
Overall your new topology framework sounds good. I particularly like how one
can query topology informations from AtomGroup objects (but I am also confused,
see below (2)). That fits really well into the overall object-oriented design.
Hiding TopologyDict is also a Good Thing.
Concerns:
1) I really don't want to make loading a universe slower than it is already.
Have you tried this with a 200,000 atom system with lots of water? How does it
handle typical water models in particular? Could we make the generation of the
master lists lazy?
2) I don't understand why you need to index u.bonds[ag] in order to get all the
bonds that involve the atoms of ag. How is this different from ag.bonds ? If
the difference is that the latter is only bonds WITHIN the group then I rather
add a method such as ag.get_bonds() that returns either ag.bonds or
u.bonds[ag], maybe based on a keyword internal=True|False. Similar get_angles()
and get_torsions().
3) We should agree on using either degrees or radians in MDAnalysis; I think
most functions and methods use degrees and this should all be consistent. (If
speed is an issue and you don't want the deg2rad conversion, add a keyword such
as unit="degrees" | "radians" or radians=False.)
4) What current code and tests do these changes break?
Don't add this code in develop right away but open a feature branch.
Oliver
Original comment by orbeckst
on 31 Aug 2014 at 5:20
Hi Oliver, thanks for the feedback.
To quickly clarify
re 1) Universe loading isn't longer with the changes, but loading topology in
general does seem to take longer than it should.
I'm going to profile and see where all our time is going, making master lists
lazy is definitely possible.
2) ag.bonds and u.bonds[ag] are identical (this is how ag builds .bonds)
I think the problem here is that all AtomGroups build from u.bonds so if I did
ag = u.atoms[:100]
ag2 = ag[:10]
ag.bonds[ag2]
would be much faster than
ag2.bonds
as this would search u.bonds rather than the shorter ag.bonds.
This is another performance thing I'm looking into, ie is there some way to
remember the parent AtomGroup so I can search through a smaller list.
3) When I wrote these I assumed that most other trig functions seemed to return
rads, (eg np.sin). I'll look through and see what the norm is within MDA,
adding flags/keywords where necessary.
4) I did a git grep to try and see if this breaks anything else in the package,
I don't think it does, but I'll double check this.
It will break anything old code that used the previous system (ag.bondDict
ag.angleDict etc)
Also any code that used u.bonds as a list, this is easily fixed by using either
list(u.bonds) or u._bondlist
Original comment by richardjgowers
on 31 Aug 2014 at 9:07
Ok pushed a new branch 'feature-improvedtopology'
Bond, angle and torsion information is built lazily into Universe, so the only
difference in speed for building universes is only from parsing the extra lines
in the .psf file. This change should also make loading Universes much faster
than it currently is.
I reworked fetching bonds based on AtomGroups, it's now called
tg.atomgroup_intersection(ag). This accepts a keyword 'strict' to make it only
return bonds which are entirely in the AtomGroup.
ie
ag = u.atoms[20:30] # region of interest
tg = u.torsions[('C', 'Ca', 'C', 'C')] # torsion of interest
tg.bond_intersection(ag) # returns any torsions from this tg which at least 1
atom from ag features in
tg.bond_intersection(ag, strict=True) # returns any torsions where all 4 atoms
are in ag
I don't think this breaks anything, I've updated+expanded tests; everything
should pass.
Original comment by richardjgowers
on 1 Sep 2014 at 11:40
Original comment by richardjgowers
on 17 Sep 2014 at 9:33
Original issue reported on code.google.com by
richardjgowers
on 30 Aug 2014 at 11:34