Open MonZop opened 9 years ago
Also consider that there is a quite short space of min-max distance between the vertex point and the closest atom
Maybe use a proxy mesh, all vertices of which represent the cloud of atoms from the .pdb, the indices of the vertices will reflect the kind of atom. For example:
# keys would be generated once at runtime
index_mapper = {
0: "O", # 0 <= idx <= 200
201: "H", # 201 <= idx <= 320
321: "N" # 321 <= idx <= last index
}
In a K-dimensional Tree (KDTree) we add a list of vectors and indices. For every point on the MLP Surface mesh you can make a call to KDTree and get the index of the closest vertex, we then know the atom type.
This might not be the fastest solution, but it is the easiest to code and therefore becomes a benchmark to speed test subsequent iterations.
@MonZop
Also consider that there is a quite short space of min-max distance between the vertex point and the closest atom
OK, before starting to write code can you elaborate on which conditions this imposes?
@zeffii
Maybe use a proxy mesh, all vertices of which represent the cloud of atoms from the .pdb, the indices of the vertices will reflect the kind of atom.
The lipophilic value is not directly correspondent to the atom: it depends on how it is connected to other atoms. There is a library that specify the lipophilic value for every atom of all aminoacids and nucleic acids: this is the reference libray
For every point on the MLP Surface mesh ...
I think you mean the surface mesh: the MLP is not calculated yet (or did I not understand what you mean?)
@zeffii
OK, before starting to write code can you elaborate on which conditions this imposes?
The surface mesh is defined as 'Solvent Excluded Surface', also called 'Connolly surface', defined as the points of the surface of a spherical probe rolling on the VanderWaals atoms. Due to this definition, the min distance between a point of the mesh and an atom of the molecule is = the radius of the smallest atom; the max distance will be >= (radius of the largest atom+radius of the probe). All this might be irrelevant if the KDtree works in a way that does not imply exploring the neighborhood to look for the closest point of the 'atom cloud mesh'
right, this does introduce a slight detour. will digest..
Some throwaway code: https://gist.github.com/zeffii/2ddf35134d9f1026ead2 (just parking it here for reference)
Due to this definition, the min distance between a point of the mesh and an atom of the molecule is = the radius of the smallest atom; the max distance will be >= (radius of the largest atom+radius of the probe).
I see! do I understand this? :
Even when the nearest atom is found, we will need to know of that atom its position in the amino acid / nucleic acid of which it is part, then do the library lookup for the lipophilic value?
Yes, this info is in the columns 14-15 (atom) and 18-20 (aminoacid) of the PDB file. Then there is the fi values library
Will implement a static version first, then if that's acceptable consider the dynamic / animatable approach. Maybe later today, this evening.
cool, every atom is asigned info on BBInfo
attribute, f.ex
BBInfo : ATOM 232 CG ARG A 16 5.046 -1.171 -2.378 1.00 0.00 C
If this BBinfo can be used, it might be easy to substitute the Temp factor (last 0.00, column 61-66) with fi value, and then its easy to retrieve.
if BBinfo is never overwritten that might be a nice solution indeed, and easy. Alternatively (my preference) is to add a simple property specifically for fi
Ultimately -how- may not really matter for speed, then whatever is the most convenient to use will win.
some debug images :) In 06_1L2Y_4GE.pdb
more interim scripting: https://gist.github.com/zeffii/c864b5096bdc4a479703
If I understand the number on each atom is the fi value associated, right? If this works, I will test it on a few molecules on which the MLP is especially important, so we can evaluate the result
If I understand the number on each atom is the fi value associated, right?
Yep! from the _valuesfi lookup table, they are not yet remapped / normalized [-3, +1] -> [0,1]
. So far this is like administrative work..
Next step is to implement the KDTree search and design the algorithm to test element locations and determine the nearest element with respect to the elements' vdwaals radius. It's easy to limit the search area by passing a maximum search radius (probably maximum vdwaals radius in the pdb)
If this explanation is a bit confusing, hopefully the code will be more clear.
the red vertices indicate which vertices the search is limited to by passing a radius
@zeffii
passing a maximum search radius (probably maximum vdwaals radius in the pdb)
maximum search radius is = max VdW radius + probe radius
Right, i imagine there are vertices on the Connolly surface mesh which due to the blobby-ness of the surface are interpolation values and the nearest element is further away than the max VdW radius.
Do you mean the _proberadius as a buffer zone -- an extension of the search radius? If so, that makes sense to me.
(values are to indicate the concept only..)
here P One is a point on the Connolly surface which has no nearest neighbour if the search radius isn't extended
probe radius could be this progressive thing, which first searches 2 times the max VdW radius , in most cases this will probably find something , if not then I would find the parameters used for the surface blending radius , and use that + some fudge factor for numeric imprecision.
or perhaps
results = search_kdtree(mode=distance, distance=max_distance)
if not results:
results = search_kdtree(mode=nearest)
Either way. it will be interesting to see the results, seeing often points to pitfalls in an algorithm..
No ETA on this, i'll think about it and perhaps write code tonight.
Just a sec. I am making a figure that will explain better
Aplogies for the low qaulity of the drawing. Please note that the molecular surface defined is not by the center of the probe, but by its surface. The radius of the probe is typically 0.7 A, considered the radius of a water molecule (yes, we know that water is not spherical, but it is one of the approximations that everyone accepts). This radius is comparable to the VdW radius of most atoms, so the farthest possible point between is: r = probe radius (tipycally = or just smaller than d) d = atom radius Let's approximate r=d (staying a little large) M = d sqrt(3)
https://gist.github.com/zeffii/95e09d1e94feb2c23a8a mostly left to do is the choice + compare code now (interim code)
https://gist.github.com/zeffii/37de8fde581c4fd98243
I just noticed not all elements are in the scale_vdw
dictionary. for testing purposes the purple represents failed lookups for the radii of elements.
06_1L2Y_4GE.pdb
radii not found: {'OE1', 'ND2', 'CZ3', 'OXT', 'CD', 'CZ2', 'CD2', 'OD1', 'NE1', 'OH', 'NH2', 'NH1', 'OD2', 'CE2'
, 'CZ', 'CG', 'CB', 'NE', 'CG1', 'CD1', 'CG2', 'CE', 'NZ', 'NE2', 'CE1', 'CH2', 'OG'}
and
01_CaM_Protein3Conformations_wH.PDB
radii not found: {'NE', 'SD', 'OE1', 'OH', 'ND2', 'CZ', 'OD2', 'OE2', 'OG1', 'CD1', 'ND1', 'NH2', 'CG2', 'CD2',
'NE2', 'CG', 'OXT', 'OD1', 'CG1', 'OG', 'NH1', 'CE', 'CE2', 'CB', 'NZ', 'CD', 'CE1'}
not sure what this means but i'll figure it out.
please forgive in advance this, which is probably butchering chemistry terms..
These entities (NE, SD OE1...) need to be remapped back from specific element in the chain to the generic element , and take the vdw from the generic element?
https://gist.github.com/zeffii/fa8f7530d28dcaca4c7a functional.. now to test on larger files..
heh it's nice to have some time to dedicate to this.. it turns out
nothing within 2.7 of indexed surface vertex 1287
nothing within 2.7 of indexed surface vertex 2681
nothing within 2.7 of indexed surface vertex 4180
Where indexed surface vertex means a vertex on the Connolly surface. and 2.7 is the rescaled world distance. In some message earlier in the thread I said this might happen, and that the next search for proximity could be a simple 'find the closest' or 'perform an extended search'.
Do you have any preference here?
will add as separate python file in BioBlender folder, and import it and add the functionality to the Surface Panel, this will make testing easier. Think of a name for the button.
it can be tested in the MLP alternative branch
Hi, sorry for not follwing yesterday. I was in Bologna at ameeting , with very poor connection, barely enough to check the mail occasionally. I see there is quite some progress, I will need some time to think about it, however some indications can help immediately.
I just noticed not all elements are in the scale_vdw dictionary. for testing purposes the purple represents failed lookups for the radii of elements. 06_1L2Y_4GE.pdb radii not found: {'OE1', 'ND2', 'CZ3', 'OXT', 'CD', 'CZ2', 'CD2', 'OD1',
These entities (NE, SD OE1...) need to be remapped back from specific element in the chain to the generic element , and take the vdw from the generic element?
You are right: the atomic elemnt determines the VdW radius, and is in column 11 of the PDB (most biological elemnts are simple ones, defined by a single letter, with important exception), sometimes it is also reported in the last columns of the PBD ATOM line, where the exceptions become visible (Ca, Fe, Mg, etc). The MLP is more related to the connections of the atoms, that's why the position in the aminoacid (or DNA base) is the fastest way to attribute the value to the atom. Hoever, in my hands (BB 0.6) the DNA is 'painted' seems quite different form yours: I am not sure wht this is, will have to think about it. It looks something is wrong also with this DNA: I expect very dark on the 'crests', and more clear (lipophilic) in the valleys.
sometimes it is also reported in the last columns of the PBD ATOM
Right now the code assumes the presence of that last column, it was rather convenient - but you say this column is not guaranteed. then an alternative scheme is necessary.
@zeffii
will add as separate python file in BioBlender folder, and import it and add the functionality to the Surface Panel, this will make testing easier. Think of a name for the button.
What about 'Direct MLP mapping' (meaning that it is not medaited by integration), I need some time to consider if this is OK, it looks like integration is somehow necessary. (i have a deadline today, and cannot spend musch time on this, unfortunatley)
@MonZop No pressure from me. Button names are easy to change but deserve careful consideration. A fast-integration mode may be possible, if you can point at a source for the math involved. I can imagine routines but only from an aesthetic point of view :)
btw, i remove all lights from these renderings (even viewport views) to get a better idea of the flat colours of the vertex_col map. No lights means no shadows.
open-access is wonderful :) http://www.biomedcentral.com/1471-2105/10/32
Yes, indeed! We also published BioBlender as Open Access in the same journal. http://www.biomedcentral.com/1471-2105/13/S4/S16
I am just thinking of a possible different (and possibly faster) way to paint the MLP on the mesh: each vertex of the mesh is close to one atom in the underlying structure. Each one of these atoms has a single MLP value, ranging from -3 to +1, and normalized from 0 to 1. This can be seen as 'atomic MLP' Is it possible to attribute that value to the vertex?