New MLP attribution - Githubissues

MonZop commented 9 years ago

I am just thinking of a possible different (and possibly faster) way to paint the MLP on the mesh: each vertex of the mesh is close to one atom in the underlying structure. Each one of these atoms has a single MLP value, ranging from -3 to +1, and normalized from 0 to 1. This can be seen as 'atomic MLP' Is it possible to attribute that value to the vertex?

yes, probably, proximity between a point and all other points is a relatively fast operation using a k-dimensional tree

MonZop commented 9 years ago

Also consider that there is a quite short space of min-max distance between the vertex point and the closest atom

zeffii commented 9 years ago

Maybe use a proxy mesh, all vertices of which represent the cloud of atoms from the .pdb, the indices of the vertices will reflect the kind of atom. For example:

# keys would be generated once at runtime
index_mapper = {
    0: "O",     # 0 <= idx <= 200
    201: "H",   # 201 <= idx <= 320
    321: "N"    # 321 <= idx <= last index
}

In a K-dimensional Tree (KDTree) we add a list of vectors and indices. For every point on the MLP Surface mesh you can make a call to KDTree and get the index of the closest vertex, we then know the atom type.

This might not be the fastest solution, but it is the easiest to code and therefore becomes a benchmark to speed test subsequent iterations.

@MonZop

Also consider that there is a quite short space of min-max distance between the vertex point and the closest atom

OK, before starting to write code can you elaborate on which conditions this imposes?

MonZop commented 9 years ago

@zeffii

Maybe use a proxy mesh, all vertices of which represent the cloud of atoms from the .pdb, the indices of the vertices will reflect the kind of atom.

The lipophilic value is not directly correspondent to the atom: it depends on how it is connected to other atoms. There is a library that specify the lipophilic value for every atom of all aminoacids and nucleic acids: this is the reference libray

For every point on the MLP Surface mesh ...

I think you mean the surface mesh: the MLP is not calculated yet (or did I not understand what you mean?)

MonZop commented 9 years ago

@zeffii

OK, before starting to write code can you elaborate on which conditions this imposes?

The surface mesh is defined as 'Solvent Excluded Surface', also called 'Connolly surface', defined as the points of the surface of a spherical probe rolling on the VanderWaals atoms. molecularsurface Due to this definition, the min distance between a point of the mesh and an atom of the molecule is = the radius of the smallest atom; the max distance will be >= (radius of the largest atom+radius of the probe). All this might be irrelevant if the KDtree works in a way that does not imply exploring the neighborhood to look for the closest point of the 'atom cloud mesh'

zeffii commented 9 years ago

right, this does introduce a slight detour. will digest..

Some throwaway code: https://gist.github.com/zeffii/2ddf35134d9f1026ead2 (just parking it here for reference)

zeffii commented 9 years ago

Due to this definition, the min distance between a point of the mesh and an atom of the molecule is = the radius of the smallest atom; the max distance will be >= (radius of the largest atom+radius of the probe).

I see! do I understand this? :

Even when the nearest atom is found, we will need to know of that atom its position in the amino acid / nucleic acid of which it is part, then do the library lookup for the lipophilic value?

MonZop commented 9 years ago

Yes, this info is in the columns 14-15 (atom) and 18-20 (aminoacid) of the PDB file. Then there is the fi values library

zeffii commented 9 years ago

Will implement a static version first, then if that's acceptable consider the dynamic / animatable approach. Maybe later today, this evening.

zeffii commented 9 years ago

cool, every atom is asigned info on BBInfo attribute, f.ex BBInfo : ATOM 232 CG ARG A 16 5.046 -1.171 -2.378 1.00 0.00 C

MonZop commented 9 years ago

If this BBinfo can be used, it might be easy to substitute the Temp factor (last 0.00, column 61-66) with fi value, and then its easy to retrieve.

zeffii commented 9 years ago

if BBinfo is never overwritten that might be a nice solution indeed, and easy. Alternatively (my preference) is to add a simple property specifically for fi

Ultimately -how- may not really matter for speed, then whatever is the most convenient to use will win.

zeffii commented 9 years ago

some debug images :) In 06_1L2Y_4GE.pdb

zeffii commented 9 years ago

more interim scripting: https://gist.github.com/zeffii/c864b5096bdc4a479703

MonZop commented 9 years ago

If I understand the number on each atom is the fi value associated, right? If this works, I will test it on a few molecules on which the MLP is especially important, so we can evaluate the result

zeffii commented 9 years ago

If I understand the number on each atom is the fi value associated, right?

Yep! from the _valuesfi lookup table, they are not yet remapped / normalized [-3, +1] -> [0,1]. So far this is like administrative work..

Next step is to implement the KDTree search and design the algorithm to test element locations and determine the nearest element with respect to the elements' vdwaals radius. It's easy to limit the search area by passing a maximum search radius (probably maximum vdwaals radius in the pdb)

If this explanation is a bit confusing, hopefully the code will be more clear.

zeffii commented 9 years ago

the red vertices indicate which vertices the search is limited to by passing a radius radus

MonZop commented 9 years ago

@zeffii

passing a maximum search radius (probably maximum vdwaals radius in the pdb)

maximum search radius is = max VdW radius + probe radius

zeffii commented 9 years ago

Right, i imagine there are vertices on the Connolly surface mesh which due to the blobby-ness of the surface are interpolation values and the nearest element is further away than the max VdW radius.

Do you mean the _proberadius as a buffer zone -- an extension of the search radius? If so, that makes sense to me.

(values are to indicate the concept only..)

here P One is a point on the Connolly surface which has no nearest neighbour if the search radius isn't extended

zeffii commented 9 years ago

probe radius could be this progressive thing, which first searches 2 times the max VdW radius , in most cases this will probably find something , if not then I would find the parameters used for the surface blending radius , and use that + some fudge factor for numeric imprecision.

or perhaps

results = search_kdtree(mode=distance, distance=max_distance)
if not results:
   results = search_kdtree(mode=nearest)

Either way. it will be interesting to see the results, seeing often points to pitfalls in an algorithm..

No ETA on this, i'll think about it and perhaps write code tonight.

MonZop commented 9 years ago

Just a sec. I am making a figure that will explain better

MonZop commented 9 years ago

ses1

Aplogies for the low qaulity of the drawing. Please note that the molecular surface defined is not by the center of the probe, but by its surface. The radius of the probe is typically 0.7 A, considered the radius of a water molecule (yes, we know that water is not spherical, but it is one of the approximations that everyone accepts). This radius is comparable to the VdW radius of most atoms, so the farthest possible point between is: r = probe radius (tipycally = or just smaller than d) d = atom radius Let's approximate r=d (staying a little large) M = d sqrt(3)

zeffii commented 9 years ago

https://gist.github.com/zeffii/95e09d1e94feb2c23a8a mostly left to do is the choice + compare code now (interim code)

zeffii commented 9 years ago

https://gist.github.com/zeffii/37de8fde581c4fd98243 I just noticed not all elements are in the scale_vdw dictionary. for testing purposes the purple represents failed lookups for the radii of elements.

06_1L2Y_4GE.pdb
radii not found: {'OE1', 'ND2', 'CZ3', 'OXT', 'CD', 'CZ2', 'CD2', 'OD1', 'NE1', 'OH', 'NH2', 'NH1', 'OD2', 'CE2'
, 'CZ', 'CG', 'CB', 'NE', 'CG1', 'CD1', 'CG2', 'CE', 'NZ', 'NE2', 'CE1', 'CH2', 'OG'}

and

01_CaM_Protein3Conformations_wH.PDB
radii not found: {'NE', 'SD', 'OE1', 'OH', 'ND2', 'CZ', 'OD2', 'OE2', 'OG1', 'CD1', 'ND1', 'NH2', 'CG2', 'CD2',
'NE2', 'CG', 'OXT', 'OD1', 'CG1', 'OG', 'NH1', 'CE', 'CE2', 'CB', 'NZ', 'CD', 'CE1'}

not sure what this means but i'll figure it out.

zeffii commented 9 years ago

please forgive in advance this, which is probably butchering chemistry terms..

These entities (NE, SD OE1...) need to be remapped back from specific element in the chain to the generic element , and take the vdw from the generic element?

zeffii commented 9 years ago

https://gist.github.com/zeffii/fa8f7530d28dcaca4c7a functional.. now to test on larger files..

zeffii commented 9 years ago

heh it's nice to have some time to dedicate to this.. it turns out

nothing within 2.7 of indexed surface vertex 1287
nothing within 2.7 of indexed surface vertex 2681
nothing within 2.7 of indexed surface vertex 4180

Where indexed surface vertex means a vertex on the Connolly surface. and 2.7 is the rescaled world distance. In some message earlier in the thread I said this might happen, and that the next search for proximity could be a simple 'find the closest' or 'perform an extended search'.

Do you have any preference here?

zeffii commented 9 years ago

This is all I can do for today

jbg (motion gif)

zeffii commented 9 years ago

will add as separate python file in BioBlender folder, and import it and add the functionality to the Surface Panel, this will make testing easier. Think of a name for the button.

zeffii commented 9 years ago

it can be tested in the MLP alternative branch bluefork

MonZop commented 9 years ago

Hi, sorry for not follwing yesterday. I was in Bologna at ameeting , with very poor connection, barely enough to check the mail occasionally. I see there is quite some progress, I will need some time to think about it, however some indications can help immediately.

I just noticed not all elements are in the scale_vdw dictionary. for testing purposes the purple represents failed lookups for the radii of elements. 06_1L2Y_4GE.pdb radii not found: {'OE1', 'ND2', 'CZ3', 'OXT', 'CD', 'CZ2', 'CD2', 'OD1',

These entities (NE, SD OE1...) need to be remapped back from specific element in the chain to the generic element , and take the vdw from the generic element?

You are right: the atomic elemnt determines the VdW radius, and is in column 11 of the PDB (most biological elemnts are simple ones, defined by a single letter, with important exception), sometimes it is also reported in the last columns of the PBD ATOM line, where the exceptions become visible (Ca, Fe, Mg, etc). The MLP is more related to the connections of the atoms, that's why the position in the aminoacid (or DNA base) is the fastest way to attribute the value to the atom. Hoever, in my hands (BB 0.6) the DNA is 'painted' seems quite different form yours: dna_mlp I am not sure wht this is, will have to think about it. It looks something is wrong also with this DNA: I expect very dark on the 'crests', and more clear (lipophilic) in the valleys.

zeffii commented 9 years ago

sometimes it is also reported in the last columns of the PBD ATOM

Right now the code assumes the presence of that last column, it was rather convenient - but you say this column is not guaranteed. then an alternative scheme is necessary. toasty

MonZop commented 9 years ago

@zeffii

will add as separate python file in BioBlender folder, and import it and add the functionality to the Surface Panel, this will make testing easier. Think of a name for the button.

What about 'Direct MLP mapping' (meaning that it is not medaited by integration), I need some time to consider if this is OK, it looks like integration is somehow necessary. (i have a deadline today, and cannot spend musch time on this, unfortunatley)

zeffii commented 9 years ago

@MonZop No pressure from me. Button names are easy to change but deserve careful consideration. A fast-integration mode may be possible, if you can point at a source for the math involved. I can imagine routines but only from an aesthetic point of view :)

zeffii commented 9 years ago

btw, i remove all lights from these renderings (even viewport views) to get a better idea of the flat colours of the vertex_col map. No lights means no shadows.

zeffii commented 9 years ago

open-access is wonderful :) http://www.biomedcentral.com/1471-2105/10/32

MonZop commented 9 years ago

Yes, indeed! We also published BioBlender as Open Access in the same journal. http://www.biomedcentral.com/1471-2105/13/S4/S16

MonZop / BioBlender

New MLP attribution #22