MDAnalysis / GridDataFormats

GridDataFormats is a pure Python library to handle data on a regular grid using commonly used file formats in molecular simulations.
https://mdanalysis.org/GridDataFormats
GNU Lesser General Public License v3.0
29 stars 18 forks source link

Centers fucntion is not giving the right coordinates #77

Open acruzpr opened 4 years ago

acruzpr commented 4 years ago

Using the test.dx.gz file:

# OpenDX density file written by gridDataFormats.Grid.export() # File format: http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm#HDREDF # Data are embedded in the header and tied to the grid positions. # Data is written in C array order: In grid[x,y,z] the axis z is fastest # varying, then y, then finally x, i.e. z is the innermost loop. # (Note: the VMD dx-reader chokes on comments below this line) object 1 class gridpositions counts 2 2 2 origin 20.100000 3.000000 -10.000000 delta 1.000000 0.000000 0.000000 delta 0.000000 1.000000 0.000000 delta 0.000000 0.000000 1.000000 object 2 class gridconnections counts 2 2 2 object 3 class array type "double" rank 0 items 8 data follows 1.000000000000000 1.000000000000000 1.000000000000000 1.000000000000000 0.000001000000000 -1000000.000000000000000 1.000000000000000 1.000000000000000 attribute "dep" string "positions" object "density" class field component "positions" value 1 component "connections" value 2 component "data" value 3

The coordinates of the first center are the same coordinates of the origin.

from gridData import Grid
g = Grid('test.dx')
g.origin
array([ 20.1,   3. , -10. ])

for i in g.centers():
    print(i)

[ 20.1   3.  -10. ]
[20.1  3.  -9. ]
[ 20.1   4.  -10. ]
[20.1  4.  -9. ]
[ 21.1   3.  -10. ]
[21.1  3.  -9. ]
[ 21.1   4.  -10. ]
[21.1  4.  -9. ]

I think that all the coordinates for the centers should be displaced (self.delta * 0.5). This is right or I misinterpreted the function?

orbeckst commented 4 years ago

No, I don't think so. The origin itself is already the center of cell with index (0, 0, 0). To get to the next center, you need to add dx.

Or do you have any indication that origin is not a center itself?

acruzpr commented 4 years ago

Look at this: https://www.ks.uiuc.edu/Research/vmd/plugins/molfile/dxplugin.html or https://apbs-pdb2pqr.readthedocs.io/en/latest/formats/opendx.html#opendx

giacomofiorin commented 4 years ago

Those links explain that origin is followed by the coordinates of the left-most grid point in each dimension. The best "point" representation of the corresponding grid bin is its center, not one of its corners.

I have not run APBS in a very long time, but if you produce a .dx file with the volmap VMD tool: http://www.ks.uiuc.edu/Research/vmd/vmd-1.9.3/ug/node156.html using -10 ... +10 as a range for each axis and spacing 1.0 you get the following:

# Data calculated by the VMD volmap function
object 1 class gridpositions counts 20 20 20
origin -9.5 -9.5 -9.5
delta 1 0 0
delta 0 1 0
delta 0 0 1
object 2 class gridconnections counts 20 20 20
object 3 class array type double rank 0 items 8000 data follows
...

If I wanted the first grid "point" to be at x = -10, I would supply a minimum of -10.5 and a maximum of 10.5, for a total of 21 points in each dimension.

orbeckst commented 4 years ago

If you look in the original DX specs (via archive) https://web.archive.org/web/20080808140524/http://opendx.sdsc.edu/docs/html/pages/usrgu068.htm under

gridpositions Keyword

The gridpositions keyword is used to represent an n-dimensional grid of geometrically regular points in a compact form. It is a kind of Array Object and can be used in any context where an Array Object would be used. It is typically used as a regular positions component. The shape of the grid (number of points in each dimension) is specified by a list of n numbers following the optional counts keyword in the object clause. The number n of items in this list determines the dimensionality of the grid. The last item in this list corresponds to the fastest varying dimension.

A grid has an origin, which can be specified by an origin clause (which lists the n coordinates of the origin). If the origin clause is not present, the origin defaults to 0. The origin clause can be followed by n delta clauses, listing the deltas for each dimension. Each delta clause has n elements. The last delta clause corresponds to the fastest varying dimension. "Example 1. A Regular Grid" shows how to use the delta clause to specify a grid in which z varies fastest. If the delta clauses are not specified, the deltas default to unit vectors in each dimension, with the last dimension varying fastest.

and then

Regular Array Objects

A Regular Array Object is a compact encoding of a linear sequence of equally spaced points in n-space. ... A Regular Array Object is a linear sequence of points in n-space starting at some origin, specified by the origin clause, and separated by some constant delta, specified by the delta clause.

I read it that the entries in a DX file gridpositions are points on a lattice that is centered on the origin. They are not the boundaries of cells, at least in the original DX specs.

In principle the correct interpretation for volumetric data should be that origin and origin + n*delta are the centers of volume elements.

Now, APBS has defined it differently

origin xmin ymin zmin: The coordinates of the grid lower corner

which I would argue violates the original DX specifications.

Nevertheless, I understand that probably nobody who uses GridDataFormats cares about the original DX specs and most people care about reading APBS DX files and therefore users of GridDataFormats would prefer that we interpret APBS-style DX formats correctly.

Given that I am also a user I am ok with breaking the original DX specs interpretation but we have to make sure that there's a way to get the "true" DX specs behavior if needed.

I will comment on your PR #78 .

orbeckst commented 4 years ago

@giacomofiorin thanks, could you comment on PR #78 ... I didn't see your comment while I was writing mine.

orbeckst commented 4 years ago

@sobolevnrm we have a question regarding how APBS interprets OpenDX, in particular the

origin xmin ymin zmin

parameter. The APBS docs state

xmin ymin zmin The coordinates of the grid lower corner

Does this mean

  1. the center of the "grid lower corner" cell/volume element is located at (xmin, ymin, zmin) (and then the left edge is at xmin - hx/2 etc), or
  2. the lower left front edge is located at (xmin, ymin, zmin) (and then the center of the cell is at xmin+hx/2, ymin+hy/2, zmin+hz/2)?

Thank you for your insights.

sobolevnrm commented 4 years ago

Hi -- Interpretation (2) should be correct. Is this what you're observing?

On Thu, May 14, 2020 at 1:31 AM Oliver Beckstein notifications@github.com wrote:

@sobolevnrm https://github.com/sobolevnrm we have a question regarding how APBS interprets OpenDX, in particular the

origin xmin ymin zmin

parameter. The APBS docs https://apbs-pdb2pqr.readthedocs.io/en/latest/formats/opendx.html#opendx state

xmin ymin zmin The coordinates of the grid lower corner

Does this mean

  1. the center of the "grid lower corner" cell/volume element is located at (xmin, ymin, zmin) (and then the left edge is at xmin - hx/2 etc), or
  2. the lower left front edge is located at (xmin, ymin, zmin) (and then the center of the cell is at xmin+hx/2, ymin+hy/2, zmin+hz/2)?

Thank you for your insights.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/GridDataFormats/issues/77#issuecomment-628480799, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOX7WCHVOSYMJ56GZJFQGLRROT43ANCNFSM4MKR53VA .

orbeckst commented 4 years ago

Does this mean

  • the center of the "grid lower corner" cell/volume element is located at (xmin, ymin, zmin) (and then the left edge is at xmin - hx/2 etc), or
  • the lower left front edge is located at (xmin, ymin, zmin) (and then the center of the cell is at xmin+hx/2, ymin+hy/2, zmin+hz/2)?

Hi -- Interpretation (2) should be correct. Is this what you're observing?

At the moment there's no observation. @acruzpr stated that the APBS docs seem to imply interpretation (2) (and I agreed with their reading https://github.com/MDAnalysis/GridDataFormats/issues/77#issuecomment-615534026 although @giacomofiorin was more dubious https://github.com/MDAnalysis/GridDataFormats/pull/78#issuecomment-615537669 ).

@giacomofiorin (see https://github.com/MDAnalysis/GridDataFormats/issues/77#issuecomment-615533350) and I (https://github.com/MDAnalysis/GridDataFormats/issues/77#issuecomment-615534026) think that that this not the common interpretation of the DX file, according to specs and what VMD thinks.

Can you think of a good way to test APBS's interpretation or is there an obvious place in the source code to look at?

sobolevnrm commented 4 years ago

Hi All --

These should be good references:

https://github.com/Electrostatics/apbs-pdb2pqr/blob/master/apbs/src/mg/vgrid.c#L586 https://github.com/Electrostatics/apbs-pdb2pqr/blob/master/apbs/src/mg/vgrid.c#L1206

Please let me know if you have any questions.

Thanks,

Nathan

On Mon, May 18, 2020 at 9:57 AM Oliver Beckstein notifications@github.com wrote:

Does this mean

  • the center of the "grid lower corner" cell/volume element is located at (xmin, ymin, zmin) (and then the left edge is at xmin - hx/2 etc), or
  • the lower left front edge is located at (xmin, ymin, zmin) (and then the center of the cell is at xmin+hx/2, ymin+hy/2, zmin+hz/2)?

Hi -- Interpretation (2) should be correct. Is this what you're observing?

At the moment there's no observation. @acruzpr https://github.com/acruzpr stated that the APBS docs seem to imply interpretation (2) (and I agreed with their reading #77 (comment) https://github.com/MDAnalysis/GridDataFormats/issues/77#issuecomment-615534026 although @giacomofiorin https://github.com/giacomofiorin was more dubious #78 (comment) https://github.com/MDAnalysis/GridDataFormats/pull/78#issuecomment-615537669 ).

@giacomofiorin https://github.com/giacomofiorin (see #77 (comment) https://github.com/MDAnalysis/GridDataFormats/issues/77#issuecomment-615533350) and I (#77 (comment) https://github.com/MDAnalysis/GridDataFormats/issues/77#issuecomment-615534026) think that that this not the common interpretation of the DX file, according to specs and what VMD thinks.

Can you think of a good way to test APBS's interpretation or is there an obvious place in the source code to look at?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MDAnalysis/GridDataFormats/issues/77#issuecomment-630311584, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOX7WFCZ5DCCLLK5DW6ZATRSFSGXANCNFSM4MKR53VA .

giacomofiorin commented 4 years ago

@orbeckst in reading the ABPS doc and the comments by @sobolevnrm please remember the recommended number of grid points in ABPS, dime, is a power of 2 plus 1: https://apbs-pdb2pqr.readthedocs.io/en/latest/apbs/input/elec/dime.html#dime

Let's make an example in 1D for simplicity. If I specify a domain length glen of 20 Å, and a dime of 8+1 = 9 points, this creates the following axis:

-10.0 -7.5 -5.0 -2.5 0.0 2.5 5.0 7.5 10.0

and APBS outputs the potential sampled at each one of these points, and the corresponding DX file has an origin at -10.0 and a delta of 2.5. It is true that the left-most edge of the APBS potential is x = -10.0, because there is no information for x < -10.0. Nor, for that matter, for x > 10.0 or for any other x-value in between the given points. Each value in the DX file represents the potential at that particular point and nothing else:

 φ(x = -10.0) φ(x = -7.5) ... φ(x = 0.0) ... φ(x = 7.5) φ(x = 10.0)

@sobolevnrm Please correct me if needed.

Now, when the map represents a histogram, such as what volmap computes, the values of the map come from data aggregated over whole intervals, not individual points. An OpenDX file with exactly the same header as above but produced from volmap instead of APBS, would represent the number density n(...) computed across whole intervals of x:

n(-11.25 < x < -8.75) n(-8.75 < x < -6.25) ... n(-1.25 < x < 1.25) ... n(6.25 < x < 8.75) n(8.75 < x < 11.25)

(Feel free to replace of course the < sign with a ≤ sign if you want: I hope we agree that numerically it would hardly make a difference). It is important to note that the physical domain covered by that map starts at x = -11.25, because any x values between -11.25 and -10.0 will be counted in the first bin.

Both use cases make up for an identical OpenDX header, but the physical or mathematical quantities represented by the data file are different, and this fact is not encoded by the format. Confusion arises primarily because the odd point added in the APBS calculation makes the extrema become "round" numbers.

As a last counter-example, a histogram that covers the exact range [-10.0:10.0] and has the same spacing of 2.5 would have to have 8 bins, with the following X axis:

-8.75 -6.25 -3.75 -1.25 1.25 3.75 6.25 8.75 

and the origin in the OpenDX file would be -8.75.

orbeckst commented 1 year ago

Most of the links from the above discussion are broken so here are equivalent links: