COMCIFS / cif_core

The IUCr CIF core dictionary

Units for B (the atomic displacement parameter)? #422

Open rowlesmr opened 1 year ago

rowlesmr commented 1 year ago

Should the units for B (and the anisotropic versions) be given as 8pi_angstroms_squared? It is an option in the units_code enumeration.

vaitkus commented 1 year ago

The mmCIF dictionary [1] also assigns the 8pi_angstroms_squared units to their B items (e.g. _atom_site_anisotrop.B[1][1]). Some of those items are supposed to be equivalent to those in the CIF_CORE dictionary (and they do share some of the data name aliases). However, in my opinion the units should not be changed (see below).

The units themselves look incorrect to me (at least in the context of the dictionary). The current definition states:

8pi2_angstroms_squared          "8pi^2^ * angstroms squared (metres * 10^(-10)^)^2^" 

From what I understand, the dictionary currently operates under the assumption of the following equivalence [2] (brackets indicate the measuring units):

$$U\,[\text{Å}^2] = \frac{B}{8\pi^2}\,[\text{Å}^2]$$

Now, if we suddenly change the units of B to (8pi^2 A^2) and try to convert the right side to A^2:

$$U\,[\text{Å}^2] = \frac{B}{8\pi^2}\,[8\pi^2\,\text{Å}^2]$$
$$U\,[\text{Å}^2] = \frac{B}{8\pi^2} \cdot 8\pi^2\,[\text{Å}^2]$$
$$U\,[\text{Å}^2] = B\,[\text{Å}^2]$$

That is, if the units were actually respected, one would expect to see identical numeric ADP values regardless of whether U or B is used. However, since CIF files normally contain B values in the same form as in the related journal publication (that is, not converted to 8pi^2 * A^2 units), this will not be true for most (if not all) existing CIF files.

Note that the dREL code in the CIF_CORE dictionary responsible for converting between U and B also currently assumes that B values are measured in A^2^ and not 8pi^2 * A^2, so changing the units would not be so simple.
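A minimal Python sketch of the argument above, assuming the dictionary's relation U = B / (8π²) with both quantities in Å², and treating a unit simply as a multiplier expressed in Å². The helper names and the value 8.5 are illustrative only; they are not taken from the dictionary or its dREL code.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2  # about 78.9568


def u_from_b(b_a2: float) -> float:
    """U_iso in Å² from B_iso in Å², using U = B / (8π²)."""
    return b_a2 / EIGHT_PI_SQ


def to_a2(value: float, unit_in_a2: float) -> float:
    """Convert a number expressed in a unit worth `unit_in_a2` Å² into Å²."""
    return value * unit_in_a2


stored_b = 8.5  # hypothetical B value as written in a CIF file

# Current practice: the stored number is read as Å².
u_current = u_from_b(to_a2(stored_b, 1.0))

# If the same number were relabelled as [8π² Å²], it would stand for
# 8.5 * 8π² Å², and U would come out numerically equal to B.
u_relabelled = u_from_b(to_a2(stored_b, EIGHT_PI_SQ))

print(f"{u_current:.4f}")     # 0.1077
print(f"{u_relabelled:.4f}")  # 8.5000
```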

@rowlesmr, @jamesrhester, could you check whether this interpretation makes sense? If so, I would also report this to the mmCIF maintainers, since in practice they probably do not divide their B values by 8pi^2 just to express them in [8pi^2 * A^2] units either.

[1] https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_pdbx_v50.dic
[2] http://pd.chem.ucl.ac.uk/pdnn/refine3/adps.htm

rowlesmr commented 1 year ago

I concur with this assessment (and am embarrassed I didn't spot it!)

It's like reporting length in metres and kilometres, and then just arbitrarily changing the units on the kilometres value to be metres; the magnitudes get stuffed up.

rowlesmr commented 1 year ago

Or you could write:

0.01267 (Ų) = 1 (1/(8π²) Ų)

so you can make the unit_code correct if you tweak its definition:

'8pi_angstroms_squared'   "length_squared '(1/(8pi^2)) * angstroms squared (metres * 10^(-10))^2'"

assuming that you give B as B and U as U, as in the example a couple of lines above.
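A quick numeric check of the tweaked unit definition, as a Python sketch assuming the unit is worth exactly 1/(8π²) Å² and that the stored number is the usual B value (variable names are hypothetical):

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2

# Size of the proposed unit "(1/(8pi^2)) * angstroms squared", in Å².
unit_in_a2 = 1 / EIGHT_PI_SQ
print(f"{unit_in_a2:.5f}")  # 0.01267

# Under this reading, a stored B of 8.5 converts to Å² as 8.5 / (8π²),
# which is the same number the ordinary B -> U relation would give for U.
b_stored = 8.5
b_converted_a2 = b_stored * unit_in_a2
print(math.isclose(b_converted_a2, b_stored / EIGHT_PI_SQ))  # True
```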

vaitkus commented 1 year ago

I think this still does not remove the underlying issue of U and B being expressed in different units. In this case one still gets:

$$U\,[\text{Å}^2] = \frac{B}{8\pi^2}\,\left[\frac{1}{8\pi^2}\,\text{Å}^2\right]$$
$$U\,[\text{Å}^2] = \frac{B}{8\pi^2} \cdot \frac{1}{8\pi^2}\,[\text{Å}^2]$$

This is different from the expected:

$$U\,[\text{Å}^2] = \frac{B}{8\pi^2}\,[\text{Å}^2]$$
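The double division can be made concrete with a small Python sketch. It assumes, hypothetically, that software first honours the declared [1/(8π²) Å²] unit by converting the stored B value to Å², and then applies the dictionary's U = B / (8π²) relation, as a dREL-style conversion would. The names are illustrative, not the actual dREL methods.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2

b_stored = 8.5                # hypothetical B value as written in the file
unit_in_a2 = 1 / EIGHT_PI_SQ  # proposed unit, expressed in Å²

# Step 1: honour the declared unit and convert the value to Å².
b_a2 = b_stored * unit_in_a2
# Step 2: apply the B -> U relation, which assumes B is already in Å².
u_computed = b_a2 / EIGHT_PI_SQ

# What the author meant: U = B / (8π²) with B = 8.5 Å².
u_expected = b_stored / EIGHT_PI_SQ

print(f"{u_computed:.6f}")               # 0.001363
print(f"{u_expected:.6f}")               # 0.107654
print(f"{u_expected / u_computed:.4f}")  # 78.9568, i.e. one extra factor of 8π²
```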

However, just in case, I wrote an email to the mmCIF dictionary maintainers to get their input/interpretation on this issue. It would be quite unexpected to hear that they actually divide all of their B values by 8pi^2^ to convert them to the 8pi_angstroms_squared units.

rowlesmr commented 1 year ago

My example above is exactly this:

$$U\,[\text{Å}^2] = \frac{B}{8\pi^2}\,[\text{Å}^2]$$

Move the divisor into the units:

$$U\,[\text{Å}^2] = B\,\left[\frac{1}{8\pi^2}\,\text{Å}^2\right]$$

vaitkus commented 1 year ago

Yes, but moving the divisor into the units affects the value. That is, if the original publication states that the B value is 8.5 [A^2^], the author would now have to record it in the CIF file as 8.5 * 8 pi^2^ to properly express it in the required [1/(8 pi^2^) A^2^] units, would they not? I'm quite sure this convention was never followed in CIF files, so suddenly changing the units would change the interpretation of the B values.
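The re-scaling described above, as a Python sketch assuming a unit is just a multiplier expressed in Å² (the helper name `from_a2` is hypothetical):

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2


def from_a2(value_a2: float, unit_in_a2: float) -> float:
    """Express a value given in Å² in a unit worth `unit_in_a2` Å²."""
    return value_a2 / unit_in_a2


published_b = 8.5  # B as stated in the publication, in Å²

# Number the author would have to write if the declared unit were
# [1/(8π²) Å²]: 8.5 * 8π².
recorded_b = from_a2(published_b, 1 / EIGHT_PI_SQ)
print(f"{recorded_b:.2f}")  # 671.13
```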

I am not sure about the conventions of mmCIF files though.

rowlesmr commented 1 year ago

I'm arguing that the 1/8pi^2 constant has always implicitly been there. Adding it explicitly doesn't change anything, apart from making it explicit.

Consider measuring lengths. The value W is recorded in metres. There is also another value, Y, that people use, such that W = Y/1000. In both cases, the unit of length is metres. I can report W = 0.001 or Y = 1, and people know it is the same thing.

There is an implicit divisor in the units: W [m] = Y [1/1000 m]

I don't need to change the magnitude of the Y value, I just need to make the implicit divisor explicit.

vaitkus commented 1 year ago

According to the mmCIF unit conversion table, angstroms_squared can be converted to 8pi2_angstroms_squared by multiplying it by 8pi^2:

angstroms_squared               8pi2_angstroms_squared          *  78.9568

I do not see why the same would not apply to fractional multipliers.

rowlesmr commented 1 year ago

The unit conversion agrees with what I said:

From Units&Identifier   | To Units&Identifier      | Operator | Conversion Factor
angstroms_squared       | 8pi2_angstroms_squared   | *        | 78.9568

To convert a U value to a B value, multiply by 79; conversely, to convert a B value to a U value, divide by 79.
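The U/B conversion described in this comment, as a minimal Python sketch assuming both values are in Å². The factor is 8π² ≈ 78.9568, of which 79 is a rounding; the function names are illustrative.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2  # the 78.9568 factor from the conversion table


def b_from_u(u_a2: float) -> float:
    """B in Å² from U in Å² (B = 8π² U)."""
    return EIGHT_PI_SQ * u_a2


def u_from_b(b_a2: float) -> float:
    """U in Å² from B in Å² (U = B / 8π²)."""
    return b_a2 / EIGHT_PI_SQ


print(f"{EIGHT_PI_SQ:.4f}")     # 78.9568
print(f"{b_from_u(0.01):.4f}")  # 0.7896
print(math.isclose(u_from_b(b_from_u(0.01)), 0.01))  # True
```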

This means that the explanation of the 8pi2_angstroms_squared unit in units_code is wrong, and should be "length_squared '(1/(8pi^2)) * angstroms squared (metres * 10^(-10))^2'", and we can just straight up change the units for all B-related data items.

vaitkus commented 1 year ago

I think we are talking past each other here. You do see why changing the units of B would be problematic in general?

Conversion from B to U is independent of conversion between "8pi2_angstroms_squared" and "angstroms_squared". Historically, B values in CIF files have been expressed in "angstroms_squared" with the 8pi^2^ multiplier included in the value. Now, if one would like to convert the existing B values from "angstroms_squared" to "8pi2_angstroms_squared", how would the numeric representation of the B value change?

For example, say we have a B value of 13 [A^2^]. How would it look expressed in [8pi^2^ * A^2^]? Clearly, it cannot remain the same, since we are adding a multiplier (or a divisor).
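Working the example through numerically, as a Python sketch that treats each unit as a multiplier expressed in Å². It covers both the [8π² Å²] reading and the [1/(8π²) Å²] reading proposed earlier in the thread; the helper name is hypothetical.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2


def from_a2(value_a2: float, unit_in_a2: float) -> float:
    """Express a value given in Å² in a unit worth `unit_in_a2` Å²."""
    return value_a2 / unit_in_a2


b_a2 = 13.0  # B value as historically recorded, in Å²

print(f"{from_a2(b_a2, EIGHT_PI_SQ):.4f}")      # in [8π² Å²]: 0.1646
print(f"{from_a2(b_a2, 1 / EIGHT_PI_SQ):.2f}")  # in [1/(8π²) Å²]: 1026.44
```

Either way the number written in the file changes, which is the point of the question.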

rowlesmr commented 1 year ago

I've just worked through a bunch of stuff: it works in isolation, but not in functions involving B or U (assuming you're using functions specialised on B or U), so my suggestions are all wrong.

Then I had a look at mmCIF...

Looking at 3i40, the first few atom coordinates are given as:

loop_ #slightly edited by removing columns
_atom_site.type_symbol 
_atom_site.label_atom_id 
_atom_site.Cartn_x 
_atom_site.Cartn_y 
_atom_site.Cartn_z 
_atom_site.occupancy 
_atom_site.B_iso_or_equiv 
N   N    -27.279   6.238   -12.314   1.00   45.01 
C   CA   -26.249   6.028   -11.313   1.00   43.47 
C   C    -25.582   4.677   -11.471   1.00   34.37 
...

There is a B value of 45.01, which is in units of (I'm hoping) 8*Pi^2^ Å^2^.

You would then need to divide it by 64*Pi^4^ to get to U (0.00722 Å^2^). Why does mmCIF want to impose this extra level of division?

Unless, truly, the B is 45.01 Å^2^ (U = 0.57 Å^2^), which is an unheard-of magnitude in the inorganic space in which I'm used to living.
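The two readings can be compared on the 3i40 excerpt with a short Python sketch. It splits the data rows on whitespace (a toy reader for this excerpt only, not a general CIF parser) and computes U both ways: assuming B_iso_or_equiv is already in Å² (U = B / 8π²), and following the 64π⁴ = (8π²)² division suggested in this comment.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2

# Data rows from the 3i40 excerpt (columns: type_symbol, label_atom_id,
# Cartn_x, Cartn_y, Cartn_z, occupancy, B_iso_or_equiv).
ROWS = """\
N   N    -27.279   6.238   -12.314   1.00   45.01
C   CA   -26.249   6.028   -11.313   1.00   43.47
C   C    -25.582   4.677   -11.471   1.00   34.37
"""

results = []
for line in ROWS.splitlines():
    fields = line.split()
    label, b_iso = fields[1], float(fields[6])
    u_if_a2 = b_iso / EIGHT_PI_SQ          # B taken as Å²
    u_if_64pi4 = b_iso / EIGHT_PI_SQ ** 2  # dividing by 64π⁴ instead
    results.append((label, b_iso, u_if_a2, u_if_64pi4))
    print(f"{label:3} B={b_iso:6.2f}  U={u_if_a2:.3f}  U'={u_if_64pi4:.5f}")
```

For the first atom this gives U ≈ 0.570 Å² under the first reading and ≈ 0.00722 Å² under the second, matching the figures quoted above.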

vaitkus commented 1 year ago

A B value of 45.01 Å^2^ seems a bit high, but it is still of the same order of magnitude as what could be expected in protein crystal structures, according to this paper [1]:

Based on the analysis of a large and well selected set of protein crystal structures, it can be predicted that at very high resolution (better than 1.5 Å), B_max is close to 25 Å^2, which means that the average B-factor value should not be larger than 25 Å^2 at that resolution, while larger values are observed at lower resolution. At very low resolution (worse than 3.3 Å), B_max grows up to 80 Å^2, which means, again, that the average B-factor value should not be larger than 80 Å^2 at that resolution.

Structure 3i40 has a resolution of 1.85 Å and a mean Biso of 30.297 Å^2.

One additional thing I noticed is that a multitude of single-valued B data items, such as _reflns.B_iso_Wilson_estimate, _refine.B_iso_mean, _refine.B_iso_min and _refine.B_iso_max, are assigned the Å^2 unit in the dictionary, and only the looped _atom_site.* B items are assigned the 8Pi^2 Å^2 unit. However, in the 3i40 file, the values of _refine.B_iso_min and _refine.B_iso_max are numerically identical to the lowest and highest values of the looped _atom_site.B_iso_or_equiv data item. Clearly, something is incorrect here, and my bet is that the 8pi^2 * Å^2 unit is the one out of place.
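The consistency argument can be sketched in Python as follows. The loop values are only the three excerpted 3i40 rows, and the summary values are stand-ins for _refine.B_iso_min and _refine.B_iso_max: the actual 3i40 values are not quoted in this thread, so they are assumed here to equal the loop extremes, as observed. Each candidate unit for the loop is converted to Å² before comparing against the Å² summary items.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2

loop_b = [45.01, 43.47, 34.37]  # _atom_site.B_iso_or_equiv (excerpt only)
summary_min_a2, summary_max_a2 = 34.37, 45.01  # assumed _refine.B_iso_min/max, in Å²


def consistent(loop_values, unit_in_a2):
    """True if the loop extremes, converted to Å², match the summary items."""
    in_a2 = [v * unit_in_a2 for v in loop_values]
    return (math.isclose(min(in_a2), summary_min_a2)
            and math.isclose(max(in_a2), summary_max_a2))


print(consistent(loop_b, 1.0))          # loop in Å²: True
print(consistent(loop_b, EIGHT_PI_SQ))  # loop in [8π² Å²]: False
```

If the summary items really are in Å², only the plain Å² reading of the loop keeps the two consistent.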

[1] https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2083-8

jamesrhester commented 1 year ago

I agree with @vaitkus here. I'm not sure what's going on with mmCIF, so it will be interesting to hear what the response is to the email.