Open rowlesmr opened 1 year ago
The mmCIF dictionary [1] also assigns the 8pi_angstroms_squared units to their B items (e.g. _atom_site_anisotrop.B[1][1]). Some of those items are supposed to be equivalent to those in the CIF_CORE dictionary (and they do share some of the data name aliases). However, in my opinion the units should not be changed (see below).
The units themselves look incorrect to me (at least in the context of the dictionary). The current definition states:
8pi2_angstroms_squared "8pi^2^ * angstroms squared (metres * 10^(-10)^)^2^"
From what I understand, the dictionary currently operates under the assumption of the following equivalence [2] (brackets indicate the measuring units):
U [A^2] = B / (8 pi^2) [A^2]
Now if we suddenly change the units of B to (8pi^2 A^2) and try to convert the right side to A^2:
U [A^2] = B / (8 pi^2) [8pi^2 A^2]
U [A^2] = ( B / (8 pi^2) ) * 8pi^2 [A^2]
U [A^2] = B [A^2]
That is, if the units were actually respected, one would expect to see identical numeric ADP values regardless of whether U or B is used. However, since CIF files normally contain B values in the same form as written in the related journal publication (that is, not converted to 8pi^2 * A^2 units) this will not be true for most (if not all) existing CIF files.
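The cancellation above can be checked numerically. This is only a minimal sketch: the `UNIT_SCALE` table and the `to_angstroms_squared` helper are hypothetical names made up for illustration (they are not part of any CIF library), and the U value is arbitrary.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2  # approximately 78.9568

# Hypothetical scale table: size of each unit expressed in A^2,
# taking the unit label literally.
UNIT_SCALE = {
    "angstroms_squared": 1.0,
    "8pi2_angstroms_squared": EIGHT_PI_SQ,
}


def to_angstroms_squared(value, unit):
    """Re-express a value in A^2, honouring the unit label literally."""
    return value * UNIT_SCALE[unit]


def u_from_b(b_in_angstroms_squared):
    """U = B / (8 pi^2), with both quantities in A^2."""
    return b_in_angstroms_squared / EIGHT_PI_SQ


u = 0.05                        # illustrative U value in A^2
b_published = u * EIGHT_PI_SQ   # B as normally published, in A^2

# If B were really stored in 8pi^2 A^2 units, the stored number would be:
b_stored = b_published / UNIT_SCALE["8pi2_angstroms_squared"]

# Converting the stored number back to A^2 and then to U recovers u.
u_recovered = u_from_b(to_angstroms_squared(b_stored, "8pi2_angstroms_squared"))
```

Under this literal reading `b_stored` comes out numerically equal to `u`, which is not what existing CIF files contain.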
Note that the dREL code in the CIF_CORE dictionary responsible for converting between U and B also currently operates under the assumption that B values are measured in A^2 and not 8pi^2 * A^2, so changing the units would not be a simple matter.
@rowlesmr , @jamesrhester could you check if this interpretation makes sense? If yes, I would also report this to the mmCIF maintainers since in practice they probably also do not divide their B values by 8pi^2 just to express them in the [8pi^2 * A^2] units.
[1] https://mmcif.wwpdb.org/dictionaries/ascii/mmcif_pdbx_v50.dic
[2] http://pd.chem.ucl.ac.uk/pdnn/refine3/adps.htm
I concur with this assessment (and am embarrassed I didn't spot it!)
It's like reporting length in metres and kilometres, and then just arbitrarily changing the units on the kilometres value to be metres; the magnitudes get stuffed up.
or you could write:
0.01267 (Ų) = 1 (1/(8π²) Ų)
so you can make the unit_code correct if you tweak its definition:
'8pi_angstroms_squared' "length_squared '(1/(8pi^2)) * angstroms squared (metres * 10^(-10))^2'"
assuming that you give B as B and U as U as in the example a couple of lines above.
I think this still does not remove the underlying issue of U and B being expressed in different units. In this case one still gets:
U [A^2] = B / (8 pi^2^) [1/(8pi^2^) A^2]
U [A^2] = ( B / (8 pi^2^) ) * 1/(8pi^2^) [A^2]
Which is different from the expected:
U [A^2] = B / (8 pi^2^) [A^2]
However, just in case, I wrote an email to the mmCIF dictionary maintainers to get their input/interpretation on this issue. It would be quite unexpected to hear that they actually divide all of their B values by 8pi^2^ to convert them to the 8pi_angstroms_squared units.
My example above is exactly this:
U [A^2] = B / (8 pi^2^) [A^2]
Move the divisor into the units
U [A^2] = B [1 / (8 pi^2^) A^2]
Yes, but moving the divisor into the units affects the value. That is, if the original publication states that the B value is 8.5 [A^2^], the author would now have to record it in the CIF file as 8.5 * 8 pi^2^ to properly express it in the required [1/ (8 pi^2^) A^2^] units, would they not? I'm quite sure this convention was never followed in CIF files, thus suddenly changing the units would change the interpretation of the B values.
I am not sure about the conventions of mmCIF files though.
I'm arguing that the 1/8pi^2 constant has always implicitly been there. Adding it explicitly doesn't change anything, apart from making it explicit.
Consider measuring lengths. The value W is recorded in metres. There is also another value, Y, that people use, such that W = Y/1000. In both cases, the unit of length is metres. I can report W = 0.001 or Y=1, and people know it is the same thing.
There is an implicit divisor in the units; W [m] = Y [1/1000 m]
I don't need to change the magnitude of the Y value, I just need to make the implicit divisor explicit.
According to the mmCIF unit conversion table, angstroms_squared can be converted to 8pi2_angstroms_squared by multiplying it by 8pi^2:
angstroms_squared 8pi2_angstroms_squared * 78.9568
I do not see why the same would not apply to fractional multipliers.
The unit conversion agrees with what I said:
From Units&Identifier | To Units&Identifier | Operator | Conversion Factor
angstroms_squared | 8pi2_angstroms_squared | * | 78.9568
To convert U value to a B value, multiply by 79 and conversely, to convert a B value to a U value, divide by 79.
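As a quick check of the factor quoted in the table, here is a minimal Python sketch of the U/B conversion. The function names `u_to_b` and `b_to_u` are illustrative only, and the U value is arbitrary.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2


def u_to_b(u):
    """Convert U to B by multiplying by 8 pi^2."""
    return u * EIGHT_PI_SQ


def b_to_u(b):
    """Convert B to U by dividing by 8 pi^2."""
    return b / EIGHT_PI_SQ


# 8 pi^2 rounded to four decimals matches the table's conversion factor.
factor = round(EIGHT_PI_SQ, 4)

# Converting an illustrative U value to B and back returns the same number.
roundtrip = b_to_u(u_to_b(0.0125))
```

Note that this only confirms the numeric factor; it does not settle which unit label the B value should carry.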
This means that the explanation of the 8pi2_angstroms_squared unit in units_code is wrong, and should be "length_squared '(1/(8pi^2)) * angstroms squared (metres * 10^(-10))^2'", and we can just straight up change the units for all B-related dataitems.
I think we are talking past each other here. You do see why changing the units of B would be problematic in general?
Conversion from B to U is independent of conversion between "8pi2_angstroms_squared" and "angstroms_squared". Historically, B values in CIF files have been expressed in "angstroms_squared" with the 8pi^2^ multiplier included in the value. Now, if one would like to convert the existing B values from "angstroms_squared" to "8pi2_angstroms_squared", how would the numeric representation of the B value change?
For example, we have B value of 13 [A^2^]. How would it look expressed in [ 8pi^2^ * A^2^]? Clearly, it cannot remain the same since we are adding a multiplier (or a divisor).
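To make the question concrete, here is a small Python sketch of two possible readings of the re-expression. Both readings are assumptions made for illustration: the first treats 8pi^2 * A^2 as a unit that is 8pi^2 times larger than A^2, the second applies the multiplier from the mmCIF conversion table literally.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2

b_in_a2 = 13.0  # B as published, in A^2

# Reading 1: "8pi^2 A^2" is a unit about 78.96 times larger than A^2,
# so the number has to shrink (to roughly 0.165).
as_larger_unit = b_in_a2 / EIGHT_PI_SQ

# Reading 2: apply the mmCIF conversion table literally
# (angstroms_squared -> 8pi2_angstroms_squared: multiply by 78.9568),
# so the number grows (to roughly 1026.4).
per_mmcif_table = b_in_a2 * EIGHT_PI_SQ
```

Under either reading the stored number differs from 13.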
I've just worked through a bunch of stuff, and it works in isolation, but not in functions involving B or U, assuming you're using functions specialised on B or U, so my suggestions are all wrong.
Then I had a look at mmCIF...
Looking at 3i40, the first few atom coords are given as
loop_ #slightly edited by removing columns
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
N N -27.279 6.238 -12.314 1.00 45.01
C CA -26.249 6.028 -11.313 1.00 43.47
C C -25.582 4.677 -11.471 1.00 34.37
...
There is a B value of 45.01, which is in units of (I'm hoping) 8*Pi^2^ Å^2^.
You would then need to divide it by 64*Pi^4^ to get to U (0.00722 Å^2^). Why does mmCIF want to impose this extra level of division?
Unless, truly, the B is 45.01 Å^2^ (U = 0.57 Å^2^), which is an unheard-of magnitude in the inorganic space in which I'm used to living.
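The two candidate U values for this atom can be reproduced with a few lines of Python. This is a sketch only; the variable names are illustrative and the B value is the one from the 3i40 excerpt.

```python
import math

EIGHT_PI_SQ = 8 * math.pi ** 2

b_file = 45.01  # _atom_site.B_iso_or_equiv of the first atom in the excerpt

# Interpretation 1: the value is already in A^2, so U = B / (8 pi^2).
u_if_a2 = b_file / EIGHT_PI_SQ

# Interpretation 2: the value carries a further 8 pi^2 scale factor,
# so the division is applied twice, i.e. U = B / (64 pi^4).
u_if_scaled = b_file / EIGHT_PI_SQ ** 2
```

These evaluate to about 0.57 Å^2^ and 0.00722 Å^2^ respectively, matching the figures quoted above.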
A B value of 45.01 Å^2^ seems a bit high, but it is still of the same order of magnitude as what could be expected in protein crystal structures according to this paper [1]:
Based on the analysis of a large and well selected set of protein crystal structures, it can be predicted that at very high resolution (better than 1.5 Å), B_max is close to 25 Å^2, which means that the average B-factor value should not be larger than 25 Å^2 at that resolution, while larger values are observed at lower resolution. At very low resolution (worse than 3.3 Å), B_max grows up to 80 Å^2, which means, again, that the average B-factor value should not be larger than 80 Å^2 at that resolution.
Structure 3i40 has a resolution of 1.85 Å and a mean Biso of 30.297 Å^2.
One additional thing that I noticed is that a multitude of single-valued B data items, such as _reflns.B_iso_Wilson_estimate, _refine.B_iso_mean, _refine.B_iso_min and _refine.B_iso_max, are assigned the Å^2 unit in the dictionary and only the looped _atom_site.* B items are assigned the 8Pi^2 Å^2 unit. However, in the 3i40 file, the values of _refine.B_iso_min and _refine.B_iso_max are numerically identical to the lowest and highest values of the looped _atom_site.B_iso_or_equiv data item. Clearly, something is incorrect here and my bet is that the 8pi^2 * Å^2 unit is the one out of place.
[1] https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2083-8
I agree with @vaitkus here. I'm not sure what's going on with mmCIF, so it will be interesting to hear what the response is to the email.
Should the units for B (and the anisotropic versions) be given as 8pi_angstroms_squared? It is an option in the units_code enumeration.