Closed BradyAJohnston closed 1 year ago
Thanks for bringing this up. I fear this is a limitation in the core of Biotite: AtomArray
and AtomArrayStack
objects handle the B-factor as optional annotation array. Therefore, by design, the b_factor
is equal for all models which is true for all annotation arrays.
I see two future prospects to fix this:
get_b_factor()
method in PDBFile
, analogous to get_coord()
. This would allow relatively fast parsing of B-factors even over multiple models. However, the NumPy array of B-factors would not be integrated into the AtomArray
itself but must be shipped separately in this variant.b_factor
as 2D array to AtomArray
and AtomArrayStack
. However, this would require changes in all file interfaces and some other functions. Furthermore, some of these changes would break backwards compatibility (which is not a hard barrier, as Biotite is still in beta).A solution to this, that is equivalent to variant 1 but with lower efficiency, would be parsing each model separately and saving the array of B-factors:
file = pdb.PDBFile.read(
"/home/kunzmann/data/coding/biotite/tests/structure/data/1l2y.pdb"
)
b_factor = []
for model in range(file.get_model_count()):
atoms = file.get_structure(model=model+1, extra_fields=["b_factor"])
b_factor.append(atoms.b_factor)
b_factor = np.stack(b_factor)
I tend to variant 1, as the B-factor is originally a property of crystal structures (although it has been exploited by a range of programs to store other data as well). As crystal structures in almost all cases have only a single model and multi-model (e.g. NMR) structures have no meaningful B-factor, there is little use for different B-factors along multiple models. Therefore, I think handling a 2D array of B-factors separate from the AtomArrayStack
should be sufficient in such special cases. Alternatively, a list of AtomArray
objects for each model could be used instead at the cost of some performance and convenience.
What is your opinion on this? Would using a separate 2D array work for your use case of atom coloring on a per-model basis?
In my experience the only time a multi-model PDB or mmCIF contains differing b_factor
values is when people are hijacking it for storing arbitrary data computed from something else, so it's understandably not an integrated part of the AtomArray.
For my own purposes, handling a separate 2D array would be totally fin. I can index it when creating the models, and I currently only need to actually have access to it one time really. The code solution you posted as well would also be useful and I could certainly implement it in the mean time if you were to plan on adding the get_b_factor()
method later on.
When importing a multi-model
pdb
file, theb_factor
is taken from the first model, for all subsequent models.Example pdb:
Each frame outputs
array([0.17, 0.17, 0.17])
from the first model.