Multi-Model `b_factor` currently not available

BradyAJohnston commented 1 year ago

While previously using atomium for parsing, b_factor on a per-model basis was available.

With the switch to Biotite, this is currently not available and b_factor is now taken from the first model and used for all other models as well.

The issue is raised here and a fix will hopefully be coming when new functionality is added Biotite.

BradyAJohnston commented 1 year ago

When importing via MDAnalysis, b_factor is also not frame-specific but is instead for topology and is the same for each timestep.

As noted in this comment https://github.com/MDAnalysis/mdanalysis/issues/3825#issuecomment-1244024989 you can get timestep-specific data from the occupancy for the specific timestep.

import MDAnalysis as mda

u = mda.Universe('traj.pdb')

for ts in u.trajectory:
    print(ts.data['occupancy'])

Should be able to add this to the MDAnalysis import for each timestep creation.

BradyAJohnston commented 1 year ago

Works well. Added here: https://github.com/BradyAJohnston/MolecularNodes/commit/26a8c41403fc2da3b377bf9c22d32eb7e7746d7e

Tested with the below data:

TITLE     MDANALYSIS FRAMES FROM 0, STEP 1: Created by PDBWriter
CRYST1    1.000    1.000    1.000  90.00  90.00  90.00 P 1           1
REMARK     285 UNITARY VALUES FOR THE UNIT CELL AUTOMATICALLY SET
REMARK     285 BY MDANALYSIS PDBWRITER BECAUSE UNIT CELL INFORMATION
REMARK     285 WAS MISSING.
REMARK     285 PROTEIN DATA BANK CONVENTIONS REQUIRE THAT
REMARK     285 CRYST1 RECORD IS INCLUDED, BUT THE VALUES ON
REMARK     285 THIS RECORD ARE MEANINGLESS.
MODEL        1
ATOM      1  O   SOL X   1       0.000   0.000   0.000  0.17  0.17      SOL  O
ATOM      2  H1  SOL X   1       0.000   0.000   0.000  0.17  0.17      SOL  H
ATOM      3  H2  SOL X   1       0.000   0.000   0.000  0.17  0.17      SOL  H
ENDMDL
MODEL        2
ATOM      1  O   SOL X   1       0.000   0.000   0.000  0.33  0.33      SOL  O
ATOM      2  H1  SOL X   1       0.000   0.000   0.000  0.33  0.33      SOL  H
ATOM      3  H2  SOL X   1       0.959   0.959   0.959  0.33  0.33      SOL  H
ENDMDL
MODEL        3
ATOM      1  O   SOL X   1       0.959   0.959  -0.027  0.50  0.50      SOL  O
ATOM      2  H1  SOL X   1      -0.027  -0.027  -0.027  0.50  0.50      SOL  H
ATOM      3  H2  SOL X   1      -0.027   0.032   0.032  0.50  0.50      SOL  H
ENDMDL
MODEL        4
ATOM      1  O   SOL X   1       0.032   0.032   0.032  0.67  0.67      SOL  O
ATOM      2  H1  SOL X   1      -0.280  -0.280  -0.280  0.67  0.67      SOL  H
ATOM      3  H2  SOL X   1      -0.280  -0.280  -0.588  0.67  0.67      SOL  H
ENDMDL
MODEL        5
ATOM      1  O   SOL X   1      -0.588  -0.588  -0.588  0.83  0.83      SOL  O
ATOM      2  H1  SOL X   1      -0.588   0.706   0.706  0.83  0.83      SOL  H
ATOM      3  H2  SOL X   1       0.706   0.706   0.706  0.83  0.83      SOL  H
ENDMDL
END

Able to get interpolating occupancy data with this setup:

Potential to add a field to the original Animate Frames node rather than do duplicate interpolation computations

BradyAJohnston commented 1 year ago

Was also able to get it working via adding the following code for biotite import:

def pdb_get_b_factors(file):
    b_factors = []
    for model in range(file.get_model_count()):
        atoms = file.get_structure(model = model + 1, extra_fields = ['b_factor'])
        b_factors.append(atoms.b_factor)
    return b_factors

Which was suggested in the biotite issue. Was able to get import working properly. Will try and implement more robustly.

BradyAJohnston / MolecularNodes

Multi-Model `b_factor` currently not available #128