biotite-dev / biotite

A comprehensive library for computational molecular biology
https://www.biotite-python.org
BSD 3-Clause "New" or "Revised" License

Loading old npz files #662

Closed ljmartin closed 2 months ago

ljmartin commented 2 months ago

Hi biotite,

I made the unfortunate mistake of storing a bunch of complexes as npz files, and also not checking the deprecated functions when I updated the library!

Firstly: Is there a sensible way to load these into biotite structure objects using the latest library? I can see that an npz file contains all the relevant ingredients of an AtomArray:

dat = np.load('./myfile.npz')
print(dat.files)

and can turn that into an AtomArray just fine:

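import numpy as np
import biotite.structure as structure

dat = np.load('./myfile.npz')
# List the arrays stored in the file
for name in dat.files:
    print(name)
# One row of coordinates per atom
n = dat['coord'].shape[0]
arr = structure.AtomArray(length=n)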

arr.coord = dat['coord']
arr.box = dat['box']
arr.set_annotation('chain_id', dat['chain_id'])
arr.set_annotation('res_id', dat['res_id'])
arr.set_annotation('ins_code', dat['ins_code'])
arr.set_annotation('res_name', dat['res_name'])
arr.set_annotation('hetero', dat['hetero'])
arr.set_annotation('atom_name', dat['atom_name'])
arr.set_annotation('element', dat['element'])
arr.set_annotation('resSerial', dat['resSerial'])
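The hand-written list of annotations above could also be collected generically by iterating over whatever arrays the file holds. The following is only a sketch using NumPy alone: `split_npz_fields` and `RESERVED_KEYS` are hypothetical names, and treating a `bonds` key as the bond table is an assumption about the old file layout.

```python
import numpy as np

# Keys that are not per-atom annotations. "bonds" is an assumption about
# how the old format named its bond table; adjust if your files differ.
RESERVED_KEYS = ("coord", "box", "bonds")


def split_npz_fields(path):
    """Return (coord, box, annotations) read from an old-style npz file."""
    with np.load(path) as dat:
        coord = dat["coord"]
        # The box is optional; fall back to None when it was not stored
        box = dat["box"] if "box" in dat.files else None
        # Works for (n, 3) arrays and for (models, n, 3) stacks
        n_atoms = coord.shape[-2]
        annotations = {}
        for name in dat.files:
            if name in RESERVED_KEYS:
                continue
            values = dat[name]
            # Every annotation must hold exactly one value per atom
            if values.shape[:1] != (n_atoms,):
                raise ValueError(
                    f"'{name}' has shape {values.shape}, expected {n_atoms} atoms"
                )
            annotations[name] = values
    return coord, box, annotations
```

Each entry of `annotations` can then be passed to `arr.set_annotation(name, values)` as in the snippet above, with `coord` and `box` assigned directly.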

and can save too (note - my data has no bonds):

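np.savez(
    'temp.npz',
    **{
        'coord' : arr.coord,
        # box and chain_id are read back by the loading code above,
        # so they are written here as well (assumes arr.box is set)
        'box' : arr.box,
        'chain_id' : arr.chain_id,
        'res_id' : arr.res_id,
        'ins_code' : arr.ins_code,
        'res_name' : arr.res_name,
        'hetero' : arr.hetero,
        'atom_name' : arr.atom_name,
        'element' : arr.element,
        'resSerial' : arr.resSerial
    }
)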

Secondly - do you have a replacement for npz that you use? I'll use something else if it's best practice.

Thank you!

padix-key commented 2 months ago

Your approach looks correct. If you like, you can also use the (now removed) code from structure.io.npz:

https://github.com/biotite-dev/biotite/blob/270f2d6cd372c0d504e5960407ff082c4930893f/src/biotite/structure/io/npz/file.py#L100-L133

As a direct replacement, I would propose pickle if you want the fastest reading/writing speed and do not care about interoperability with other software or about future backwards-incompatible Biotite releases. If you can spend a bit more computation time, I would instead propose bcif: the files are small, and reading/writing is still highly optimized, especially if you do not store bonds.

ljmartin commented 2 months ago

thank you!