biotite-dev / biotite

A comprehensive library for computational molecular biology
https://www.biotite-python.org
BSD 3-Clause "New" or "Revised" License

The format of the mmCIF file after changing the block and saving is broken #658

Closed (dargen3 closed this 1 month ago)

dargen3 commented 2 months ago

Hello,

I'd like to report a probable error. When I change a block in an mmCIF file and save it, the format breaks: items are written directly after the # character instead of on the next line. For example, instead of

data_9BFL
# 
_entry.id   9BFL 
# 
_audit_conform.dict_name       mmcif_pdbx.dic 
_audit_conform.dict_version    5.393 
_audit_conform.dict_location   http://mmcif.pdb.org/dictionaries/ascii/mmcif_pdbx.dic 
# 
...

the following is written to the file:

data_9BFL
#
_entry.id   9BFL 
# _audit_conform.dict_name       mmcif_pdbx.dic 
_audit_conform.dict_version    5.393
_audit_conform.dict_location   http://mmcif.pdb.org/dictionaries/ascii/mmcif_pdbx.dic 
# loop_
_database_2.database_id 
...
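
The symptom above can be checked for mechanically. The following is only a minimal sketch, not part of Biotite: the helper name `find_fused_separators` is hypothetical, and it assumes the PDB-style layout shown above, where a category separator is a line containing only `#`. It reports every line on which an item name or `loop_` follows the `#` on the same line, and it skips semicolon-delimited multiline values, since those may contain arbitrary text.

```python
import re

# Assumption: in PDB-style mmCIF, a category separator is a line holding only "#"
# (optionally followed by whitespace). The reported symptom is a "#" followed on
# the same line by an item name ("_category.item") or the "loop_" keyword.
_FUSED_SEPARATOR = re.compile(r"^#\s+(_\S+|loop_)")


def find_fused_separators(cif_text):
    """Return 1-based line numbers where content follows '#' on the same line."""
    hits = []
    in_text_field = False
    for number, line in enumerate(cif_text.splitlines(), start=1):
        # A line starting with ";" opens or closes a multiline value;
        # everything in between is free text and is not inspected.
        if line.startswith(";"):
            in_text_field = not in_text_field
            continue
        if not in_text_field and _FUSED_SEPARATOR.match(line):
            hits.append(number)
    return hits


# Excerpt of the broken output quoted in this report.
broken = """data_9BFL
#
_entry.id   9BFL
# _audit_conform.dict_name       mmcif_pdbx.dic
_audit_conform.dict_version    5.393
# loop_
_database_2.database_id
"""
print(find_fused_separators(broken))  # -> [4, 6]
```

Running the same function on the text of a written file (for example, the contents of 9bfl_protonated.cif read with `open(...).read()`) should return an empty list once the output is well-formed.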

Code to reproduce (the PDB file used can be downloaded from here):

import biotite.structure.io.pdbx as mmCIF
import biotite.structure as structure
import hydride

mmCIF_file = mmCIF.CIFFile.read("9bfl.cif") 
protein = mmCIF.get_structure(mmCIF_file,
                              model=1,
                              extra_fields=["charge"],
                              include_bonds=True)
protein = protein[protein.element != "H"]  # remove all hydrogens
protein_with_hydrogens, _ = hydride.add_hydrogen(protein)
mmCIF_file_with_hydrogens = mmCIF.CIFFile()
mmCIF.set_structure(mmCIF_file_with_hydrogens, protein_with_hydrogens)
mmCIF_file.block["atom_site"] = mmCIF_file_with_hydrogens.block["atom_site"]
mmCIF_file.write("9bfl_protonated.cif")

padix-key commented 2 months ago

Thanks for the report! I will look into it tomorrow.

padix-key commented 1 month ago

#659 will fix this issue.

Unrelated to this bug: in general, I recommend calling set_structure() directly on the file that will be written, because set_structure() sets not only atom_site but also struct_conn (and chem_comp_bond if include_bonds=True).

import biotite.structure.io.pdbx as mmCIF
import biotite.structure as structure
import hydride

mmCIF_file = mmCIF.CIFFile.read("9bfl.cif") 
protein = mmCIF.get_structure(mmCIF_file,
                              model=1,
                              extra_fields=["charge"],
                              include_bonds=True)
protein = protein[protein.element != "H"]  # remove all hydrogens
protein_with_hydrogens, _ = hydride.add_hydrogen(protein)
mmCIF.set_structure(mmCIF_file, protein_with_hydrogens)
mmCIF_file.write("9bfl_protonated.cif")

dargen3 commented 1 month ago

Thank you for the quick fix. And thanks for the tips. Sounds good!