Open abahm opened 4 years ago
This is a great library and I'm already using it lots.
Thanks for the kind words - do let me know of any other issues you run into or functionality you would like.
One function that can inform the user which filetypes a protein can be downloaded in
I thought about this, but am not sure how rigid/fixed this information is. I could add a better error message to downloadpdb
though, suggesting to try downloading the mmCIF file if the PDB download fails. And I could mention this in the docs, saying that the only way to guarantee the file is available is to use mmCIF. Those are on the to-do list.
function that allows the user to read symmetries regardless of the file type
You are correct in saying that PDB headers contain the symmetry information, but this information is also in the mmCIF dictionary. See for example https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/biological-assemblies#Anchor-Biol. You could get the values using something like:
using BioStructures
downloadpdb("3C70", file_format=MMCIF)
d = MMCIFDict("3C70.cif")
d["_pdbx_struct_oper_list.matrix[1][1]"] # etc
If you want your parser to work on all PDB entries that have that information, you could parse from the mmCIF dictionary. The primacy of the mmCIF file is one reason BioStructures doesn't do any PDB header parsing.
I would like to evaluate positions of a protein's atoms with the full expression of symmetry in the complete unit cell or entire crystal.
Thankfully, if you update to BioStructures v0.8.0 you will be able to use recently-added transformation functions to help with this. See the docstrings for Transformation
for more. For example you could read the translation and rotation information from the mmCIF dictionary, create a Transformation
from these then run applytransform!
on your model or applytransform
on the coordinate array to get updated coordinates.
If you do come up with something that works, let me know and I can add an example/explanation to the documentation.
Thank you for all the information! I'll update and begin working with the mmCIF data, and have a look at the transformation functionality. Thanks again for a great library!
This is a great library and I'm already using it lots. I would like to evaluate positions of a protein's atoms with the full expression of symmetry in the complete unit cell or entire crystal. I can't seem to find it in the documentation for this library, nor in the code.
Expected Behavior
I would like to see documented examples of how to retrieve that atoms positions along with all the symmetries of those positions.
Current Behavior
I've written my own basic parser (below), but it only works for PDB files. Molecules in the PDB can often be dowloaded in either PDB format or MMCIF format, however some appear to only be downloadable as mmCIF, e.g. "6PEM". In these cases, downloadpdb throws an exception unless the MMCIF format is specified.
Possible Solution / Implementation
Perhaps two functions be added to the library. One function that can inform the user which filetypes a protein can be downloaded in, and another function that allows the user to read symmetries regardless of the file type. However there may be more natural solutions that integrate better with the BioStructures.Model (e.g. adding a field for parsed symmetries). I would be happy with anything that supported both formats and allowed me to ultimately get all the atoms in the unit cell or entire crystal.
By way of example of the second function, the code that I implemented as a workaround follows. It assumes that you have already downloaded a PDB file to a pdb_cache_directory, and it only works with .pdb format files.
Your Environment