BioJulia / Bio.jl

[DEPRECATED] Bioinformatics and Computational Biology Infrastructure for Julia
http://biojulia.dev
MIT License

PDBx/mmCIF and PDBML/XML support. #439

Closed: bicycle1885 closed this issue 4 years ago.

bicycle1885 commented 7 years ago

Since the Protein Data Bank (PDB) has switched its standard file format from PDB to mmCIF, it would be desirable to support PDBx/mmCIF (and PDBML/XML). I couldn't find a formal description of the format, but it seems to be simple, judging from some examples. If it is flat (and it appears to be), we can use Automa.jl to generate a parser for it. PDBML/XML is XML, so I think it would be easier to support it using EzXML.jl.
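To illustrate what "flat" means for mmCIF, here is a minimal sketch in Python. It is not how Automa.jl or BioStructures.jl work, and the function names `tokenize`, `split_tag` and `parse_mmcif` are hypothetical. It assumes a simplified subset of the format: a `data_` block header, single `_category.item value` pairs, and `loop_` tables whose values are listed row by row. Semicolon-delimited multi-line text fields, save frames, and mmCIF's exact quoting rules are deliberately not handled, and the input fragment is made up.

```python
import re

# A token is a single-quoted string, a double-quoted string, or any run of
# non-whitespace characters. Simplification: real mmCIF only ends a quoted
# string at a quote followed by whitespace, and this regex does not.
TOKEN = re.compile(r"'[^']*'|\"[^\"]*\"|\S+")


def tokenize(text):
    """Yield (token, was_quoted) pairs, skipping blank and '#' comment lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        for tok in TOKEN.findall(line):
            if len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "'\"":
                yield tok[1:-1], True  # strip the surrounding quotes
            else:
                yield tok, False


def split_tag(tag):
    """'_atom_site.id' -> ('atom_site', 'id')."""
    category, item = tag[1:].split(".", 1)
    return category, item


def parse_mmcif(text):
    """Return {block: {category: {item: [values]}}} for simplified mmCIF."""
    tokens = list(tokenize(text))
    blocks, current, i = {}, None, 0
    while i < len(tokens):
        tok, quoted = tokens[i]
        if not quoted and tok.startswith("data_"):
            current = blocks.setdefault(tok[len("data_"):], {})
            i += 1
        elif not quoted and tok == "loop_":
            i += 1
            # Column headers: consecutive unquoted tokens starting with '_'.
            tags = []
            while i < len(tokens) and not tokens[i][1] and tokens[i][0].startswith("_"):
                tags.append(tokens[i][0])
                i += 1
            # Values: everything up to the next tag, loop or data block.
            values = []
            while i < len(tokens):
                t, q = tokens[i]
                if not q and (t.startswith("_") or t == "loop_" or t.startswith("data_")):
                    break
                values.append(t)
                i += 1
            # Values are stored row by row, so column k is every len(tags)-th value.
            for col, tag in enumerate(tags):
                category, item = split_tag(tag)
                current.setdefault(category, {})[item] = values[col::len(tags)]
        elif not quoted and tok.startswith("_"):
            # A single "_category.item value" pair.
            category, item = split_tag(tok)
            current.setdefault(category, {})[item] = [tokens[i + 1][0]]
            i += 2
        else:
            raise ValueError(f"unexpected token {tok!r}")
    return blocks


example = """data_1ABC
#
_entry.id 1ABC
#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
ATOM 1 N
ATOM 2 CA
ATOM 3 "O5'"
"""

blocks = parse_mmcif(example)
print(blocks["1ABC"]["atom_site"]["label_atom_id"])  # ['N', 'CA', "O5'"]
```

Because every value in a `loop_` is just the next whitespace-separated token, each column is a strided slice of one token stream. That token-level regularity is what makes a generated lexer look like a plausible fit, though a complete parser would need the full CIF grammar.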

bicycle1885 commented 7 years ago

@jgreener64, any thoughts? Is PDBx/mmCIF popular enough to support in BioJulia? I have no idea.

jgreener64 commented 7 years ago

You're right that the PDB standard format has switched to mmCIF. In my experience PDB is still the preferred format for people in the field, though that could be due to slow uptake of the new format.

Writing a mmCIF parser for BioJulia has been on my wish list for a while but realistically I won't have time in the near future - obviously I would be keen to talk design and review code if someone else wanted to do it.

It's also worth mentioning MMTF (Macromolecular Transmission Format) at this point: a new binary format supported by the PDB. I started a Julia encoder/decoder for it but didn't get round to finishing it.

bicycle1885 commented 7 years ago

Thank you, @jgreener64. I didn't know about MMTF. It seems promising, since text-based file formats are, yes, slow.

Anyway, I will look into mmCIF further when I have time.

jgreener64 commented 6 years ago

For the record, a mmCIF reader/writer is now implemented in https://github.com/BioJulia/BioStructures.jl.

jgreener64 commented 4 years ago

mmCIF is implemented in BioStructures.jl, MMTF is in the works and PDBML is a "someday" feature. Any discussion on this can be continued at BioStructures.jl.

TransGirlCodes commented 4 years ago

Thanks @jgreener64!