a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

Use MOL2 file in graphein.protein.graphs.construct_graph #259

Open KevinCrp opened 1 year ago

KevinCrp commented 1 year ago

Files used to construct protein graphs must be in PDB format, whereas molecular graphs can also be constructed from MOL2 or SDF files.

Is it possible to add a protein graph constructor that accepts MOL2 files? I don't understand why protein graphs are limited to PDB files.

I have tried converting MOL2 to PDB in order to construct a protein graph, but the conversion does not always work well.
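For context on why a MOL2-to-PDB conversion can lose information, here is a minimal sketch (not part of graphein) that reads the `@<TRIPOS>ATOM` section of a MOL2 file. Each record holds an atom ID, atom name, coordinates, a SYBYL atom type and, optionally, a substructure ID and name. Residue name and number are often packed into the substructure name (e.g. `ALA1`), but that is a writer convention rather than a guarantee, so `split_subst_name` below is a hypothetical heuristic that simply gives up on anything else. The placeholder values used for missing optional fields are also our assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Mol2Atom:
    atom_id: int
    atom_name: str
    x: float
    y: float
    z: float
    atom_type: str  # SYBYL atom type, e.g. "C.3" or "N.am"
    subst_id: int
    subst_name: str


def parse_mol2_atoms(text: str) -> List[Mol2Atom]:
    """Collect the records of the @<TRIPOS>ATOM section of a MOL2 string."""
    atoms: List[Mol2Atom] = []
    in_atom_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # A new record type starts; only the ATOM section is parsed.
            in_atom_section = stripped == "@<TRIPOS>ATOM"
            continue
        if not in_atom_section or not stripped:
            continue
        f = stripped.split()
        # atom_id atom_name x y z atom_type [subst_id [subst_name [charge [status]]]]
        atoms.append(Mol2Atom(
            atom_id=int(f[0]),
            atom_name=f[1],
            x=float(f[2]), y=float(f[3]), z=float(f[4]),
            atom_type=f[5],
            # Placeholders for absent optional fields (our assumption, not the spec's).
            subst_id=int(f[6]) if len(f) > 6 else 0,
            subst_name=f[7] if len(f) > 7 else "UNK",
        ))
    return atoms


_SUBST_RE = re.compile(r"^([A-Za-z]{3})(-?\d+)$")


def split_subst_name(subst_name: str) -> Optional[Tuple[str, int]]:
    """Hypothetical heuristic: 'ALA12' -> ('ALA', 12); None if the name doesn't fit."""
    m = _SUBST_RE.match(subst_name)
    if m is None:
        return None
    return m.group(1).upper(), int(m.group(2))


SAMPLE = """@<TRIPOS>MOLECULE
example
 3 2
PROTEIN
NO_CHARGES

@<TRIPOS>ATOM
      1 N          -0.966     0.493     1.500 N.am    1  ALA1      0.0000
      2 CA          0.257     0.418     0.692 C.3     1  ALA1      0.0000
      3 C          -0.094     0.017    -0.716 C.2     1  ALA1      0.0000
@<TRIPOS>BOND
     1     1     2    1
     2     2     3    1
"""

atoms = parse_mol2_atoms(SAMPLE)
for atom in atoms:
    print(atom.atom_name, atom.atom_type, split_subst_name(atom.subst_name))
```

Note that nothing here tells you which chain an atom belongs to; a chain identifier, if present at all, would have to come from the optional chain field of the `@<TRIPOS>SUBSTRUCTURE` section.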

a-r-j commented 1 year ago

Hi @KevinCrp

PDBs are the primary format for protein graphs as this is what is typically used in the community. Molecules are comparatively simpler to parse; there are a lot of protein-specific fields recorded in PDB files that I don't believe are explicit in Mol2 files.
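To make the "protein-specific fields" concrete: a PDB `ATOM` record has fixed columns for chain ID, occupancy and B-factor, among others. A converter starting from MOL2 has nothing to copy into those columns and must invent values. The sketch below formats one `ATOM` record following the fixed-column PDB layout; the defaults (chain `A`, occupancy 1.00, B-factor 0.00) are our own placeholder assumptions, not anything recovered from the MOL2 file, and `pdb_atom_line` is a hypothetical helper, not a graphein function.

```python
def pdb_atom_line(
    serial: int,
    name: str,
    resname: str,
    resseq: int,
    x: float,
    y: float,
    z: float,
    element: str,
    chain_id: str = "A",      # placeholder: MOL2 ATOM records carry no chain ID
    occupancy: float = 1.00,  # placeholder: MOL2 has no occupancy field
    b_factor: float = 0.00,   # placeholder: MOL2 has no B-factor field
) -> str:
    """Format one fixed-column (80-character) PDB ATOM record."""
    # By convention, names of one-letter elements start in column 14 (" CA ").
    if len(name) < 4 and len(element) == 1:
        name_field = f" {name:<3}"
    else:
        name_field = f"{name:<4}"
    fields = [
        "ATOM  ",                     # cols 1-6   record name
        f"{serial:5d}",               # cols 7-11  atom serial number
        " ",                          # col 12
        name_field,                   # cols 13-16 atom name
        " ",                          # col 17     altLoc (left blank)
        f"{resname:>3}",              # cols 18-20 residue name
        " ",                          # col 21
        f"{chain_id:1}",              # col 22     chain ID
        f"{resseq:4d}",               # cols 23-26 residue sequence number
        " " * 4,                      # col 27 iCode (blank) + cols 28-30
        f"{x:8.3f}{y:8.3f}{z:8.3f}",  # cols 31-54 coordinates
        f"{occupancy:6.2f}",          # cols 55-60 occupancy
        f"{b_factor:6.2f}",           # cols 61-66 B-factor
        " " * 10,                     # cols 67-76
        f"{element:>2}",              # cols 77-78 element symbol
        "  ",                         # cols 79-80 charge (left blank)
    ]
    return "".join(fields)


print(pdb_atom_line(2, "CA", "ALA", 1, 0.257, 0.418, 0.692, "C"))
```

Every line written this way looks like a fully resolved atom with perfect occupancy, which is exactly the kind of silently fabricated metadata that makes a naive conversion unreliable.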

For example, chains and residue types have to be inferred from atom types and connectivity. I believe that if, for example, you had an ALA with an unresolved/missing Cβ (CB) atom, it would not be possible to distinguish it from a GLY. Furthermore, Mol2 files don't contain B-factors, occupancy, etc.
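The ALA/GLY ambiguity can be illustrated with a deliberately simplified sketch that infers residue type from heavy-atom element composition only (the element is taken as the part of the SYBYL atom type before the dot, e.g. `N.am` gives `N`). This ignores connectivity and covers only four residues, so it is an illustration of the problem under those assumptions, not a workable residue assigner; all names below are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Heavy-atom element composition of a few internal residues (no terminal OXT).
RESIDUE_COMPOSITION: Dict[str, Counter] = {
    "GLY": Counter({"C": 2, "N": 1, "O": 1}),
    "ALA": Counter({"C": 3, "N": 1, "O": 1}),
    "SER": Counter({"C": 3, "N": 1, "O": 2}),
    "CYS": Counter({"C": 3, "N": 1, "O": 1, "S": 1}),
}


def element_of(sybyl_type: str) -> str:
    """Element symbol from a SYBYL atom type, e.g. 'N.am' -> 'N', 'O' -> 'O'."""
    return sybyl_type.split(".")[0]


def exact_matches(elements: List[str]) -> List[str]:
    """Residues whose heavy-atom composition equals the observed one."""
    observed = Counter(elements)
    return [name for name, comp in RESIDUE_COMPOSITION.items() if comp == observed]


def compatible_with_missing_atoms(elements: List[str]) -> List[str]:
    """Residues that could yield the observed atoms if some atoms were unresolved."""
    observed = Counter(elements)
    return [
        name
        for name, comp in RESIDUE_COMPOSITION.items()
        if all(comp[el] >= n for el, n in observed.items())
    ]


# Backbone atoms of an ALA whose CB was not resolved: N, CA, C, O.
ala_missing_cb = [element_of(t) for t in ["N.am", "C.3", "C.2", "O.2"]]
print(exact_matches(ala_missing_cb))                  # ['GLY']
print(compatible_with_missing_atoms(ala_missing_cb))  # ['GLY', 'ALA', 'SER', 'CYS']
```

An exact match confidently mislabels the residue as GLY, while allowing for missing atoms leaves it ambiguous; neither recovers the true ALA without information that a PDB file would have stated explicitly.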

This is not to say it's impossible; it's certainly doable. However, I don't have bandwidth to implement this myself. If you want to make a PR to implement this I'd be more than happy to support you.