OpenFreeEnergy / gufe

grand unified free energy by OpenFE
https://gufe.readthedocs.io
MIT License

rename ProteinComponent to PDBComponent? #139

Open richardjgowers opened 1 year ago

IAlibay commented 1 year ago

Moving away from "protein" would be good, but "PDB" might be too narrow if we ever offer other ways to load things.

richardjgowers commented 1 year ago

@IAlibay the thinking is that we currently make a lot of PDB-specific assumptions about the content of the underlying model (resnames, resnums + icodes, chainids), so it's not a general protein model but specifically a PDB-format view of a protein model.
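To make those PDB-specific fields concrete, here is a minimal sketch of the per-atom information being discussed. It is illustrative only: `PDBAtomInfo` and `group_by_residue` are hypothetical names, not part of gufe or RDKit, and the real component stores this information on the underlying rdkit molecule rather than in a dataclass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDBAtomInfo:
    """Hypothetical per-atom record holding the PDB-style fields."""
    atom_name: str
    resname: str
    resnum: int
    icode: str    # insertion code, "" when absent
    chainid: str


def group_by_residue(atoms):
    """Group atoms into residues keyed by (chainid, resnum, icode)."""
    residues = {}
    for atom in atoms:
        key = (atom.chainid, atom.resnum, atom.icode)
        residues.setdefault(key, []).append(atom)
    return residues


atoms = [
    PDBAtomInfo("N", "ALA", 1, "", "A"),
    PDBAtomInfo("CA", "ALA", 1, "", "A"),
    # Same chain and resnum as above, distinguished only by the icode.
    PDBAtomInfo("N", "GLY", 1, "A", "A"),
]
residues = group_by_residue(atoms)
print(len(residues))
```

The point of the sketch is that a residue is only uniquely identified by the combination of chain ID, residue number and insertion code, which is a PDB convention rather than a general property of protein models.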

What other ways to load things might we add (that would include the above special fields)?

IAlibay commented 1 year ago

Don't we mostly make these assumptions on load? I feel like elsewhere we just assume it's something that can be digested by rdkit and eventually passed along to PDBFile/Modeller.

My main view re: naming here is: 1) the intent of this component is that it's a polymer that can get parametrised using off-the-shelf additive force fields, 2) technically there's nothing that should stop you from passing a customised / self-built rdkit molecule to this component, 3) I'm pretty sure something like a gro file, or more programmatically an MDA Universe, would include all the information you'd need. In the long run it'd be great to just go "here are the attributes you need, make this an rdkit molecule and it'll work".
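As a rough sketch of the "here are the attributes you need" idea, a source-agnostic check might look like the following. Everything here is an assumption for illustration: `REQUIRED_FIELDS` and `missing_fields` are hypothetical, the field list simply mirrors the fields named earlier in this thread, and the atom records are plain dicts standing in for whatever a gro file, an MDA Universe or a hand-built molecule would provide.

```python
# Hypothetical list of per-atom fields the component would require,
# taken from the fields mentioned in this thread.
REQUIRED_FIELDS = ("resname", "resnum", "icode", "chainid")


def missing_fields(atom_records):
    """Return {atom index: [missing field names]} for incomplete records."""
    problems = {}
    for idx, record in enumerate(atom_records):
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            problems[idx] = missing
    return problems


records = [
    {"resname": "ALA", "resnum": 1, "icode": "", "chainid": "A"},
    {"resname": "LIG"},  # lacks the residue-level bookkeeping
]
problems = missing_fields(records)
print(problems)
```

With a check like this the component would be defined by what the object carries rather than by the file format it was loaded from.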

richardjgowers commented 1 year ago

So one key difference between SmallMoleculeComponent and ProteinComponent is that the underlying rdkit molecule in ProteinComponent is expected to have the MonomerInfo() attribute (which holds resname etc) populated. This for example then populates the Hierarchy system in an OpenFF Molecule/Topology

IAlibay commented 1 year ago

I think I'd vote for PolymerComponent here, or something like that, and then make it clear that the rdkit molecule needs that information. Maybe I'm just overthinking the cool possibility of creating arbitrary polymers without ever seeing an input file though.

richardjgowers commented 1 year ago

@IAlibay We're constructing an rdkit molecule which specifically populates the MonomerInfo struct, which is then handled by openff tk as a biopolymer because of the presence of this extra info. This is arguably what differentiates the mols in ProteinComponent from those in SmallMoleculeComponent. And that MonomerInfo struct is pretty much just PDB-specific crud that isn't generalisable to polymers.

IAlibay commented 1 year ago

I see what you mean there, but what if you went the other way round, i.e. you create a component from an OpenFF polymer molecule? (created through some arbitrary means that has this information)

IAlibay commented 1 year ago

My main point here is: are we naming for the input or for what the object represents? Both are ok, but then we should probably consider having the same naming scheme across the board, otherwise it's confusing for users.