glotzerlab / gsd

Read and write GSD files for use with HOOMD-blue.
http://gsd.readthedocs.io
BSD 2-Clause "Simplified" License

Extend 'BondGroup' to the molecule/residue length scale. #380

Closed. chrisjonesBSU closed this issue 1 month ago

chrisjonesBSU commented 1 month ago

Description

I’m curious about how feasible it would be to extend BondGroup in hoomd.py to add a data structure that includes all bonded particles belonging to the same molecule/residue. So, something like:

frame.residues.group
frame.residues.types
frame.residues.typeid

Would the fact that the residue groups might have different lengths be an issue?

Proposed solution

A simple example would be a Frame with a methane molecule and an ethane molecule.

>>> frame.residues.types
["methane", "ethane"]

>>> frame.residues.group
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11, 12]]

Additional context

My motivation comes from using HOOMD and GSD to simulate organic molecules and polymers. I think this would be very useful for quickly calculating molecular structure properties in custom actions and custom writers in HOOMD, and possibly for custom forces as well.

joaander commented 1 month ago

gsd.hoomd.BondData and the corresponding data structures in HOOMD support only small, fixed-length lists. The gsd file format itself supports only dense arrays.

For molecules, I suggest that you perform union-find on the bond topology to very quickly determine the molecules in a system (you can pre-compute if you know it will not change). For residues, or other quantities that are not immediately derivable from existing data, I suggest you use an appropriate file format and data structure to store the data in a way that is specific to your custom action. Or, if you prefer, you can abuse the currently unused diameter per-particle attribute.
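As a rough illustration of the union-find suggestion, here is a minimal pure-Python sketch. The function name `find_molecules` and the bond list are hypothetical and are not part of gsd or HOOMD. The bond list encodes the methane and ethane example from the issue, assuming the carbon atoms are particles 0, 5, and 6.

```python
def find_molecules(n_particles, bonds):
    """Group particle indices into molecules using union-find on bond pairs.

    Hypothetical helper, not part of gsd or HOOMD.
    """
    parent = list(range(n_particles))

    def find(i):
        # Walk up to the root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as the root of the merged set.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # Collect particles by root; iterating in index order keeps each list sorted.
    molecules = {}
    for i in range(n_particles):
        molecules.setdefault(find(i), []).append(i)
    return list(molecules.values())


# Assumed topology: methane (C=0, H=1..4) and ethane (C=5 and 6, H=7..12).
bonds = [(0, 1), (0, 2), (0, 3), (0, 4),
         (5, 6), (5, 7), (5, 8), (5, 9), (6, 10), (6, 11), (6, 12)]
molecules = find_molecules(13, bonds)
print(molecules)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11, 12]]
```

With a gsd.hoomd frame, the inputs would presumably be `frame.particles.N` and the rows of `frame.bonds.group`. If the topology never changes, the result can be computed once and reused.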

I previously made plans to implement custom per-particle data fields (https://github.com/glotzerlab/hoomd-blue/issues/1533) but such a project would be a huge amount of effort and has never been prioritized. Also, such a solution would only be necessary when MPI communication of the quantities is needed. For static data, custom code can compute custom data structures when needed.
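For static residue data, one possible approach is to flatten the variable-length groups into a single integer per particle, which fits the dense-array constraint, and rebuild the groups in custom code when needed. The sketch below uses only plain Python lists. The helper names `groups_to_ids` and `ids_to_groups` are hypothetical and are not part of gsd or HOOMD.

```python
def groups_to_ids(groups, n_particles):
    """Flatten ragged groups into one index per particle (-1 means ungrouped)."""
    ids = [-1] * n_particles
    for group_index, members in enumerate(groups):
        for particle in members:
            ids[particle] = group_index
    return ids


def ids_to_groups(ids):
    """Rebuild the ragged group lists from the per-particle indices."""
    groups = {}
    for particle, group_index in enumerate(ids):
        if group_index >= 0:
            groups.setdefault(group_index, []).append(particle)
    # Return groups ordered by their index.
    return [groups[key] for key in sorted(groups)]


# The methane and ethane example from the issue.
residue_group = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11, 12]]
residue_id = groups_to_ids(residue_group, 13)
print(residue_id)                 # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
print(ids_to_groups(residue_id))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11, 12]]
```

A per-particle index like `residue_id` could be written to a separate file, or placed in the diameter attribute as mentioned above. Diameter is a floating-point field, so the values would need to be cast back to integers on read.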

chrisjonesBSU commented 1 month ago

I see, thanks for the feedback and advice!