bowman-lab / enspara

Modeling molecular ensembles with scalable data structures and parallel computing
https://enspara.readthedocs.io
GNU General Public License v3.0

Allow for subsampling of featured trajectories #232

Open Justin-J-Miller opened 1 week ago

Justin-J-Miller commented 1 week ago

Clustering on subsampled data helps with both memory efficiency and the time needed to compute k-medoids updates. It would be nice to add an option to subsample featurized datasets at the point of clustering.

justinrporter commented 1 week ago

Is this really not available?

It looks like cluster.py at least theoretically has an option for this: https://github.com/bowman-lab/enspara/blob/735a3fb52b61a30268f07375376edfb5859ad99a/enspara/apps/cluster.py#L148C30-L153C1

Justin-J-Miller commented 1 week ago

It's supported for trajectories, but for features/h5 files it is currently disallowed:

https://github.com/bowman-lab/enspara/blob/735a3fb52b61a30268f07375376edfb5859ad99a/enspara/apps/cluster.py#L185C1-L187C67
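
For illustration, here is a minimal sketch of what subsampling featurized data before clustering could look like. It is not enspara's code or API: `subsample_features` and its `stride` argument are hypothetical names, and the sketch only assumes that each featurized trajectory is a sliceable sequence of frames (a NumPy array of shape `(n_frames, n_features)` slices the same way as the nested lists used here).

```python
def subsample_features(feature_trajs, stride):
    """Keep every `stride`-th frame of each featurized trajectory.

    Hypothetical helper, not part of enspara. Each element of
    `feature_trajs` is assumed to be a sequence of frames that
    supports slicing (a list of rows or a NumPy array).
    """
    if not isinstance(stride, int) or stride < 1:
        raise ValueError("stride must be a positive integer")
    return [traj[::stride] for traj in feature_trajs]


# Two made-up featurized trajectories with 10 and 7 frames, 2 features each.
trajs = [
    [[2 * i, 2 * i + 1] for i in range(10)],
    [[2 * i, 2 * i + 1] for i in range(7)],
]

sub = subsample_features(trajs, stride=3)
lengths = [len(t) for t in sub]  # frames kept per trajectory
```

Clustering would then run on `sub` rather than on the full data, and the resulting centers could be used to assign every frame of the full-resolution data afterwards.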