choderalab / modelforge

Infrastructure to implement and train NNPs
https://modelforge.readthedocs.io/en/latest/
MIT License
11 stars 4 forks source link

Memory utilization in pair list generation #134

Closed chrisiacovella closed 4 months ago

chrisiacovella commented 4 months ago

In the current implementation of identifying the pairs of atoms in the same molecule, we end up creating some very large 1-d tensors when identifying which pairs exist in the same molecule (see code below). This creates two 1d tensor of length n*(n-1), where n is the total number of atoms in all the molecules. As the dataset size grows, this ends up consuming a large amount of memory. When trying to use SPICE114 on either my laptop (8GB gpu) or the 2080s (12 gb gpu), the code crashes because it can't allocate enough memory (obviously this will run fine on a gpu with more memory, like the A100s, but the wait time is substantially longer on those which right now is a limiting factor).

https://github.com/choderalab/modelforge/blob/7fde7643082f82e4f095cdd9c89407aff0daa7e3/modelforge/potential/models.py#L100C8-L107C54

A simple solution to this is just to put this in a loop, where we generate tensors of length n-1, n times (using cat to accumulate the interacting pairs). Execution time is basically the same, but memory utilization drops substantially, by a factor of n obviously.

Granted there are other ways to improve this we could consider (e.g., if we know the maximum molecule size, we would only need to generate a tensor of length the maximum molecule size each time, instead of n). This might be necessary if the datasets grow substantially in size, but I don't think it is necessary right now.

I'll note if we want only unique pairs we are calling triu_indices to get the unique pairs. As system size grows, this too may be problematic. Of the NNPs implemented, it looks like tensornet and ANI use unique pairs. We don't need to use triu_indices and can instead just do a second masking where index_i < index_j to get unique pairs when needed.

I will post a PR with this simple change.

chrisiacovella commented 4 months ago

I've merged the PR, that also included a fix to allow proper pip installation of the datafiles.