glotzerlab / hoomd-blue

Molecular dynamics and Monte Carlo soft matter simulation on GPUs.
http://glotzerlab.engin.umich.edu/hoomd-blue
BSD 3-Clause "New" or "Revised" License

Initialize and support very large systems #561

Closed jglaser closed 2 years ago

jglaser commented 4 years ago

Description

We would like to enable simulations of very large systems (>4B particles), as well as their initialization. In a two-stage approach, we should address breaking changes for the user in 3.0, and implement the changes that actually support these systems in subsequent releases.

Proposed solution

Here's a potentially incomplete list of issues that we need to tackle:

Breaking changes (tentatively for 3.0):

Full support for large systems (later releases):

Additional context

@joaander, @jglaser and @InnocentBug discussed this in the context of enabling very large scale simulations of polymeric systems, but these changes will of course be completely general.

Developer

Yes, will contribute. I welcome feedback and additional considerations I forgot to include.

mphoward commented 4 years ago

Sounds interesting, especially the partial snapshots. I'm wondering if there are also potentially subtle issues to watch out for if there are mixed uses of signed and unsigned ints related to the particle tags. Hopefully any cases of this are mostly done on the local indexes rather than tags, though, so that these can be ignored.
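To make the signed/unsigned concern concrete, here is a minimal sketch of what happens to a large tag when it passes through 32-bit storage. The helper names `as_uint32` and `as_int32` are hypothetical and only emulate C integer truncation via `ctypes`; they are not part of HOOMD.

```python
import ctypes


def as_uint32(tag: int) -> int:
    """Value the tag would have after being stored in an unsigned 32-bit integer."""
    return ctypes.c_uint32(tag).value


def as_int32(tag: int) -> int:
    """Value the tag would have after being stored in a signed 32-bit integer."""
    return ctypes.c_int32(tag).value


big_tag = 3_000_000_000   # fits in uint32, but not in int32
huge_tag = 2**32 + 5      # does not fit in either 32-bit type

print(as_uint32(big_tag))   # 3000000000 (unchanged)
print(as_int32(big_tag))    # -1294967296 (wraps to a negative value)
print(as_uint32(huge_tag))  # 5 (silently aliases the particle with tag 5)
```

Both failure modes are silent in C and C++, which is why any code path that mixes signed and unsigned tag types would need auditing before tags are widened.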

joaander commented 4 years ago

Random seeds that use particle tags will need to be updated as well. Philox4x32 takes in 24 bytes of seeds. DPD is already using all 24, and we also need to go to 64-bit timestep counters. I haven't looked at other cases where we seed using particle tags, but DPD is probably the worst case as it needs 2 tags.

One solution (for both timestep and tags) would be to store the values in 64-bit quantities but only allow values up to a certain maximum. Say we only allowed up to 40 bits (about 1 trillion particles or time steps). We could use fewer bits to identify the unique RNGs in HOOMD (16 should be enough) and mix the high bits of the tag into this seed. We could also do the same with the internal counter, but I'd be concerned about cases where the internal counter is used to generate a large stream. We certainly can't limit it to only 16 bits, but maybe 24 (16 million) is enough? We could also limit the user seed input to fewer than 32 bits to make room for the additional bits from the timestep. Would 16 million user seeds be sufficient?
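A minimal sketch of the bit mixing idea, assuming a 40-bit cap on tags and timesteps and a 16-bit RNG identifier. The function name `pack_words` and the exact bit layout are assumptions made for illustration; nothing in this thread fixes the layout.

```python
from typing import Tuple

MAX_BITS = 40        # assumed cap on tags and timesteps (2**40 is about 1.1e12)
ID_BITS = 16         # assumed width of HOOMD's internal RNG identifier
MASK32 = 0xFFFFFFFF


def pack_words(rng_id: int, tag: int, timestep: int) -> Tuple[int, int, int]:
    """Pack an RNG id, a tag, and a timestep into three 32-bit words.

    Hypothetical layout of the first word:
        bits 31..16  RNG identifier
        bits 15..8   high 8 bits of the tag (bits 39..32)
        bits  7..0   high 8 bits of the timestep (bits 39..32)
    The other two words carry the low 32 bits of the tag and the timestep.
    """
    if not 0 <= rng_id < (1 << ID_BITS):
        raise ValueError("rng_id does not fit in 16 bits")
    for name, value in (("tag", tag), ("timestep", timestep)):
        if not 0 <= value < (1 << MAX_BITS):
            raise ValueError(f"{name} does not fit in {MAX_BITS} bits")

    seed_word = (rng_id << 16) | ((tag >> 32) << 8) | (timestep >> 32)
    return seed_word, tag & MASK32, timestep & MASK32


words = pack_words(0x1234, (0xAB << 32) | 0xDEADBEEF, (0x01 << 32) | 7)
print([hex(w) for w in words])  # ['0x1234ab01', '0xdeadbeef', '0x7']
```

With this layout, two tags that differ only above bit 31 still produce different seed words, and the whole 40-bit tag plus 40-bit timestep occupies three 32-bit words instead of four. A pair-based scheme such as DPD would need room for a second tag, which this sketch does not attempt.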

@jglaser suggested feeding the output of one RNG into another to combine more seeds. I hesitate to go this route without extensive testing as it might create subtle correlations.

mphoward commented 4 years ago

Very good point. I prefer the bit mixing scheme to chaining up RNGs, as the behavior of the bit mixing is probably much easier to debug and define than it would be to try to find correlations between RNGs.

I think it would be OK to limit (1) particle tags (there are no researchers in the world that are going to simulate 1 trillion particles anytime soon) and (2) HOOMD's internal identifiers (we have control over this, and we are not going to need 10^5 of them). I would be extremely hesitant to restrict the counter. The user seed could probably be made less than 32 bits, since most users probably choose from their favorite numbers and are not running millions of copies of the same simulation. (At that point, it would be better to choose a new starting configuration seeded from system entropy than to just keep changing the seed.) We would need to document this carefully, though, in case people are using the system time as a seed.

asmunder commented 4 years ago

Not sure if you want to include it on this issue, but a related problem we have encountered is that the maximum number of time steps you can use in a hoomd.run() command is 2^31 - 1 (maximum 32 bit integer) which is "only" 2.1B. We have been able to work around it for now just by running a loop where the inner command is hoomd.run(2e9) and the number of loop iterations is whatever we require, so somehow the machinery supports running for longer. Thus it feels like a fix should be simple?
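The loop workaround described above can be sketched as follows. The helpers `chunk_steps` and `run_in_chunks` are hypothetical names; the `run` argument stands in for `hoomd.run`, which is passed in rather than imported so that the sketch stays self-contained.

```python
from typing import Callable, Iterator

MAX_RUN_STEPS = 2**31 - 1  # largest step count a signed 32-bit argument can hold


def chunk_steps(total_steps: int, chunk: int = 2_000_000_000) -> Iterator[int]:
    """Yield step counts that sum to total_steps, each no larger than chunk."""
    if not 0 < chunk <= MAX_RUN_STEPS:
        raise ValueError("chunk must be positive and fit in a signed 32-bit integer")
    remaining = total_steps
    while remaining > 0:
        step = min(chunk, remaining)
        yield step
        remaining -= step


def run_in_chunks(total_steps: int, run: Callable[[int], None],
                  chunk: int = 2_000_000_000) -> None:
    """Call run(n) repeatedly until total_steps steps have been requested."""
    for n in chunk_steps(total_steps, chunk):
        run(n)


print(list(chunk_steps(5_000_000_000)))
# [2000000000, 2000000000, 1000000000]

# Assumed usage with the HOOMD 2.x scripting API:
#     run_in_chunks(10_000_000_000, hoomd.run)
```

This only splits the argument passed to each call. It does not address the width of the timestep counter itself, which is a separate point raised earlier in this thread.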

InnocentBug commented 4 years ago

> Not sure if you want to include it on this issue, but a related problem we have encountered is that the maximum number of time steps you can use in a hoomd.run() command is 2^31 - 1 (maximum 32 bit integer) which is "only" 2.1B. We have been able to work around it for now just by running a loop where the inner command is hoomd.run(2e9) and the number of loop iterations is whatever we require, so somehow the machinery supports running for longer. Thus it feels like a fix should be simple?

Hey @asmunder, check out #229. As far as I know, this is already going to be covered by the release of v3.0.

joaander commented 4 years ago

@jglaser came up with at least one use-case where it is helpful to be able to set the internal counter variable to be able to backtrack in a tree without needing a stack. This works as long as one is careful to only use distributions that sample single values from the generator and should be allowed in the API, though it should not be the default case. Combined with the need to more easily seed the generator with different types of bit mixed seeds and counters, I propose the following API:

Separating the Counter bit mixing code from the location where the independent counter values are assigned will make the code cleaner, easier to understand, and promote code re-use across all the places in HOOMD that use RandomGenerator.
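For readers unfamiliar with why a settable counter allows backtracking without a stack: in a counter-based generator, each value is a pure function of the seed and the counter, so any earlier value can be regenerated on demand. The sketch below illustrates one way this could look for a binary tree. It is a toy stand-in built on BLAKE2b, not Philox and not HOOMD's RandomGenerator, and the heap-style node numbering is an assumption made for the example.

```python
import hashlib
import struct


def toy_counter_rng(seed: int, counter: int) -> float:
    """Uniform value in [0, 1) determined entirely by (seed, counter).

    Toy illustration only: hashes the pair with BLAKE2b instead of using Philox.
    """
    digest = hashlib.blake2b(struct.pack("<QQ", seed, counter), digest_size=8).digest()
    # Keep the top 53 bits so the result is exactly representable and below 1.0.
    return (int.from_bytes(digest, "little") >> 11) / 2**53


def child(node: int, which: int) -> int:
    """Index of the left (which=0) or right (which=1) child in heap numbering."""
    return 2 * node + 1 + which


def parent(node: int) -> int:
    """Index of the parent node in heap numbering."""
    return (node - 1) // 2


seed = 42
node = 0
root_value = toy_counter_rng(seed, node)

node = child(child(node, 0), 1)  # descend two levels
node = parent(parent(node))      # backtrack using index arithmetic alone

# The root's random value is recovered by setting the counter back to its index.
assert toy_counter_rng(seed, node) == root_value
```

The sketch draws exactly one value per counter, matching the caveat above that this only works for distributions that sample single values from the generator.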

mphoward commented 4 years ago

This sounds like a really nice, clear API. One question of clarification:

  • The RandomGenerator constructor takes Seed [2-byte] and Counter [4-byte] counter arguments.

What do you mean by 2-byte and 4-byte here? It seems like you need way more bytes to represent each of these (and you have up to 24 bytes of input for philox), but I'm probably just being slow this morning.

joaander commented 4 years ago

I said bytes when I meant to say 32-bit words in many places above. Sorry, was just writing down the ideas I had in a brain dump.

Correction:

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] commented 2 years ago

This issue has been automatically closed because it has not had recent activity.