aiidateam / aiida-quantumespresso-hp


Add more control over the number of submitted `WorkChains` #52

Closed: t-reents closed this 9 months ago

t-reents commented 10 months ago

It would be nice if the user had more control over the number of WorkChains that are submitted at a time.

At the moment, all of the sub-WorkChains, e.g. in SelfConsistentHubbardWorkChain and HpParallelizeQpointsWorkChain, are submitted at once. This can lead to many submissions at the same time, which might be problematic for some clusters, or in general add up to a large number of WorkChains, for example when one runs multiple HpParallelizeAtomsWorkChains. It might therefore be useful to introduce something like submission in batches, i.e. submitting only N sub-WorkChains at a time, which would let the user exploit the advantages of parallel submission to a certain extent while still controlling the number of parallel WorkChains.

I already prepared a draft locally; the PR will follow soon. I would simply add a new input to the HpParallelizeAtomsWorkChain and HpParallelizeQpointsWorkChain that specifies the number of q-point WorkChains submitted at a time, given that the parallelization over q-points is enabled.
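
A minimal sketch of how such an input could look, using AiiDA's standard `spec.input` API; the class name and the input name `max_concurrent_base_workchains` are placeholders for illustration, not the actual plugin code:

```python
from aiida import orm
from aiida.engine import WorkChain


class ParallelizeQpointsSketch(WorkChain):
    """Illustrative stand-in for HpParallelizeQpointsWorkChain, not the real class."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        # Hypothetical input: maximum number of HpBaseWorkChains submitted at a time.
        spec.input(
            'max_concurrent_base_workchains',
            valid_type=orm.Int,
            required=False,
            help='Maximum number of HpBaseWorkChains submitted at the same time.',
        )
        spec.outline(cls.run_batches)

    def run_batches(self):
        """Submit the HpBaseWorkChains in batches (placeholder step)."""
```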

The SelfConsistentHubbardWorkChain would also get a new input to specify the overall number of WorkChains (HpParallelizeAtomsWorkChains and HpParallelizeQpointsWorkChains) submitted at a time. I thought about the following logic so far:

Assume we set the new input to 5 and our structure contains 3 perturbed atoms. Those 5 allowed WorkChains would be distributed as follows: [2, 2, 1]. This list refers to the number of q-point WorkChains per atom that are submitted at a time. Each HpParallelizeQpointsWorkChain will continue submitting new q-point batches once the previous batch has finished.

Another example: setting the input again to 5, but this time with 6 perturbed atoms. This would result in the submission of [1, 1, 1, 1, 1] q-point WorkChains per atom. Once these HpParallelizeQpointsWorkChains are done, the last HpParallelizeQpointsWorkChain (for the sixth atom) will be submitted, but this time with 5 possible q-point WorkChains, since we ensure that the others are done at this stage.
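
For illustration, a small helper along these lines could compute such a distribution; this is a hypothetical sketch of the logic described above, and `distribute_budget` is not part of the plugin:

```python
def distribute_budget(total: int, num_atoms: int) -> list[int]:
    """Spread `total` concurrent q-point WorkChains over at most `num_atoms` atoms.

    Hypothetical helper illustrating the proposed logic, not plugin code.
    """
    # At most one HpParallelizeQpointsWorkChain runs per atom; the budget is
    # distributed as uniformly as possible over the ones running concurrently.
    num_running = min(total, num_atoms)
    base, extra = divmod(total, num_running)
    return [base + 1 if i < extra else base for i in range(num_running)]


print(distribute_budget(5, 3))  # [2, 2, 1]       -> first example
print(distribute_budget(5, 6))  # [1, 1, 1, 1, 1] -> second example
```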

@bastonero In case you already have comments regarding this logic, please feel free to share them here. Otherwise, we can discuss once the PR is there.

bastonero commented 10 months ago

Hi @t-reents, thanks for re-posting the issue more extensively. I don't quite understand what you mean by [1, 1, 1, 1, 1, 1] per atom?

Probably, in order not to complicate the logic too much, we can define maximum numbers of concurrent WorkChains instead of the concept of batches. So we would have:

  1. Maximum number of concurrent perturbed atoms (e.g. max_concurrent_atoms)
  2. Maximum number of concurrent qpoints per perturbed atom (e.g. max_concurrent_qpoints)

So, say we define max_concurrent_atoms=2 and max_concurrent_qpoints=3, and let's say our structure has 7 atoms to perturb and, for simplicity, 8 qpoints for each atom (remember that in principle this number may change among different atoms within the same structure). We would then have that (see also the sketch below):

  1. The HpParallelizeAtoms launches 2 HpParallelizeQpoints
  2. Each HpParallelizeQpoints launches 3 HpBase
  3. When 2. is done, launch another 3 HpBase
  4. When 3. is done, launch the last 2 HpBase
  5. The HpParallelizeAtoms launches the following 2 HpParallelizeQpoints for the next 2 atoms.
  6. Continue with the previous logic until the last HpParallelizeQpoints for the last (7th) atom.

Is this what you had in mind?
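
As a plain-Python illustration of this scheduling (a loop-based sketch of the batch sizes only; in the actual work chains the members of a batch are submitted concurrently, and the two limits would come from the proposed inputs):

```python
from itertools import islice


def batches(items, size):
    """Split ``items`` into consecutive chunks of at most ``size`` elements."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


# The example above: 7 perturbed atoms, 8 qpoints each,
# max_concurrent_atoms = 2 and max_concurrent_qpoints = 3.
atoms, qpoints = range(1, 8), range(1, 9)

for atom_batch in batches(atoms, 2):
    # In the real workflow the atoms of a batch run concurrently; the inner
    # loops only show how many HpBase each atom submits per step (3, 3, 2).
    for atom in atom_batch:
        for qpoint_batch in batches(qpoints, 3):
            print(f'atom {atom}: submit HpBase for qpoints {list(qpoint_batch)}')
```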

A further possibility could be to also have a single HpBase run more than 1 qpoint at a time (e.g. by setting start_qpoint=i and last_qpoint=i+n). This may be useful for smaller structures having lots of qpoints. On the other hand, since hp.x doesn't have restart options, I think this approach is eventually less appealing (see also #34).

t-reents commented 10 months ago

Yes, this is exactly the logic that I implemented in my first draft. With the term "batches", I was basically referring to what you described in your outline. Regarding your second point, for example, I also meant that instead of submitting all the HpBase at once, we submit an HpBase batch of size 3. So it is basically just different wording.

Instead of specifying the numbers of concurrent atoms and qpoints separately, I came up with the idea of specifying only the total number. Based on the number of atoms/sites that is determined after the initialization, the number of HpBase for the qpoints is distributed accordingly. The length of the list corresponds to the number of concurrent HpParallelizeQpoints and its elements to the number of concurrent HpBase in each of them. The function tries to distribute the number of HpBase uniformly over the different HpParallelizeQpoints.

I don't quite understand what you mean by [1, 1, 1, 1, 1, 1] per atom?

The list encoded the following outline:

  1. The HpParallelizeAtoms launches 5 HpParallelizeQpoints (the length of the list)
  2. Each HpParallelizeQpoints launches 1 HpBase (the elements of the list)
  3. Same steps as in your outline

Accordingly, the first example, [2, 2, 1], represents:

  1. The HpParallelizeAtoms launches 3 HpParallelizeQpoints (the length of the list)
  2. The first two HpParallelizeQpoints launch 2 HpBase each and the third one only 1 (the elements of the list)
  3. Same steps as in your outline

This was just to explain what I meant by this list notation. But I would also be fine with the two separate inputs, so it's up to you.

bastonero commented 10 months ago

Interesting approach. So you would define, at a higher level (HpWorkChain), a max_concurrent_base_workchains, and then derive the "sub-maximum numbers of concurrent workchains" accordingly?

What I like is that it will indeed run at most a certain number of concurrent HpBaseWorkChains, although it won't be guaranteed that exactly this number of HpBaseWorkChains runs concurrently.

One thing to consider is that if, say, one atom has lots of qpoints, then all the other remaining WorkChains will have to wait: in AiiDA workflows, the engine has to wait for all submitted processes before continuing in the outline. Considering that the number of qpoints differs only slightly among different atoms, I guess it is acceptable. We can also start trying it out, and in any case this will be optional.
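
For reference, a minimal sketch of that waiting behaviour with AiiDA's `to_context`/`append_` mechanism; the classes below are illustrative stand-ins, not the actual plugin work chains:

```python
from aiida.engine import WorkChain, append_


class DummyBaseWorkChain(WorkChain):
    """Stand-in for an HpBaseWorkChain (illustrative only)."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.outline(cls.do_nothing)

    def do_nothing(self):
        pass


class BatchSketchWorkChain(WorkChain):
    """Sketch: the second outline step starts only after every process
    added to the context in the first step has terminated."""

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.outline(cls.submit_batch, cls.inspect_batch)

    def submit_batch(self):
        for _ in range(3):  # a batch of three sub work chains, submitted concurrently
            self.to_context(batch=append_(self.submit(DummyBaseWorkChain)))

    def inspect_batch(self):
        # Reached only once all three work chains in ``self.ctx.batch`` are done,
        # so a single slow member delays the whole batch.
        self.report(f'batch of {len(self.ctx.batch)} finished')
```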

t-reents commented 10 months ago

Yes exactly, this was the idea, and I agree with your comments.