Added an item for describing/consolidating/updating the instructions for how to build, and for how to configure things at runtime (via the command line, or perhaps the programmatic interface envisioned in #600).
Is there a plan for how this integrates with the website? Pandoc can read LaTeX, but that doesn't mean you get pretty HTML out the other side.
I haven't thought about it. The first goal is to actually have a manual! If there were something better than LaTeX for displaying in different environments, we could likely move to that.
The most fundamental question that I have is "What are the semantics of control replication?".
For example, if one inner task performs an index launch with two inner tasks, each of which performs an index launch of two leaf tasks, there should be 7 tasks total (1+2+4). However, if the (say) two shards of the top-level task "each" perform an index launch of two leaf tasks, my understanding is that there are only 4 executions total (1*2+2). This implies that all sorts of actions taken in "the" top-level task are actually collective, raising questions about exactly how identical the control flow must be in each shard. Does each shard have access to the future-wrapped results of tasks that were actually handled by some other shard? If there were a set of single tasks to launch whose arguments were expensive to compute, could the various shards somehow indicate that they were launching "private" single tasks that should not be collective?
The semantics of control replication are that you get the same behavior whether you run with it or not.
However, if the (say) two shards of the top-level task "each" perform an index launch of two leaf tasks, my understanding is that there are only 4 executions total (1*2+2).
This is wrong. With control replication you would still get two inner sub-tasks, each of which runs two leaf tasks (1+2+4).
raising questions about exactly how identical the control flow must be in each shard.
Every shard must make the same sequence of calls into the Legion runtime with the same arguments. It's up to the user to guarantee that invariant if they mark a task variant as "replicable". We don't care how you ensure the invariant, but if you violate it after promising us that a task variant is "replicable" then you will get undefined behavior.
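For concreteness, here is a minimal C++ sketch of what that promise looks like in practice; the task IDs, task names, and empty `leaf_task` body are made up for illustration, not taken from this thread. The top-level variant is marked replicable at registration, and its body is the identical sequence of runtime calls every shard issues.

```cpp
#include "legion.h"

using namespace Legion;

enum TaskIDs {
  TOP_LEVEL_TASK_ID,  // hypothetical IDs for this sketch
  LEAF_TASK_ID,
};

void leaf_task(const Task *task,
               const std::vector<PhysicalRegion> &regions,
               Context ctx, Runtime *runtime)
{
  // Point task body; each point runs exactly once with or without
  // control replication (unless the mapper explicitly replicates it).
}

void top_level_task(const Task *task,
                    const std::vector<PhysicalRegion> &regions,
                    Context ctx, Runtime *runtime)
{
  // Under control replication every shard executes this same body, so each
  // shard issues this identical index launch with identical arguments; the
  // runtime still creates exactly one point task per index point.
  Rect<1> launch_bounds(0, 1);
  IndexTaskLauncher launcher(LEAF_TASK_ID, launch_bounds,
                             TaskArgument(NULL, 0), ArgumentMap());
  runtime->execute_index_space(ctx, launcher);
}

int main(int argc, char **argv)
{
  Runtime::set_top_level_task_id(TOP_LEVEL_TASK_ID);
  {
    TaskVariantRegistrar registrar(TOP_LEVEL_TASK_ID, "top_level");
    registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC));
    registrar.set_replicable();  // the promise: identical control flow in every shard
    Runtime::preregister_task_variant<top_level_task>(registrar, "top_level");
  }
  {
    TaskVariantRegistrar registrar(LEAF_TASK_ID, "leaf");
    registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC));
    registrar.set_leaf();
    Runtime::preregister_task_variant<leaf_task>(registrar, "leaf");
  }
  return Runtime::start(argc, argv);
}
```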
Does each shard have access to the future-wrapped results of tasks that were actually handled by some other shard?
Yes
If there were a set of single tasks to launch whose arguments were expensive to compute, could the various shards somehow indicate that they were launching "private" single tasks that should not be collective?
The mapper gets to choose whether individual leaf tasks are performed once, with the future result broadcast to consumers, or replicated many times. This is true regardless of whether you are running with control replication or not. The mapper controls whether replication is "collective" (in the sense of control replication) or not.
Re: the last point, I think the issue is the computation of arguments to tasks, rather than tasks themselves.
The way I'd personally deal with this is I'd put the computation of the argument itself inside a task. That way it can be distributed around the machine, just like any other task. If computing the argument is not expensive enough to make this worth it, it's probably not worth worrying about in control replication either.
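A rough sketch of that suggestion (the task IDs and helper function here are hypothetical, not from this thread): the expensive argument is produced by its own task, and its `Future` is attached to the consumer's launcher, so the value is computed once somewhere on the machine and Legion moves it to wherever the consumer runs.

```cpp
#include "legion.h"

using namespace Legion;

// Hypothetical task IDs for this sketch.
enum { COMPUTE_ARG_TASK_ID = 10, WORK_TASK_ID = 11 };

// Called from inside some (possibly replicated) parent task's body;
// ctx and runtime come from that task.
void launch_with_computed_argument(Context ctx, Runtime *runtime)
{
  // Compute the expensive argument in its own task so it is distributed
  // around the machine like any other task, rather than being recomputed
  // redundantly by every shard.
  TaskLauncher arg_launcher(COMPUTE_ARG_TASK_ID, TaskArgument(NULL, 0));
  Future expensive_arg = runtime->execute_task(ctx, arg_launcher);

  // Hand the future to the consumer instead of a precomputed buffer.
  TaskLauncher work_launcher(WORK_TASK_ID, TaskArgument(NULL, 0));
  work_launcher.add_future(expensive_arg);
  runtime->execute_task(ctx, work_launcher);

  // Inside the consumer task the value would be read with, e.g.:
  //   double arg = task->futures[0].get_result<double>();
}
```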
The semantics of control replication are that you get the same behavior whether you run with it or not.
Well, that's true so long as your code has no visible side effects (other than calls into Legion); code that, say, writes results to a file with MPI I/O has to be aware of the fact that certain tasks comprise a set of concurrent copies.
However, if the (say) two shards of the top-level task "each" perform an index launch of two leaf tasks, my understanding is that there are only 4 executions total (1*2+2).
This is wrong. With control replication you would still get two inner sub-tasks, each of which runs two leaf tasks (1+2+4).
Sorry, I didn't make the situation parallel: there's only the top-level task and one index launch in this example.
If there were a set of single tasks to launch whose arguments were expensive to compute, could the various shards somehow indicate that they were launching "private" single tasks that should not be collective?
The mapper gets to choose whether individual leaf tasks are performed once, with the future result broadcast to consumers, or replicated many times. This is true regardless of whether you are running with control replication or not. The mapper controls whether replication is "collective" (in the sense of control replication) or not.
Well, according to the "same sequence" rule, what I described (in the useful case where the single tasks were distinct) would be undefined behavior (which is a fair rule). The idea that the leaf tasks might be executed redundantly (at a savings in communication) is very interesting: do they have to be marked replicable for that to happen, like the top-level task does? If not, are there any mapper defaults that replicate single (non-top-level) tasks?
Well, that's true so long as your code has no visible side effects (other than calls into Legion); code that, say, writes results to a file with MPI I/O has to be aware of the fact that certain tasks comprise a set of concurrent copies.
All Legion facilities, including side effects on POSIX and HDF5 files as well as prints and logging calls done through Legion APIs, will behave as though there is a single logical task. If you go outside of the Legion programming model and runtime API then that is your problem to deal with. You asserted the task could be replicated, so you have to deal with it.
Sorry, I didn't make the situation parallel: there's only the top-level task and one index launch in this example.
Same principle applies: each point task in the index space launch will run exactly once unless explicitly selected to be replicated by the mapper.
The idea that the leaf tasks might be executed redundantly (at a savings in communication) is very interesting: do they have to be marked replicable for that to happen like the top-level task does?
In order to be just normally replicated (but not control replicated), you only have to mark the tasks 'idempotent', which means that they can be run multiple times and will not have any side effects beyond their already stated effects on regions.
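For illustration, a small sketch of declaring that property at registration time, reusing the hypothetical leaf task from the earlier sketch; `TaskVariantRegistrar::set_idempotent` is the registration-time flag assumed here.

```cpp
{
  TaskVariantRegistrar registrar(LEAF_TASK_ID, "leaf");
  registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC));
  registrar.set_leaf();
  registrar.set_idempotent();  // may be run more than once; no effects beyond its declared region effects
  Runtime::preregister_task_variant<leaf_task>(registrar, "leaf");
}
```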
If not, are there any mapper defaults that replicate single (non-top-level) tasks?
To the best of my knowledge, there are no mappers that attempt to do this at the moment, although certainly the Regent mapper should be doing this for all of its scalar value computations when running with control replication (reminder to @elliottslaughter).
I'm going to declare this done, and any further issues can go into https://github.com/StanfordLegion/legion-manual
Finish a first draft of the Legion manual and reference. Chapters still to go for a first version:
Additional planned chapters for the complete version