AngryBear2 opened this issue 1 year ago
Legion has two main sets of partitions, which are defined here:
The primary partition is just an equal partition:
The secondary partitions are more complicated and encode, essentially, the dependence patterns:
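To make the idea of a dependence pattern concrete, here is a minimal sketch of how a pattern can be expressed as a function from a task's position to its dependencies in the previous timestep. The function names and shapes are illustrative assumptions, not Task Bench's actual API:

```python
# Hypothetical sketch: a dependence pattern as a function mapping a
# point (task index within a timestep) to the points it depends on in
# the previous timestep. Names are illustrative, not Task Bench's code.

def stencil_1d_deps(point, width):
    """1D stencil pattern: depend on left neighbor, self, right neighbor."""
    return [p for p in (point - 1, point, point + 1) if 0 <= p < width]

def trivial_deps(point, width):
    """Trivial pattern: each task depends only on its own predecessor."""
    return [point]
```

The secondary partitions in the Legion implementation encode this kind of relationship: which pieces of the previous timestep's data each task needs to read.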
As a general rule, the data in Task Bench is "fake": it does not encode "real" application data, and it is not consumed by any of the kernels (compute, memory, or otherwise). The data does contain enough information to encode where it came from, so we can check it for correctness. But to a first approximation, you should completely separate in your mind the partitioning (which is related to the dependence pattern) and the kernels (which actually execute, but ignore all data).
Kernels are never "partitioned"; they just do what they're told. So if you run a compute kernel and tell it to execute N iterations, it will always execute N iterations, regardless of the size and shape of the graph.
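The point above can be sketched in a few lines: a compute kernel is just fixed busy work parameterized by an iteration count, with no reference to the graph at all. This is an illustrative sketch, not Task Bench's actual kernel code:

```python
def compute_kernel(iterations):
    """Busy-work kernel: performs a fixed amount of floating-point work.
    Note it takes no graph, no data, and no node count as input; the
    work done depends only on `iterations`."""
    acc = 1.000001
    for _ in range(iterations):
        acc = acc * 1.000001 + 0.000001  # FMA-style busy work
    return acc
```

Because the iteration count is the only input, the same kernel does the same work whether the graph has 2 tasks or 2 million.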
Absolutely nothing in Task Bench is sensitive to the number of nodes. The graph is configured explicitly (via command line parameters) and it is up to each implementation to spread that graph out as it sees fit. But the Task Bench core (used by each implementation) is oblivious to how many nodes/cores there are or how things are parallelized. You could just as easily make a sequential version of Task Bench that executes the same thing.
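To illustrate that last point, here is a hedged sketch of a purely sequential executor that walks the same width-by-timesteps graph one task at a time. All names are made up for illustration; this is not the Task Bench core:

```python
# Sketch: the task graph says nothing about parallelism. A sequential
# executor can walk the exact same graph that a distributed runtime
# would spread over many nodes.

def run_sequential(timesteps, width, deps):
    """Execute a `width` x `timesteps` task graph one task at a time.
    `deps(point, width)` lists a task's dependencies in the previous
    timestep; sequentially, they are always already complete."""
    executed = []
    for t in range(timesteps):
        for p in range(width):
            if t > 0:
                # In a parallel runtime these edges would trigger data
                # movement; here they are trivially satisfied.
                _ = [(t - 1, d) for d in deps(p, width)]
            executed.append((t, p))
    return executed
```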
Hope that helps.
Thank you. From your explanation, I understand that each node in the figure executes its own instance of the same kernel, rather than all nodes jointly executing one kernel. How do I measure the data transfer time between dependent nodes in the graph?
There is a summary printed at the end which should include a bandwidth figure; but you may need to pass in the number of nodes (`-nodes N`) in order for it to accurately calculate the intra-node vs inter-node bandwidth.
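Why the node count matters can be sketched as follows: to split traffic into intra-node and inter-node portions, the summary has to know which node owns each graph point. The block distribution and function names below are assumptions for illustration, not Task Bench's actual implementation:

```python
# Hedged sketch: separating intra-node from inter-node traffic.
# Assumes a simple block distribution of graph points over nodes.

def owner_node(point, width, num_nodes):
    """Assumed block distribution of `width` points over `num_nodes` nodes."""
    return point * num_nodes // width

def split_traffic(edges, width, num_nodes, bytes_per_edge):
    """Sum bytes moved within a node vs. between nodes, given a list of
    (producer, consumer) dependence edges."""
    intra = inter = 0
    for src, dst in edges:
        if owner_node(src, width, num_nodes) == owner_node(dst, width, num_nodes):
            intra += bytes_per_edge
        else:
            inter += bytes_per_edge
    return intra, inter
```

Without the node count, every edge looks the same, which is why the reported intra- vs inter-node bandwidth split can be inaccurate.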
Hello, I have some understanding of task-bench now, but there are still many things that are not clear to me, so I would like to ask:
It seems that the graph type is what generates the dependencies of the task graph, and the kernel is the part that gets executed. For example, if I want to run compute_kernel with a stencil dependence pattern, how is the generated Legion code partitioned? What about data dependencies across multiple nodes? Does each node split compute_kernel into many parts, or does each node execute the same compute_kernel?
I am not familiar with how memory_kernel executes on Legion, so I hope you can advise me.