StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

Unify most processor kinds #1747

Open muraj opened 2 weeks ago

muraj commented 2 weeks ago

All of our processors have different "kinds" that segregate their capabilities and features. For example, we often want to associate a GPU with a python processor and leverage all stream management within a python task. Another example is when clients want to configure the available processors based on the machine topology on behalf of the user. This has been partially implemented via the configuration API, but that API is built on the command-line argument interface, in which Realm still manages the construction of the processors and their affinities, and it is not rich enough to properly describe what is needed.
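For reference, here is roughly what the existing configuration path looks like today. This is a minimal sketch assuming the ModuleConfig interface; the "gpu" property name is illustrative and may differ.

#include "realm.h"

int main(int argc, char **argv) {
  Realm::Runtime r;
  // Ask the CUDA module for one GPU processor per rank before init();
  // processor counts are about all that can be expressed here.
  Realm::ModuleConfig *cuda_config = r.get_module_config("cuda");
  if (cuda_config != nullptr) {
    cuda_config->set_property("gpu", 1);
  }
  r.init(&argc, &argv);
  // From here on the set of processors and their affinities is fixed by Realm;
  // there is no way to say "put this GPU's processor on the cores nearest to it"
  // or "make this one processor both CUDA- and python-capable".
  r.shutdown();
  return r.wait_for_shutdown();
}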

Instead, we'd like to propose an interface where applications can dynamically create processors with certain properties and features enabled. With naming and actual syntax subject to change, the new interface for creating processors would look something like the following:

using namespace Realm;
int main() {
  Runtime r;
  r.init();
  // r.get_available_nodes(local=true);
  // r.get_nodeid();
  r.get_available_core_layout(&core_layout, nodeid);  // TBD
  cuda_mod = r.get_module_specific<CudaModule>();
  cuda_mod->get_available_gpus(&gpus, &num_gpus);
  for (size_t g = 0; g < num_gpus; g++) {
   cuda_mod->get_gpu_info(gpus[g], &gpu_info);
  }
  // Process the gpu and core information to e.g. find core(s) closest to the
  // gpu to use for the processor and fill up create_processor_info structure with the needed information.

  if (r.get_module_specific<PythonModule>() != nullptr) {
    create_processor_info.python = true;
  }

  bool ok = r.create_processor(&gpu_proc, &create_processor_info);

  r.refresh_machine_model(); // Distributes all the newly created processors and their
                             // affinities to all the ranks, allowing remote queries to work
                             // Possibly return an event here to wait on?

  return 0;
}

In order to maintain compatibility with the interface we already have, these "custom" processors will probably get a new "USER_KIND" or similar, and a new set of queries can be provided so applications can introspect such a processor, e.g.:

if (p.kind() == PROC_USER_KIND) {
  p.get_feature_flags(&features);
  if (features.enables_cuda) { // Naming TBD
    cuda_mod->get_cuda_info(p, &cuda_info);
    // Contains associated gpu, context, etc
  }
  if (features.enables_python) {
    py_mod->get_python_info(p, &py_info);
    // Maybe retrieve the python interpreter object, etc.
  }
}

The first step is to internally remove all the derived classes of LocalTaskProcessor, utilize the ContextManager for when tasks are about to start / finish executing, and push most of the logic for creating these processors out into the caller instead of a derived object. This will allow us to componentize our current processors and verify that the logic for creating these processors dynamically works with our current test suite.
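To make the componentization concrete, here is one possible shape for that shared hook, purely as an illustration (these class names are a hypothetical sketch, not the existing Realm implementation):

// Hypothetical sketch: a single task processor delegates per-kind setup and
// teardown to a ContextManager instead of needing a derived class per kind.
class ContextManager {
public:
  virtual ~ContextManager() = default;
  // Called on the executing thread right before a task body runs,
  // e.g. push the GPU's CUDA context or acquire the python interpreter.
  virtual void *create_context() = 0;
  // Called right after the task body returns, undoing create_context.
  virtual void destroy_context(void *context) = 0;
};

class CudaContextManager : public ContextManager {
public:
  void *create_context() override {
    // cuCtxPushCurrent(...), pick the task's default stream, etc.
    return nullptr;
  }
  void destroy_context(void *context) override {
    // cuCtxPopCurrent(...), synchronize if context synchronization is required, etc.
  }
};

A processor created with both CUDA and python features enabled would then simply stack the corresponding context managers rather than requiring yet another derived LocalTaskProcessor class.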

lightsighter commented 2 weeks ago

Can you provide a prototype for what the create_processor_info struct will look like?

Also, I think we should show some code of what machine model queries will look like with the new interface.

This is also a duplicate of #680

muraj commented 2 weeks ago

@lightsighter I don't have a complete story for the create_processor_info structure yet, but here's what I was thinking; it's very reminiscent of DirectX and Vulkan. Keep in mind that we can build whatever C++ wrappers we want on top of this, but I'm open to comments / suggestions:

namespace Realm {
struct CreateProcessorInfo {
  ProcessorInfoType type = CREATE_PROCESSOR_INFO;  // To help with versioning
  void *pNext = nullptr;
  size_t *coreids = nullptr;
  size_t num_cores = 0;
}; }

namespace Realm::Cuda {
struct CreateCudaProcessorInfo {
  ProcessorInfoType type = CREATE_CUDA_PROCESSOR_INFO;  // To help with versioning
  void *pNext = nullptr;
  CUuuid gpuid; // maybe some more fields here.
}; }

namespace Realm::Python {
struct CreatePythonProcessorInfo {
  ProcessorInfoType type = CREATE_PYTHON_PROCESSOR_INFO;  // To help with versioning
  void *pNext = nullptr;
  // Python specific processor stuffs
}; }

// e.g.
CreateProcessorInfo create_processor_info;
CreateCudaProcessorInfo cuda_processor_info;
CreatePythonProcessorInfo python_processor_info;

std::vector<size_t> allcores;
size_t num_cores = 0;
// all_cores, numa_cores, etc.
r.get_all_cores(nullptr, &num_cores);
allcores.resize(num_cores);
r.get_all_cores(allcores.data(), &num_cores);

create_processor_info.pNext = &cuda_processor_info;
create_processor_info.coreids = allcores.data();
create_processor_info.num_cores = allcores.size();

cuda_processor_info.gpuid = gpu_infos.front().uuid;
cuda_processor_info.pNext = &python_processor_info;

Processor p;
err = r.create_processor(&p, &create_processor_info);

muraj commented 2 weeks ago

Thinking about it, here's the C++ wrapper we can make on top of this:


Processor p = ProcessorBuilder()
                .set_cores(all_cores)
                .set_gpu(gpu_infos.front().uuid);

This is fairly easily built as a header-only class.
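For illustration, here is a minimal sketch of what that header-only wrapper could look like, reusing the pNext-chained Create*ProcessorInfo structs from the previous comment (all names follow the proposal above and are placeholders, not an existing Realm API):

namespace Realm {

class ProcessorBuilder {
public:
  ProcessorBuilder &set_cores(const std::vector<size_t> &cores) {
    cores_ = cores;
    info_.coreids = cores_.data();
    info_.num_cores = cores_.size();
    return *this;
  }

  ProcessorBuilder &set_gpu(CUuuid gpuid) {
    cuda_info_.gpuid = gpuid;
    // Append the CUDA extension to the pNext chain.
    cuda_info_.pNext = info_.pNext;
    info_.pNext = &cuda_info_;
    return *this;
  }

  // Allows `Processor p = ProcessorBuilder()...;` as in the example above;
  // an explicit create(Runtime &) method would work just as well.
  operator Processor() {
    Processor p = Processor::NO_PROC;
    Runtime::get_runtime().create_processor(&p, &info_);
    return p;
  }

private:
  std::vector<size_t> cores_;
  CreateProcessorInfo info_;
  Cuda::CreateCudaProcessorInfo cuda_info_;
};

} // namespace Realm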

muraj commented 2 weeks ago

Also, I think we should show some code of what machine model queries will look like with the new interface.

For this, I think a simple extension of ProcessorQuery like the following would be enough:

ProcessorQuery::Features features;
features.has_cuda = true;
ProcessorQuery pq = ProcessorQuery().has_features(features);

This would work with all processor kinds, so the original TOC_PROC would be returned in this query as well.
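As a hypothetical usage example, assuming has_features is added to the existing Machine::ProcessorQuery (Features field names as sketched above):

using namespace Realm;

Machine machine = Machine::get_machine();

Machine::ProcessorQuery::Features features;  // proposed extension, naming TBD
features.has_cuda = true;

// Select every local processor that can run CUDA work, whether it is a classic
// TOC_PROC or a new USER_KIND processor created with the CUDA feature enabled.
Machine::ProcessorQuery pq(machine);
pq.has_features(features).local_address_space();

for (Machine::ProcessorQuery::iterator it = pq.begin(); it != pq.end(); ++it) {
  Processor p = *it;
  // launch tasks on p, call cuda_mod->get_cuda_info(p, ...), etc.
}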