BlueBrain / hwloc

Mirror of Portable Hardware Locality, with new features in the bbp branch.

Wrong CPUSET for partial allocation. #5

Open marwan-abdellah opened 11 years ago

marwan-abdellah commented 11 years ago

While working on https://github.com/Eyescale/Equalizer/issues/167, I tried to find the cpuset that corresponds to the allocated core. I allocated resources with the command salloc -N1 -n1 --gres=gpu:3 -p interactive and got the following topology map:

Machine (24GB) cpuset=0x00000001
  NUMANode L#0 (P#0 12GB) cpuset=0x00000001
    Socket L#0 cpuset=0x00000001
      L3 L#0 (12MB) cpuset=0x00000001
        L2 L#0 (256KB) cpuset=0x00000001
          L1d L#0 (32KB) cpuset=0x00000001
            L1i L#0 (32KB) cpuset=0x00000001
              Core L#0 cpuset=0x00000001
                PU L#0 (P#0) cpuset=0x00000001
  NUMANode L#1 (P#1 12GB) cpuset=0x0
  HostBridge L#0
    PCIBridge
      PCI 10de:1080
    PCIBridge
      PCI 10de:1080
    PCIBridge
      PCI 8086:10d3
        Net L#0 "eth0"
    PCIBridge
      PCI 8086:10d3
        Net L#1 "eth1"
    PCIBridge
      PCI 102b:0532
    PCI 8086:3a22
      Block L#2 "sda"
  HostBridge L#6
    PCIBridge
      PCI 8086:10fb
        Net L#3 "eth2"
      PCI 8086:10fb
        Net L#4 "eth3"
    PCIBridge
      PCI 1077:7322
        Net L#5 "ib0"
        OpenFabrics L#6 "qib0"
    PCIBridge
      PCI 10de:1080

If I try to get the cpuset of the node connected to the host bridges of the allocated GPUs, I get a null cpuset for the third GPU, and for the second CPU I get 0x0, which is wrong: according to the topology above it should be 0x00000001. This is how I obtain the cpusets:

  1. Finding the GPU that corresponds to the attached screen:
if (HWLOC_OBJ_OSDEV_GPU == osdev->attr->osdev.type
    && osdev->name
    && sscanf(osdev->name, ":%u.%u", &x, &y) == 2
    && port == x && device == y)

This gives me the osdev.

  2. Getting the parent of the osdev, which should be the PCI device:
hwloc_obj_t parent = osdev->parent;
  3. Getting the host bridge from the PCI bus:
hwloc_obj_t host_bridge;
host_bridge = hwloc_get_hostbridge_by_pcibus(topology,
                                             parent->attr->pcidev.domain,
                                             parent->attr->pcidev.bus);
  4. Getting the cpuset of the previous sibling of the host bridge, which is the NUMANode and should have the same cpuset as the underlying socket:
hwloc_cpuset_t cpuset = hwloc_bitmap_dup(host_bridge->prev_sibling->cpuset);
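
The previous-sibling lookup in the last step can be reproduced without hwloc by mocking the chain of objects directly under Machine in the topology dump. This is only a sketch: struct mock_obj, build_mock_siblings, prev_of and nearest_prev_with_cpus are hypothetical names, not hwloc types or API, and the host bridges are modeled as having no cpuset to match the null result reported above. The second helper is one possible workaround (walk back to the nearest sibling with a non-empty cpuset); whether that is the right answer for a partial allocation is an assumption, not a confirmed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for hwloc_obj_t: only the fields this sketch needs. */
struct mock_obj {
    const char *name;
    int has_cpuset;               /* 0 models a null cpuset */
    unsigned long cpuset;         /* bitmask of allowed PUs if has_cpuset */
    struct mock_obj *prev_sibling;
};

/* Fill s[0..3] with the children of Machine, in the order of the dump. */
static void build_mock_siblings(struct mock_obj s[4])
{
    s[0] = (struct mock_obj){ "NUMANode L#0",   1, 0x1UL, NULL  };
    s[1] = (struct mock_obj){ "NUMANode L#1",   1, 0x0UL, &s[0] };
    s[2] = (struct mock_obj){ "HostBridge L#0", 0, 0x0UL, &s[1] };
    s[3] = (struct mock_obj){ "HostBridge L#6", 0, 0x0UL, &s[2] };
}

/* The lookup as written in the issue: previous sibling, unconditionally. */
static const struct mock_obj *prev_of(const struct mock_obj *obj)
{
    assert(obj != NULL);
    return obj->prev_sibling;
}

/* Assumed workaround: skip siblings whose cpuset is missing or empty. */
static const struct mock_obj *nearest_prev_with_cpus(const struct mock_obj *obj)
{
    assert(obj != NULL);
    const struct mock_obj *p = obj->prev_sibling;
    while (p != NULL && !(p->has_cpuset && p->cpuset != 0))
        p = p->prev_sibling;
    return p;   /* NULL if no earlier sibling owns any PU */
}

/* Print what both lookups select for one object. */
static void describe(const struct mock_obj *obj)
{
    const struct mock_obj *a = prev_of(obj);
    const struct mock_obj *b = nearest_prev_with_cpus(obj);
    printf("%s: prev_sibling=%s, nearest with PUs=%s\n", obj->name,
           a ? a->name : "(none)", b ? b->name : "(none)");
}
```

After build_mock_siblings, calling describe on either mock host bridge shows the difference: the unconditional lookup lands on NUMANode L#1 (empty mask) for HostBridge L#0 and on HostBridge L#0 (no cpuset) for HostBridge L#6, while the walk-back variant reaches NUMANode L#0 in both cases.
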
marwan-abdellah commented 11 years ago

@eile Can you please assign this issue to me?