SUSE / DeepSea

A collection of Salt files for deploying, managing and automating Ceph.

Take drivegroup targets into account for rebuild.node (bsc#1198929) #1890

Closed tserong closed 2 years ago

tserong commented 2 years ago

Previously, salt-run rebuild.node $NODE would override the targets specified in drive_groups.yml, and attempt to apply all drive groups to the specified node (of course only the first one would succeed and the rest would likely do nothing). This commit makes sure that only the drive groups whose configured targets actually match the specified node are applied (see https://bugzilla.suse.com/show_bug.cgi?id=1198929)
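To make the idea concrete, here is a minimal sketch of the targeting logic (function and variable names are illustrative only, not the actual DeepSea code): each drive group's own target is combined with the node being rebuilt into a Salt compound expression, so any group whose target doesn't cover that node simply matches no minions and is skipped.

def rebuild_targets(drive_groups, node):
    """
    Illustrative sketch only: combine each drive group's configured
    target with the node being rebuilt into a Salt compound match.
    Groups whose target doesn't cover the node end up matching no
    minions, so dg.deploy does nothing for them.
    """
    targets = {}
    for name, spec in drive_groups.items():
        dg_target = spec.get('target', '*')
        # e.g. "( node[34]* ) and ( node3.ses6.test )"
        targets[name] = "( {} ) and ( {} )".format(dg_target, node)
    return targets

# With the drive_groups.yml from the example below:
#   rebuild_targets(groups, 'node3.ses6.test')
#   -> {'all_devs':  '( node[12]* ) and ( node3.ses6.test )',
#       'shared_db': '( node[34]* ) and ( node3.ses6.test )'}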

There's also a second commit in here which makes rebuild.node work if there are no PGs at all, which I hit while testing the above. It's an edge case I have difficulty imagining anyone hitting on a real cluster, because by the time you're rebuilding a node you're probably running a cluster with actual data in it.
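The zero-PG guard is conceptually along these lines (again only a hedged sketch with assumed names, not the actual patch): with no PGs at all there is nothing to wait on, so the safety check should return immediately rather than waiting for active+clean PGs that will never appear.

def pgs_are_safe(pg_states):
    """
    Hedged sketch: pg_states is assumed to be a mapping of PG state
    name -> count, e.g. {'active+clean': 128}.  An empty cluster has
    no PGs, so nothing can be unclean and we must not block forever.
    """
    total = sum(pg_states.values())
    if total == 0:
        return True
    return pg_states.get('active+clean', 0) == total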

tserong commented 2 years ago

Just to demonstrate, let's say we have a four node cluster, each node having 1 SSD and 2 spinners, with two drivegroups defined:

# cat /srv/salt/ceph/configuration/files/drive_groups.yml
all_devs:
  target: 'node[12]*'
  data_devices:
    all: true

shared_db:
  target: 'node[34]*'
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0

When deploying initially (stage 3), we see:

# salt-run --log-level=warning state.orch ceph.stage.3
[...]
Found DriveGroup <all_devs>
Calling dg.deploy on compound target node[12]*
Found DriveGroup <shared_db>
Calling dg.deploy on compound target node[34]*

This correctly deploys the drive groups on the nodes you'd expect (as always happened). In this example, all three disks are used as OSDs on each of the first two nodes, and two OSDs are created on each of the other two nodes (the third disk there is the shared db device, so it isn't listed in the ceph osd tree output):

# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF 
 -1       0.09354 root default                           
 -3       0.02339     host node1                         
  2   hdd 0.00780         osd.2      up  1.00000 1.00000 
  5   hdd 0.00780         osd.5      up  1.00000 1.00000 
  0   ssd 0.00780         osd.0      up  1.00000 1.00000 
 -5       0.02339     host node2                         
  3   hdd 0.00780         osd.3      up  1.00000 1.00000 
  4   hdd 0.00780         osd.4      up  1.00000 1.00000 
  1   ssd 0.00780         osd.1      up  1.00000 1.00000 
-10       0.02338     host node3                         
  7   hdd 0.01169         osd.7      up  1.00000 1.00000 
  9   hdd 0.01169         osd.9      up  1.00000 1.00000 
-13       0.02338     host node4                         
  6   hdd 0.01169         osd.6      up  1.00000 1.00000 
  8   hdd 0.01169         osd.8      up  1.00000 1.00000 

Prior to this fix, if I were to rebuild node3 or node4, it would end up just deploying all drive groups in sequence on that node, which is wrong; in this case it means node3 ends up with three standalone OSDs instead of the two with a shared db that we expected:

# salt-run --log-level=warning rebuild.node node3.ses6.test
[...]
Found DriveGroup <all_devs>
Calling dg.deploy on compound target node3.ses6.test
Found DriveGroup <shared_db>
Calling dg.deploy on compound target node3.ses6.test

# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF 
 -1       0.09355 root default                           
 -3       0.02339     host node1                         
  2   hdd 0.00780         osd.2      up  1.00000 1.00000 
  5   hdd 0.00780         osd.5      up  1.00000 1.00000 
  0   ssd 0.00780         osd.0      up  1.00000 1.00000 
 -5       0.02339     host node2                         
  3   hdd 0.00780         osd.3      up  1.00000 1.00000 
  4   hdd 0.00780         osd.4      up  1.00000 1.00000 
  1   ssd 0.00780         osd.1      up  1.00000 1.00000 
-10       0.02339     host node3                         
  9   hdd 0.00780         osd.9      up  1.00000 1.00000 
 10   hdd 0.00780         osd.10     up  1.00000 1.00000 
  7   ssd 0.00780         osd.7      up  1.00000 1.00000 
-13       0.02338     host node4                         
  6   hdd 0.01169         osd.6      up  1.00000 1.00000 
  8   hdd 0.01169         osd.8      up  1.00000 1.00000 

Now, with this fix, when rebuilding node3 or node4 we see:

# salt-run --log-level=warning rebuild.node node3.ses6.test
[...]
Found DriveGroup <all_devs>
Calling dg.deploy on compound target ( node[12]* ) and ( node3.ses6.test )
No minions matched the target. No command was sent, no jid was assigned.
Found DriveGroup <shared_db>
Calling dg.deploy on compound target ( node[34]* ) and ( node3.ses6.test )

Note how the first drivegroup's compound target, ( node[12]* ) and ( node3.ses6.test ), doesn't match any minions, so it isn't applied, whereas the second drivegroup's target does match and is applied to the specified node, and we're back to what we expected to see:

# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF 
 -1       0.09354 root default                           
 -3       0.02339     host node1                         
  2   hdd 0.00780         osd.2      up  1.00000 1.00000 
  5   hdd 0.00780         osd.5      up  1.00000 1.00000 
  0   ssd 0.00780         osd.0      up  1.00000 1.00000 
 -5       0.02339     host node2                         
  3   hdd 0.00780         osd.3      up  1.00000 1.00000 
  4   hdd 0.00780         osd.4      up  1.00000 1.00000 
  1   ssd 0.00780         osd.1      up  1.00000 1.00000 
-10       0.02338     host node3                         
  7   hdd 0.01169         osd.7      up  1.00000 1.00000 
  9   hdd 0.01169         osd.9      up  1.00000 1.00000 
-13       0.02338     host node4                         
  6   hdd 0.01169         osd.6      up  1.00000 1.00000 
  8   hdd 0.01169         osd.8      up  1.00000 1.00000
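As an aside, if you want to confirm by hand which minions a given compound expression matches, Salt's Python client can evaluate it directly (this is just a verification aid, not part of the change):

import salt.client

client = salt.client.LocalClient()
# Only node3.ses6.test should answer for the second drive group's target.
matched = client.cmd('( node[34]* ) and ( node3.ses6.test )',
                     'test.ping', tgt_type='compound')
print(sorted(matched))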