IBM / ibm-spectrum-scale-install-infra

Spectrum Scale Installation and Configuration

NSD creation being skipped when using host_vars... #81

Open whowutwut opened 4 years ago

whowutwut commented 4 years ago

Need some help... I'm on the current dev branch (also tried master). When I use host_vars to define disks on the workers, NSD creation seems to be skipped.

Is there something wrong with my config? I do think this was working in the past... What is the best way to debug this?
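
One quick sanity check is a throwaway playbook that dumps what each host resolves for scale_storage (a sketch; the inventory path hosts is a placeholder):

# check_vars.yml -- throwaway playbook, not part of the role.
# Run with: ansible-playbook -i hosts check_vars.yml
- hosts: all
  gather_facts: false
  tasks:
    - name: Show scale_storage as each host resolves it
      debug:
        var: scale_storage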

Here are my host_vars files:

# ls -ltr host_vars/
total 8
-rw-r--r-- 1 root root   0 Apr 16 20:07 autogen-hostvars-rhels77-x-master
-rw-r--r-- 1 root root 580 Apr 16 20:07 autogen-hostvars-rhels77-x-worker1
-rw-r--r-- 1 root root 580 Apr 16 20:07 autogen-hostvars-rhels77-x-worker2

I tried it both with the full content of this file and with the lines marked - below removed; no change, the NSDs are still skipped.

# cat host_vars/autogen-hostvars-rhels77-x-worker1
scale_storage:
  - filesystem: gpfs01
    blockSize: 4M
    numNodes: 16
    automaticMountOption: true
    defaultMountPoint: /mnt/gpfs01
    # force overwrite the NSDs since we probably did not clean up prior
    overwriteNSDs: true
    disks:
      - device: /dev/vdb
-        failureGroup: "2"
-        usage: "dataAndMetadata"
        servers: "autogen-hostvars-rhels77-x-worker1"
      - device: /dev/vdc
        nsd: "autogen_hostvars_rhels77_x_worker1_nsd_vdc"
-        failureGroup: "2"
-        usage: "dataAndMetadata"
        servers: "autogen-hostvars-rhels77-x-worker1"

But then I see the tasks being skipped....

TASK [core/cluster : storage | Prepare StanzaFile(s) for NSD creation] ***************************************************
changed: [autogen-hostvars-rhels77-x-master] => (item=gpfs01)

TASK [core/cluster : storage | Accept server license for NSD servers] ****************************************************
skipping: [autogen-hostvars-rhels77-x-master]

TASK [core/cluster : storage | Create new NSDs] **************************************************************************
skipping: [autogen-hostvars-rhels77-x-master] => (item={u'changed': True, u'uid': 0, u'dest': u'/var/tmp/StanzaFile.new.gpfs01', u'owner': u'root', 'diff': [], u'size': 1, u'src': u'/root/.ansible/tmp/ansible-tmp-1587093686.15-28107-223240455255342/source', 'ansible_loop_var': u'item', u'group': u'root', 'item': u'gpfs01', u'checksum': u'adc83b19e793491b1c6ea0fd8b46cd9f32e592fc', u'md5sum': u'68b329da9893e34099c7d8ad5cb9c940', 'failed': False, u'state': u'file', u'gid': 0, u'mode': u'0644', u'invocation': {u'module_args': {u'directory_mode': None, u'force': True, u'remote_src': None, u'dest': u'/var/tmp/StanzaFile.new.gpfs01', u'selevel': None, u'_original_basename': u'StanzaFile.j2', u'delimiter': None, u'regexp': None, u'owner': None, u'follow': False, u'validate': None, u'local_follow': None, u'src': u'/root/.ansible/tmp/ansible-tmp-1587093686.15-28107-223240455255342/source', u'group': None, u'unsafe_writes': None, u'checksum': u'adc83b19e793491b1c6ea0fd8b46cd9f32e592fc', u'seuser': None, u'serole': None, u'content': None, u'setype': None, u'mode': None, u'attributes': None, u'backup': False}}})

TASK [core/cluster : storage | Prepare StanzaFile(s) for filesystem creation] ********************************************

TASK [core/cluster : storage | Consolidate defined filesystem parameters] ************************************************
ok: [autogen-hostvars-rhels77-x-master] => (item={u'numNodes': 16, u'overwriteNSDs': True, u'blockSize': u'4M', u'disks': [{u'device': u'/dev/vdb', u'usage': u'dataAndMetadata', u'failureGroup': u'2', u'servers': u'autogen-hostvars-rhels77-x-worker1'}, {u'device': u'/dev/vdc', u'usage': u'dataAndMetadata', u'failureGroup': u'2', u'nsd': u'autogen_hostvars_rhels77_x_worker1_nsd_vdc', u'servers': u'autogen-hostvars-rhels77-x-worker1'}], u'filesystem': u'gpfs01', u'defaultMountPoint': u'/mnt/gpfs01', u'automaticMountOption': True})
ok: [autogen-hostvars-rhels77-x-master] => (item={u'numNodes': 16, u'overwriteNSDs': True, u'blockSize': u'4M', u'disks': [{u'device': u'/dev/vdb', u'usage': u'dataAndMetadata', u'failureGroup': u'2', u'servers': u'autogen-hostvars-rhels77-x-worker2'}, {u'device': u'/dev/vdc', u'usage': u'dataAndMetadata', u'failureGroup': u'2', u'nsd': u'autogen_hostvars_rhels77_x_worker2_nsd_vdc', u'servers': u'autogen-hostvars-rhels77-x-worker2'}], u'filesystem': u'gpfs01', u'defaultMountPoint': u'/mnt/gpfs01', u'automaticMountOption': True})

TASK [core/cluster : storage | Prepare StanzaFile(s) for NSD creation] ***************************************************
changed: [autogen-hostvars-rhels77-x-master] => (item=gpfs01)

TASK [core/cluster : storage | Accept server license for NSD servers] ****************************************************
skipping: [autogen-hostvars-rhels77-x-master]

TASK [core/cluster : storage | Create new NSDs] **************************************************************************
skipping: [autogen-hostvars-rhels77-x-master] => (item={u'changed': True, u'uid': 0, u'dest': u'/var/tmp/StanzaFile.new.gpfs01', u'owner': u'root', 'diff': [], u'size': 1, u'src': u'/root/.ansible/tmp/ansible-tmp-1587093686.15-28107-223240455255342/source', 'ansible_loop_var': u'item', u'group': u'root', 'item': u'gpfs01', u'checksum': u'adc83b19e793491b1c6ea0fd8b46cd9f32e592fc', u'md5sum': u'68b329da9893e34099c7d8ad5cb9c940', 'failed': False, u'state': u'file', u'gid': 0, u'mode': u'0644', u'invocation': {u'module_args': {u'directory_mode': None, u'force': True, u'remote_src': None, u'dest': u'/var/tmp/StanzaFile.new.gpfs01', u'selevel': None, u'_original_basename': u'StanzaFile.j2', u'delimiter': None, u'regexp': None, u'owner': None, u'follow': False, u'validate': None, u'local_follow': None, u'src': u'/root/.ansible/tmp/ansible-tmp-1587093686.15-28107-223240455255342/source', u'group': None, u'unsafe_writes': None, u'checksum': u'adc83b19e793491b1c6ea0fd8b46cd9f32e592fc', u'seuser': None, u'serole': None, u'content': None, u'setype': None, u'mode': None, u'attributes': None, u'backup': False}}})

And then after the playbook completes:

# ls -ltr /var/tmp
total 12
drwx------ 3 root root  17 Apr 16 19:27 systemd-private-9beb40d1cb4f4c13823a7eef65a7ea1b-ntpd.service-GMMAbo
-rw-r--r-- 1 root root 161 Apr 16 20:18 ChangeFile
-rw-r--r-- 1 root root   1 Apr 16 20:21 StanzaFile.new.gpfs01
-rw-r--r-- 1 root root   1 Apr 16 20:21 StanzaFile.gpfs01
# cat /var/tmp/StanzaFile.gpfs01

# cat /var/tmp/StanzaFile.new.gpfs01

# mmlsnsd
mmlsnsd: [I] No disks were found.

On another cluster, using group_vars with this config works with no problem:

# cat group_vars/all
scale_storage:
  - filesystem: gpfs01
    overwriteNSDs: true
    disks:
      - device: /dev/vdb
        servers: autogen-groupvars-rhels77-x-worker1
      - device: /dev/vdc
        servers: autogen-groupvars-rhels77-x-worker1
      - device: /dev/vdb
        servers: autogen-groupvars-rhels77-x-worker2
      - device: /dev/vdc
        servers: autogen-groupvars-rhels77-x-worker2
whowutwut commented 4 years ago

It seems like scale_storage is empty by the time we get into https://github.com/IBM/ibm-spectrum-scale-install-infra/blob/dev/roles/core/node/templates/StanzaFile.j2#L1

I cleared out this file and just added...

debug: {{ current_fs }}
debug: {{ scale_storage }}

And the following is printed:

debug: gpfs01
debug - scale_storage: []
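
The stanza tasks in the log run on the master, so the workers' host_vars would only be reachable through hostvars. A hypothetical debug task (not part of the role) to confirm what the rendering node can actually see:

- name: Show scale_storage for every host in the play
  debug:
    msg: "{{ item }}: {{ hostvars[item]['scale_storage'] | default('undefined') }}"
  loop: "{{ ansible_play_hosts }}"
  run_once: true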

Just for my own sanity, I checked out 6bbb0be5e0ef162711321566c9167c92b5aa0543, which is really early on in master, and ran the same files through it... and the NSDs get created:

[root@autogen-hostvars-rhels77-x-master ~]# cat /var/tmp/StanzaFile.new.gpfs01
%nsd:
  device=/dev/vdb
  nsd=nsd_autogen-hostvars-rhels77-x-worker1_vdb
  servers=autogen-hostvars-rhels77-x-worker1
  usage=dataAndMetadata
  failureGroup=2
  pool=system

%nsd:
  device=/dev/vdc
  nsd=autogen_hostvars_rhels77_x_worker1_nsd_vdc
  servers=autogen-hostvars-rhels77-x-worker1
  usage=dataAndMetadata
  failureGroup=2
  pool=system

%nsd:
  device=/dev/vdb
  nsd=nsd_autogen-hostvars-rhels77-x-worker2_vdb
  servers=autogen-hostvars-rhels77-x-worker2
  usage=dataAndMetadata
  failureGroup=2
  pool=system

%nsd:
  device=/dev/vdc
  nsd=autogen_hostvars_rhels77_x_worker2_nsd_vdc
  servers=autogen-hostvars-rhels77-x-worker2
  usage=dataAndMetadata
  failureGroup=2
  pool=system

Edit: I have to find a better commit to check out. This one creates the NSDs in the stanza file (which at least proves that my host_vars files do work), but it has issues with - characters in the generated NSD names, which was fixed later in the code.
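
For illustration only (not the repo's actual fix), the dash problem is the kind of thing a Jinja2 replace filter handles when deriving NSD names from hostnames:

# Hypothetical sketch: derive a dash-free NSD name from hostname and device.
- name: Show a sanitized NSD name
  debug:
    msg: "nsd_{{ 'autogen-hostvars-rhels77-x-worker1' | replace('-', '_') }}_{{ '/dev/vdb' | basename }}"
  # prints: nsd_autogen_hostvars_rhels77_x_worker1_vdb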

rajan-mis commented 4 years ago

@whowutwut I think the issue might be that the servers and nsd names you have added are wrapped in double quotes; remove the double quotes and it should work. Your definition:

disks:
  - device: /dev/vdb
    servers: "autogen-hostvars-rhels77-x-worker1"
  - device: /dev/vdc
    nsd: "autogen_hostvars_rhels77_x_worker1_nsd_vdc"
    servers: "autogen-hostvars-rhels77-x-worker1"
rajan-mis commented 4 years ago

@whowutwut I realized one more thing: you have defined this parameter in host_vars, and it will not work there. It always needs to be defined in group_vars.

README statement: Important: scale_storage must be defined using group_vars inventory files. Do not define disk parameters using host_vars inventory files or inline.

We already had multiple discussions in the Slack channel about group_vars and host_vars.

whowutwut commented 4 years ago

We already had multiple discussions in the Slack channel about group_vars and host_vars.

Sure, understood. But I hit this again and did not realize what was going on. Another reason for opening issues in this public tracker is that it helps the community: if anyone out there runs into something similar, these issues will appear in Google searches once the crawlers do their thing.

Solution: If your clone is at or after https://github.com/IBM/ibm-spectrum-scale-install-infra/pull/39, you MUST NOT use host_vars; switch to defining scale_storage, including the NSD definitions, under group_vars.
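
For anyone who finds this later, here is a sketch of the equivalent group_vars/all definition for the host_vars content above (same parameters, with both workers merged into a single disks list):

# group_vars/all -- scale_storage defined once for the whole group
scale_storage:
  - filesystem: gpfs01
    blockSize: 4M
    numNodes: 16
    automaticMountOption: true
    defaultMountPoint: /mnt/gpfs01
    overwriteNSDs: true
    disks:
      - device: /dev/vdb
        servers: autogen-hostvars-rhels77-x-worker1
      - device: /dev/vdc
        servers: autogen-hostvars-rhels77-x-worker1
      - device: /dev/vdb
        servers: autogen-hostvars-rhels77-x-worker2
      - device: /dev/vdc
        servers: autogen-hostvars-rhels77-x-worker2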