SUSE / DeepSea

A collection of Salt files for deploying, managing and automating Ceph.
GNU General Public License v3.0

[next] refactor bootstrap, mon and mgr code #1735

Closed. jschmid1 closed this 5 years ago

jschmid1 commented 5 years ago

CAVEAT:

The issue where podman misbehaves when run on a minion that is not in debug mode still persists.

NOTE:

It seems to be fine to run the minions in daemon mode

salt-minion -d

to resolve the issue mentioned in the previous PR

Signed-off-by: Joshua Schmid jschmid@suse.de



jschmid1 commented 5 years ago

fixed issues with make rpm

kshtsk commented 5 years ago

retest this please

jschmid1 commented 5 years ago

expecting lint/test to fail currently

jschmid1 commented 5 years ago

Lots of changes, but for the most part looks good.

Are we documenting the various salt-run ... cmds that can be invoked by the user?

Not yet. I wanted to publish this information using a slightly different approach than last time. I think we discussed this in one of our meetings.

So essentially I want to use .rst or .md documents that sit alongside the runners. Those documents should follow a pre-defined template and will be rendered to .man, .pdf, and .docbook to align our documentation efforts across the board.

What I did notice, though, is an improvement in Salt.

admin:/srv/pillar/ceph # salt-run config.deploy
config.deploy_ceph_conf:
     TODO: Technically this is the wrong word.. Just keeping it consistent 
config.deploy_global:
     Creates initial global configuration 
config.deploy_roles:
     Creates role assignments 
config.deploy_salt_conf:
     Creates Salt configuration 
admin:/srv/pillar/ceph # 

Seems like Salt now has fuzzy matching and prints the docstrings as help/suggestion messages. Neat :>
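
For reference, a minimal sketch of how those one-line docstrings could end up in the suggestion output. The file path and function bodies below are illustrative assumptions, not the actual DeepSea runner; only the docstring convention matches what Salt appears to pick up here.

# /srv/modules/runners/config.py -- illustrative sketch, not the actual DeepSea runner.
# Salt's runner loader reads each function's docstring; when `salt-run config.deploy`
# does not match a function exactly, the close matches are listed together with the
# first line of their docstrings, which seems to be what produces the output above.

def deploy_global():
    """
    Creates initial global configuration
    """
    # body omitted -- hypothetical placeholder
    return True


def deploy_roles():
    """
    Creates role assignments
    """
    # body omitted -- hypothetical placeholder
    return True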

jschmid1 commented 5 years ago

Thanks for the review @mgfritch ! I pushed fixes to your suggestions in https://github.com/SUSE/DeepSea/pull/1735/commits/ecfb63e99b2cd06b955d2ca3d60a045b477fe6de

mgfritch commented 5 years ago

expecting lint/test to fail currently

Also, when do we expect to fix the lint/test that are failing?

mgfritch commented 5 years ago

Lots of changes, but for the most part looks good. Are we documenting the various salt-run ... cmds that can be invoked by the user?

Not yet. I wanted to publish this information using a slightly different approach than last time. I think we discussed this in one of our meetings.

So essentially I want to use .rst or .md documents that sit alongside the runners. Those documents should follow a pre-defined template and will be rendered to .man, .pdf, and .docbook to align our documentation efforts across the board.

hmm, yeah that's a tough one. Markdown is certainly much simpler, but I do see the value in reStructuredText ...

Maybe in the interim we just add a few simple pre-reqs (e.g. podman, etc.) with a brief quickstart guide on the wiki?

What I did notice, though, is an improvement in Salt.

admin:/srv/pillar/ceph # salt-run config.deploy
config.deploy_ceph_conf:
     TODO: Technically this is the wrong word.. Just keeping it consistent 
config.deploy_global:
     Creates initial global configuration 
config.deploy_roles:
     Creates role assignments 
config.deploy_salt_conf:
     Creates Salt configuration 
admin:/srv/pillar/ceph # 

Seems like Salt now has fuzzy matching and prints the docstrings as help/suggestion messages. Neat :>

Very cool, guess we need to make our method names more meaningful w/ better docstrings :smiley:

jschmid1 commented 5 years ago

const in the module would work, but I was more concerned about items like '/var/lib/ceph' instead of the entire volume_mounts dict.

Seems like a lot of these path names should be very common between modules ..

We could add those to internal.yml?

jschmid1 commented 5 years ago

Based on the most recent changes in this PR, I still don't have a properly bootstrapped cluster .. but maybe I missed a step to correctly configure the admin keyring...

Distributing the bootstrap admin keyring to mon3.ceph
Module call failed on mon3.ceph with n/a and context: cp.get_file keyring

You may want to try this manually.

salt \mon3* cp.get_file salt://ceph/bootstrap/keyring /var/lib/ceph/tmp

This will probably tell you what's going wrong

Overall this PR works much better than what's provided in the current next branch and it should likely merge soon

great :+1:

jschmid1 commented 5 years ago

expecting lint/test to fail currently

Also, when do we expect to fix the lint/test that are failing?

My initial plan was to just push enough bootstrap/mon/mgr code to the repository to unblock everyone else before writing tests and cleaning up the code. I think we're about to reach that point, so I'll start on the tests/lint issues soon.

jschmid1 commented 5 years ago

Pushed the requested change to b729694. (will fixup after the final OK)

swiftgist commented 5 years ago

Regarding the discussion about naming functions, I really want to break the link between "an appropriate python function" and "a suitable subcommand".

Salt has many missteps in this area already, and we should not repeat them. The most obvious example from Salt itself is the following:

# salt '*' saltutil.pillar_refresh
# salt '*' saltutil.refresh_pillar

Both commands exist in the framework and do the same thing. SaltStack likely added both because a significant part of their user base could not remember the order. Turning entire sentences with underscores into user-facing commands helps no one.

Ideally, if config.deploy cannot do both Salt and Ceph without causing problems in the workflow, then the config is likely the problem.

mgfritch commented 5 years ago

const in the module would work, but I was more concerned about items like '/var/lib/ceph' instead of the entire volume_mounts dict. Seems like a lot of these path names should be very common between modules ..

We could add those to internal.yml?

:+1: internal.yml seems like a good location
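
As a minimal sketch of what that could look like (the key name, default, and helper below are assumptions, not the actual internal.yml schema): internal.yml would carry the shared paths in the pillar, and the podman module would read them instead of hard-coding the strings.

# Illustrative sketch only -- 'ceph_lib_dir' and the helper names are assumptions.
# internal.yml (pillar) would hold e.g.:
#   ceph_lib_dir: /var/lib/ceph

CEPH_LIB_DIR_DEFAULT = '/var/lib/ceph'


def _ceph_lib_dir():
    # __pillar__ is injected by the Salt loader into execution modules
    return __pillar__.get('ceph_lib_dir', CEPH_LIB_DIR_DEFAULT)


def volume_mounts():
    """
    Build the podman volume mount mapping from pillar-provided paths
    """
    lib_dir = _ceph_lib_dir()
    return {lib_dir: lib_dir, '/etc/ceph': '/etc/ceph'}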

mgfritch commented 5 years ago

Pushed the requested change to b729694. (will fixup after the final OK)

lgtm :+1:

mgfritch commented 5 years ago

Based on the most recent changes in this PR, I still don't have a properly bootstrapped cluster .. but maybe I missed a step to correctly configure the admin keyring...

Distributing the bootstrap admin keyring to mon3.ceph
Module call failed on mon3.ceph with n/a and context: cp.get_file keyring

You may want to try this manually.

salt \mon3* cp.get_file salt://ceph/bootstrap/keyring /var/lib/ceph/tmp

This will probably tell you what's going wrong

Unfortunately, it does not tell me much ...

admin:/ # salt \mon3* cp.get_file salt://ceph/bootstrap/keyring /var/lib/ceph/tmp
mon3.ceph:
admin:/ # echo $?
0

This seems to give somewhat more ...

# salt-run mon.deploy
Preparing deployment for mon3.ceph
Creating mon keyring for mon3.ceph
Distributing file: The minion function caused an exception: Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1664, in _thread_return
    return_data = minion_instance.executors[fname](opts, data, func, args, kwargs)
  File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 12, in execute
    return func(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/podman.py", line 524, in create_mon_keyring
    return Deploy()._create_mon_keyring(name)
  File "/var/cache/salt/minion/extmods/modules/podman.py", line 427, in _create_mon_keyring
    self.ceph_bootstrap_master_dir: self.ceph_bootstrap_master_dir
  File "/var/cache/salt/minion/extmods/modules/podman.py", line 64, in run
    ret = check_output(self.run_cmd)
  File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=registry.suse.de/devel/storage/6.0/images/ses/6/ceph/ceph', '-e', 'NODE_NAME=admin', '-v', '/etc/ceph/:/etc/ceph', '-v', '/srv/salt/ceph/bootstrap:/srv/salt/ceph/bootstrap', '--entrypoint', '/usr/bin/ceph', 'registry.suse.de/devel/storage/6.0/images/ses/6/ceph/ceph', 'auth', 'get', 'mon.', '-o', '/srv/salt/ceph/bootstrap/mon_keyring.mon3.ceph']' returned non-zero exit status 125.
 to mon3.ceph
Module call failed on mon3.ceph with n/a and context: cp.get_file The minion function caused an exception: Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1664, in _thread_return
    return_data = minion_instance.executors[fname](opts, data, func, args, kwargs)
  File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 12, in execute
    return func(*args, **kwargs)
  File "/var/cache/salt/minion/extmods/modules/podman.py", line 524, in create_mon_keyring
    return Deploy()._create_mon_keyring(name)
  File "/var/cache/salt/minion/extmods/modules/podman.py", line 427, in _create_mon_keyring
    self.ceph_bootstrap_master_dir: self.ceph_bootstrap_master_dir
  File "/var/cache/salt/minion/extmods/modules/podman.py", line 64, in run
    ret = check_output(self.run_cmd)
  File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['/usr/bin/podman', 'run', '--rm', '--net=host', '-e', 'CONTAINER_IMAGE=registry.suse.de/devel/storage/6.0/images/ses/6/ceph/ceph', '-e', 'NODE_NAME=admin', '-v', '/etc/ceph/:/etc/ceph', '-v', '/srv/salt/ceph/bootstrap:/srv/salt/ceph/bootstrap', '--entrypoint', '/usr/bin/ceph', 'registry.suse.de/devel/storage/6.0/images/ses/6/ceph/ceph', 'auth', 'get', 'mon.', '-o', '/srv/salt/ceph/bootstrap/mon_keyring.mon3.ceph']' returned non-zero exit status 125.
 to /var/lib/ceph/tmp

But it's still not enough, because the output from stderr of the subprocess command is missing ... let's discuss more in tomorrow's meeting...
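
For what it's worth, the traceback shows check_output() raising CalledProcessError with only the exit status. Below is a minimal sketch of one way to also surface podman's stderr; the run() name mirrors the frame in the traceback, everything else (attribute names, error wrapping) is an assumption, not the actual podman.py code.

import subprocess


def run(self):
    # capture stdout and stderr separately and fail on non-zero exit
    try:
        result = subprocess.run(
            self.run_cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,
        )
    except subprocess.CalledProcessError as exc:
        # surface podman's stderr instead of only "exit status 125"
        raise RuntimeError(
            "podman failed ({}): {}".format(
                exc.returncode, exc.stderr.decode(errors='replace'))
        ) from exc
    return result.stdout.decode(errors='replace')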

jschmid1 commented 5 years ago

shall we get this merged so we have common ground to base our future work on?