SUSE / DeepSea

A collection of Salt files for deploying, managing and automating Ceph.
GNU General Public License v3.0

rebuild runner needs to read error messages from osd.py(runner) #1746

Open jschmid1 opened 4 years ago

jschmid1 commented 4 years ago

When a salt-run rebuild.node operation fails after zapping the drive but before unmounting it, we are left with stale data from osd.list (minion module).

The osd.remove function will raise an exception [OSDNotFound] that needs to be handled by the rebuild runner.

Otherwise the rebuild runner just exits with:

[ERROR   ] Failed to remove OSD(s)... skipping data1
The following minions were skipped:
data1

Resolve any issues and run
 salt-run rebuild.nodes data1
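
A minimal sketch of the handling described above, assuming the rebuild loop can catch the exception per OSD and keep its message. Everything here is hypothetical and only illustrates the idea: OSDNotFound is a local stand-in for the exception named in this issue, and remove_osd and rebuild_node are not DeepSea's actual functions.

```python
class OSDNotFound(Exception):
    """Hypothetical stand-in for the exception named in this issue."""


def remove_osd(osd_id, known_osds):
    """Hypothetical stand-in for osd.remove: raises if the OSD is gone."""
    if osd_id not in known_osds:
        raise OSDNotFound(f"osd.{osd_id} not found")
    known_osds.remove(osd_id)
    return True


def rebuild_node(minion, stale_osd_ids, known_osds):
    """Remove each listed OSD, collecting error messages instead of exiting."""
    errors = []
    for osd_id in stale_osd_ids:
        try:
            remove_osd(osd_id, known_osds)
        except OSDNotFound as exc:
            # The list was stale: record the message with context and carry on.
            errors.append(f"{minion}: {exc}")
    return errors


# OSD 3 was already zapped, so it only exists in the stale list.
print(rebuild_node("data1", [2, 3], {1, 2}))
```

With this shape the operator would see which OSD was missing, rather than only "Failed to remove OSD(s)".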

I also noticed that the rebuild runner in master is not the same as in SES6. That needs to be forward-ported eventually.

swiftgist commented 4 years ago

Shouldn't the exception be caught by a try...except in osd.remove, since exceptions do not propagate across runners? Also, the osd.remove module is user-facing and should return a reasonable error independent of rebuild.node.

The _check_return is a summary of all the operations for a minion. Errors for a specific osd should come from osd.remove.
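
One way to read this suggestion, sketched under assumptions: the removal step catches its own failure and returns a result per OSD, and the per-minion summary keeps those messages. The dictionary layout and both function names are hypothetical. The thread leaves the actual return types open, so this is not DeepSea's format.

```python
def remove_osd(osd_id, known_osds):
    """Hypothetical user-facing removal: returns a result dict, never raises."""
    if osd_id not in known_osds:
        return {"osd_id": osd_id, "ok": False, "error": f"osd.{osd_id} not found"}
    known_osds.discard(osd_id)
    return {"osd_id": osd_id, "ok": True, "error": None}


def check_return(minion, results):
    """Hypothetical per-minion summary that preserves the per-OSD messages."""
    failed = [r["error"] for r in results if not r["ok"]]
    return {"minion": minion, "ok": not failed, "errors": failed}


known = {1, 2}
results = [remove_osd(osd_id, known) for osd_id in (2, 3)]
print(check_return("data1", results))
```

Because the error travels in the return value, it does not depend on exceptions crossing the runner boundary.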

jschmid1 commented 4 years ago

> Shouldn't the exception be caught by a try...except in osd.remove, since exceptions do not propagate across runners? Also, the osd.remove module is user-facing and should return a reasonable error independent of rebuild.node.

Right, and I'm inclined to rework that.

> The _check_return is a summary of all the operations for a minion. Errors for a specific osd should come from osd.remove.

It still needs context. I think the discussion we'll have about module return types for deepsea-next will have some influence on that.