Tendrl / commons

Common code usable by all Tendrl components
http://www.tendrl.org
GNU Lesser General Public License v2.1
4 stars 23 forks source link

Import flow failure related log messages should be more specific about what went wrong #1081

Closed GowthamShanmugam closed 5 years ago

GowthamShanmugam commented 5 years ago

tendrl-bug-id: Tendrl/commons#1080 bugzilla: 1688630

Signed-off-by: GowthamShanmugasundaram gshanmug@redhat.com

GowthamShanmugam commented 5 years ago

https://github.com/Tendrl/monitoring-integration/pull/594/files

codecov[bot] commented 5 years ago

Codecov Report

Merging #1081 into master will decrease coverage by 0.82%. The diff coverage is 41.77%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1081      +/-   ##
==========================================
- Coverage   81.75%   80.93%   -0.83%     
==========================================
  Files          90       91       +1     
  Lines        3744     3808      +64     
  Branches      484      497      +13     
==========================================
+ Hits         3061     3082      +21     
- Misses        585      627      +42     
- Partials       98       99       +1
Impacted Files Coverage Δ
...ects/cluster/atoms/setup_cluster_alias/__init__.py 71.42% <0%> (-7.52%) :arrow_down:
...s/objects/cluster/atoms/import_cluster/__init__.py 10.93% <0%> (-0.36%) :arrow_down:
tendrl/commons/utils/ansible_module_runner.py 100% <100%> (ø) :arrow_up:
...cts/service/atoms/check_service_status/__init__.py 100% <100%> (ø) :arrow_up:
tendrl/commons/objects/service/__init__.py 100% <100%> (ø) :arrow_up:
tendrl/commons/utils/service_status.py 100% <100%> (ø) :arrow_up:
.../atoms/check_required_services_running/__init__.py 22.72% <22.72%> (ø)
...mmons/objects/node/flows/stop_services/__init__.py 75.86% <66.66%> (-2.71%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update e30ac2c...fbe5192. Read the comment docs.

GowthamShanmugam commented 5 years ago

@shtripat @nthomas-redhat @mbukatov @dahorak

Changes which I made is:

  1. job thread should timeout only parent job when it stuck in "new" state, all child jobs will be timed out by its parent job with the proper error log message.
  2. job.load() is always won't update the object with new values, so the same job is picked by multiple nodes. Modified logic to load from an empty object. (https://bugzilla.redhat.com/show_bug.cgi?id=1602858 Root cause of this problem)
  3. Added pre-check in import and un-manage flow to check all necessary service are working fine. I am using ansible to check the service status if any problem in ansible then it will be displayed as job notification.
  4. Un-manage job is picked by the server node it have "tendrl/monitor" or "tendrl/integration/monitor" tag.
  5. Populate "tendrl/integration/monitor" tag from monitoring-integration sync: https://github.com/Tendrl/monitoring-integration/pull/594, If any problem with ansible then un-manage will be picked because it has "tendrl/integration/monitor" tag. and flow will fail with a proper error message.
GowthamShanmugam commented 5 years ago

@shtripat @nthomas-redhat please review

GowthamShanmugam commented 5 years ago

@shtripat please review