apache / cloudstack

Apache CloudStack is an open source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

Add, Delete Storage Pool commands should be able to execute on a host in maintenance #9301

Closed: abh1sar closed this 3 days ago

abh1sar commented 6 days ago

Description

This PR...

Fixes #9295

When a host is in maintenance, AgentAttache does not allow CreateStoragePoolCommand and DeleteStoragePoolCommand to execute. As a result, a newly added storage pool is still missing from the host after it comes out of maintenance, because the CloudStack agent is not restarted when maintenance is cancelled (#3239). Likewise, a deleted storage pool is never removed from a host in maintenance.
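The gating behaviour described above can be pictured with a minimal, self-contained sketch. It is not CloudStack's actual AgentAttache code: the class `MaintenanceGateSketch`, the method `canSend`, the placeholder `OtherCommand`, and the contents of the allow-list are all assumptions made for illustration. Only the three storage pool command names come from this PR.

```java
import java.util.List;
import java.util.Set;

final class MaintenanceGateSketch {
    // Hypothetical stand-ins for the agent command classes named in this PR.
    interface Command {}
    static final class CreateStoragePoolCommand implements Command {}
    static final class DeleteStoragePoolCommand implements Command {}
    static final class ModifyStoragePoolCommand implements Command {}
    // Placeholder for any command that is not on the allow-list.
    static final class OtherCommand implements Command {}

    // Illustrative allow-list only; this is not CloudStack's actual list.
    private static final Set<Class<? extends Command>> ALLOWED_IN_MAINTENANCE = Set.of(
            ModifyStoragePoolCommand.class,
            CreateStoragePoolCommand.class,
            DeleteStoragePoolCommand.class);

    /** Returns true when every command may be sent to a host in the given state. */
    static boolean canSend(boolean hostInMaintenance, List<? extends Command> commands) {
        if (!hostInMaintenance) {
            return true; // no restriction outside maintenance
        }
        for (Command command : commands) {
            if (!ALLOWED_IN_MAINTENANCE.contains(command.getClass())) {
                return false; // one blocked command rejects the whole batch
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canSend(true, List.of(new CreateStoragePoolCommand()))); // prints true
        System.out.println(canSend(true, List.of(new OtherCommand())));             // prints false
    }
}
```

In this sketch, adding the create and delete commands to the allow-list is the whole change: the host stays in maintenance, but storage pool bookkeeping still reaches the agent.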


How Has This Been Tested?

  1. Put a host in maintenance.
  2. Add a new storage pool and verify that it is mounted on the host.
  3. Delete a storage pool and verify that it has been unmounted from the host.
  4. Take the host out of maintenance and verify again.


abh1sar commented 6 days ago

@blueorangutan package

blueorangutan commented 6 days ago

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

codecov[bot] commented 6 days ago

Codecov Report

Attention: Patch coverage is 27.27273% with 8 lines in your changes missing coverage. Please review.

Project coverage is 12.24%. Comparing base (351de5f) to head (50e09fe). Report is 1 commits behind head on 4.18.

| Files | Patch % | Lines |
|---|---|---|
| ...n/java/com/cloud/resource/ResourceManagerImpl.java | 0.00% | 7 Missing :warning: |
| ...cycle/CloudStackPrimaryDataStoreLifeCycleImpl.java | 0.00% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@             Coverage Diff              @@
##               4.18    #9301      +/-   ##
============================================
- Coverage     12.24%   12.24%   -0.01%
  Complexity     9296     9296
============================================
  Files          4699     4699
  Lines        414331   414347      +16
  Branches      51999    51008     -991
============================================
+ Hits          50731    50732       +1
- Misses       357295   357310      +15
  Partials       6305     6305
```

| [Flag](https://app.codecov.io/gh/apache/cloudstack/pull/9301/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/apache/cloudstack/pull/9301/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `12.24% <27.27%> (-0.01%)` | :arrow_down: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.


blueorangutan commented 6 days ago

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10111

abh1sar commented 6 days ago

@blueorangutan test

blueorangutan commented 6 days ago

@abh1sar a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

abh1sar commented 5 days ago

@blueorangutan package

blueorangutan commented 5 days ago

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan commented 5 days ago

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10124

blueorangutan commented 5 days ago

[SF] Trillian test result (tid-10612)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42162 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9301-t10612-kvm-centos7.zip
Smoke tests completed. 107 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_08_migrate_vm | Error | 45.85 | test_vm_life_cycle.py |
| test_01_cancel_host_maintenace_with_no_migration_jobs | Error | 114.46 | test_host_maintenance.py |
| test_disable_oobm_ha_state_ineligible | Error | 1513.05 | test_hostha_kvm.py |
| test_hostha_enable_ha_when_host_in_maintenance | Failure | 310.26 | test_hostha_kvm.py |

abh1sar commented 5 days ago

@blueorangutan test

blueorangutan commented 5 days ago

@abh1sar a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan commented 5 days ago

[SF] Trillian test result (tid-10626)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 41860 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9301-t10626-kvm-centos7.zip
Smoke tests completed. 107 look OK, 3 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_08_migrate_vm | Error | 45.92 | test_vm_life_cycle.py |
| test_01_vpc_site2site_vpn | Failure | 279.86 | test_vpc_vpn.py |
| test_hostha_enable_ha_when_host_disabled | Error | 3.68 | test_hostha_kvm.py |
| test_hostha_enable_ha_when_host_in_maintenance | Error | 302.87 | test_hostha_kvm.py |
| test_hostha_kvm_host_recovering | Error | 7.13 | test_hostha_kvm.py |

abh1sar commented 4 days ago

@blueorangutan test keepEnv

blueorangutan commented 4 days ago

@abh1sar a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

abh1sar commented 4 days ago

Working on resolving smoke test failures

weizhouapache commented 4 days ago

codewise lgtm

@abh1sar what about other resource states, like ErrorInMaintenance, ErrorInPrepareForMaintenance, or PrepareForMaintenance?
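To illustrate the point of the question, a check that only tests for the Maintenance state would miss the neighbouring states listed above. The sketch below is hypothetical: the enum `ResourceState` is redeclared locally with just the four maintenance-related names from this thread plus `Enabled` for contrast, and the helper `isMaintenanceRelated` is an assumed name, not a CloudStack API.

```java
import java.util.EnumSet;
import java.util.Set;

final class ResourceStateSketch {
    // Hypothetical local enum: the four maintenance-related names come from
    // this thread; Enabled is added only for contrast.
    enum ResourceState {
        Enabled,
        PrepareForMaintenance,
        Maintenance,
        ErrorInPrepareForMaintenance,
        ErrorInMaintenance
    }

    // Every state that a maintenance-aware check would need to cover.
    private static final Set<ResourceState> MAINTENANCE_RELATED = EnumSet.of(
            ResourceState.PrepareForMaintenance,
            ResourceState.Maintenance,
            ResourceState.ErrorInPrepareForMaintenance,
            ResourceState.ErrorInMaintenance);

    /** Returns true for Maintenance and for the transitional and error states around it. */
    static boolean isMaintenanceRelated(ResourceState state) {
        return MAINTENANCE_RELATED.contains(state);
    }

    public static void main(String[] args) {
        for (ResourceState state : ResourceState.values()) {
            System.out.println(state + " -> " + isMaintenanceRelated(state));
        }
    }
}
```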

blueorangutan commented 3 days ago

[SF] Trillian test result (tid-10648)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 45084 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9301-t10648-kvm-centos7.zip
Smoke tests completed. 109 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_hostha_enable_ha_when_host_disabled | Error | 4.74 | test_hostha_kvm.py |
| test_hostha_enable_ha_when_host_in_maintenance | Error | 302.76 | test_hostha_kvm.py |

abh1sar commented 3 days ago

SSH-ing into the host in maintenance to restart the agent, even when the agent is already connected, is not the right approach, as it breaks the change made in https://github.com/apache/cloudstack/pull/3239

I think a better solution would be to allow CreateStoragePoolCommand to run on a host in maintenance mode (like ModifyStoragePoolCommand).

@sureshanaparti @kiranchavala

sureshanaparti commented 3 days ago

> SSH-ing into the host in maintenance to restart the agent, even when the agent is already connected, is not the right approach, as it breaks the change made in #3239
>
> I think a better solution would be to allow CreateStoragePoolCommand to run on a host in maintenance mode (like ModifyStoragePoolCommand).
>
> @sureshanaparti @kiranchavala

Yes @abh1sar, if the agent is already connected, it is better to check and update the storage pools with an agent command.

abh1sar commented 3 days ago

@blueorangutan package

blueorangutan commented 3 days ago

@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

abh1sar commented 3 days ago

> codewise lgtm
>
> @abh1sar what about other resource states, like ErrorInMaintenance, ErrorInPrepareForMaintenance, or PrepareForMaintenance?

I have changed the approach, so this code is now obsolete. Please check.

blueorangutan commented 3 days ago

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10184

abh1sar commented 3 days ago

@blueorangutan test

blueorangutan commented 3 days ago

@abh1sar a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan commented 2 days ago

[SF] Trillian test result (tid-10670)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 39649 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9301-t10670-kvm-centos7.zip
Smoke tests completed. 109 look OK, 1 have errors, 0 did not run
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_hostha_kvm_host_fencing | Error | 106.67 | test_hostha_kvm.py |