apache / cloudstack

Apache CloudStack is an open-source Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0

agent: reconnect after waiting 5 seconds #9258

Closed: weizhouapache closed this 1 week ago

weizhouapache commented 2 weeks ago

Description

This PR fixes #8517

See the steps to reproduce the issue in the description of #8517.
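
Per the PR title, the fix makes the agent wait 5 seconds before reconnecting instead of retrying immediately. The sketch below shows that fixed-delay retry pattern in isolation. It is not the code in Agent.java: `connectWithDelay`, the `tryConnect` supplier, and the attempt cap are hypothetical, added so the demo terminates (a real agent would keep retrying).

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of an agent-side reconnect loop that waits a fixed
 * delay after each failed attempt. tryConnect stands in for whatever opens
 * the connection and completes the handshake; it is not a CloudStack API.
 */
public class ReconnectSketch {

    // Fixed pause between attempts; 5 seconds per the PR title.
    static final Duration RECONNECT_DELAY = Duration.ofSeconds(5);

    /**
     * Calls tryConnect until it succeeds or maxAttempts is reached, sleeping
     * for delay between failures. Returns the successful attempt number, or
     * -1 if every attempt failed or the thread was interrupted.
     */
    static int connectWithDelay(BooleanSupplier tryConnect, int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryConnect.getAsBoolean()) {
                return attempt;
            }
            if (attempt == maxAttempts) {
                break; // no point sleeping after the last failure
            }
            try {
                // Wait before retrying instead of reconnecting in a tight loop.
                Thread.sleep(delay.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated connection that fails twice (e.g. handshake errors), then succeeds.
        BooleanSupplier flaky = () -> ++calls[0] >= 3;
        // A short delay keeps the demo fast; the agent would use RECONNECT_DELAY.
        int attempt = connectWithDelay(flaky, 5, Duration.ofMillis(10));
        System.out.println("connected on attempt " + attempt);
    }
}
```

Running `main` prints `connected on attempt 3`. Catching `InterruptedException` and restoring the interrupt flag lets a shutdown request stop the loop during the wait.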

github-actions[bot] commented 2 weeks ago

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

weizhouapache commented 2 weeks ago

@blueorangutan package

blueorangutan commented 2 weeks ago

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 14.95%. Comparing base (083ac06) to head (0bdc722). Report is 20 commits behind head on 4.19.

| Files | Patch % | Lines |
|---|---|---|
| agent/src/main/java/com/cloud/agent/Agent.java | 0.00% | 4 Missing :warning: |
Additional details and impacted files

```diff
@@              Coverage Diff              @@
##               4.19    #9258        +/-   ##
============================================
+ Coverage      4.28%   14.95%   +10.66%
- Complexity        0    11015    +11015
============================================
  Files           363     5387     +5024
  Lines         29393   470352   +440959
  Branches       5139    60791    +55652
============================================
+ Hits           1260    70330    +69070
- Misses        27990   392228   +364238
- Partials        143     7794     +7651
```

| [Flag](https://app.codecov.io/gh/apache/cloudstack/pull/9258/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
|---|---|---|
| [uitests](https://app.codecov.io/gh/apache/cloudstack/pull/9258/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `4.28% <ø> (-0.01%)` | :arrow_down: |
| [unittests](https://app.codecov.io/gh/apache/cloudstack/pull/9258/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `15.66% <0.00%> (?)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
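
The headline percentages can be cross-checked from the raw counts in the coverage diff: coverage is hits divided by total lines, and every line is counted once as a hit, a miss, or a partial. A small sketch, assuming Codecov's default display setting of rounding down to two decimals; the class and method names are illustrative.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/**
 * Recomputes the Codecov headline numbers from raw Hits/Lines counts.
 * Assumes display values are rounded down to 2 decimal places.
 */
public class CoverageSketch {

    // Coverage in percent (hits * 100 / lines), kept at 10 decimals.
    static BigDecimal rawCoverage(long hits, long lines) {
        return BigDecimal.valueOf(hits * 100)
                .divide(BigDecimal.valueOf(lines), 10, RoundingMode.DOWN);
    }

    // Display value: truncated to 2 decimals.
    static BigDecimal display(BigDecimal pct) {
        return pct.setScale(2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        BigDecimal base = rawCoverage(1260, 29393);   // base 083ac06 on 4.19
        BigDecimal head = rawCoverage(70330, 470352); // head 0bdc722
        System.out.println("base  " + display(base) + "%");
        System.out.println("head  " + display(head) + "%");
        // Difference of the unrounded values, truncated afterwards.
        System.out.println("delta +" + display(head.subtract(base)) + "%");
        // Hits + Misses + Partials must add up to Lines on both sides.
        System.out.println(1260 + 27990 + 143 == 29393);
        System.out.println(70330 + 392228 + 7794 == 470352);
    }
}
```

This prints `4.28%`, `14.95%`, `+10.66%` and two `true` lines. Subtracting the already-rounded values (14.95 - 4.28) would give 10.67, so the +10.66% in the report is consistent with taking the difference before rounding.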

blueorangutan commented 2 weeks ago

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9956

weizhouapache commented 2 weeks ago

@blueorangutan test rocky8 kvm-rocky8

blueorangutan commented 2 weeks ago

@weizhouapache a [SL] Trillian-Jenkins test job (rocky8 mgmt + kvm-rocky8) has been kicked to run smoke tests

blueorangutan commented 2 weeks ago

[SF] Trillian test result (tid-10458)
Environment: kvm-rocky8 (x2), Advanced Networking with Mgmt server r8
Total time taken: 46753 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9258-t10458-kvm-rocky8.zip
Smoke tests completed. 129 look OK, 2 have errors, 0 did not run.
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_07_arping_in_vr | Failure | 5.24 | test_diagnostics.py |
| test_02_trigger_shutdown | Failure | 341.46 | test_safe_shutdown.py |

rohityadavcloud commented 2 weeks ago

Could the mgmt server also sleep on such connections on handshake exceptions, that is, be slow to respond to such agents/clients?

weizhouapache commented 2 weeks ago

Thanks @rohityadavcloud for the advice. I thought of it during my investigation, but in the end I decided to change the agent, as the connections are initiated by the agent. In my testing, the CPU load is lower than 10%, so I guess it is OK.
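
The argument for throttling on the agent side can be put as a rough bound: if every agent waits a fixed delay after each failed attempt, each agent makes at most one attempt per delay period, so N agents produce at most N / 5 attempts per second with a 5-second delay. The sketch below only illustrates that arithmetic; the fleet size is hypothetical and not a figure from this PR.

```java
/**
 * Back-of-the-envelope bound on reconnect attempts reaching the management
 * server when every agent waits a fixed delay after each failed attempt.
 */
public class ReconnectRateSketch {

    // At most one attempt per agent per delay period; the time an attempt
    // itself takes would only lower this figure further.
    static double maxAttemptsPerSecond(int agents, double delaySeconds) {
        if (delaySeconds <= 0) {
            throw new IllegalArgumentException("delay must be positive");
        }
        return agents / delaySeconds;
    }

    public static void main(String[] args) {
        int agents = 100; // hypothetical fleet size, not a figure from this PR
        System.out.println(maxAttemptsPerSecond(agents, 5.0) + " attempts/s at most");
    }
}
```

This prints `20.0 attempts/s at most`. Without a delay there is no such bound: a failing agent retries as fast as each handshake fails.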

rohityadavcloud commented 1 week ago

@weizhouapache is this ready for review now?

weizhouapache commented 1 week ago

Yes @rohityadavcloud, marked as ready for review.

DaanHoogland commented 1 week ago

@blueorangutan LLtest alma9 kvm-alma9

blueorangutan commented 1 week ago

@DaanHoogland a [LL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests

rohityadavcloud commented 1 week ago

@blueorangutan package

blueorangutan commented 1 week ago

@rohityadavcloud a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan commented 1 week ago

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10086

sureshanaparti commented 1 week ago

@blueorangutan test

blueorangutan commented 1 week ago

@sureshanaparti a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan commented 1 week ago

[SF] Trillian test result (tid-10583)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 47920 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9258-t10583-kvm-centos7.zip
Smoke tests completed. 130 look OK, 1 have errors, 0 did not run.
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_secure_vm_migration | Error | 134.21 | test_vm_life_cycle.py |

weizhouapache commented 1 week ago

@blueorangutan test rocky8 kvm-rocky8

blueorangutan commented 1 week ago

@weizhouapache a [SL] Trillian-Jenkins test job (rocky8 mgmt + kvm-rocky8) has been kicked to run smoke tests

weizhouapache commented 1 week ago

@blueorangutan package

blueorangutan commented 1 week ago

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan commented 1 week ago

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10113

blueorangutan commented 1 week ago

[SF] Trillian test result (tid-10609)
Environment: kvm-rocky8 (x2), Advanced Networking with Mgmt server r8
Total time taken: 48176 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9258-t10609-kvm-rocky8.zip
Smoke tests completed. 130 look OK, 1 have errors, 0 did not run.
Only failed and skipped tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_create_pvlan_network | Error | 0.09 | test_pvlan.py |