apache / cloudstack

Apache CloudStack is an opensource Infrastructure as a Service (IaaS) cloud computing platform
https://cloudstack.apache.org/
Apache License 2.0
2k stars 1.09k forks source link

Handle public IP race conditions #9234

Closed hsato03 closed 3 months ago

hsato03 commented 3 months ago

Description

When executing many operations simultaneously with a public IP, some inconsistencies are being implemented in the environment, such as the duplicate public IP (Fixes: #8967).

For this reason, this PR adds a lock to most APIs that manipulate a public IP, preventing inconsistencies that may occur due to race conditions.

Types of changes

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

Bug Severity

Screenshots (if appropriate):

How Has This Been Tested?

All tests were done following the same steps but using different APIs that create rules for the IP (createLoadBalancerRule, createFirewallRule, createPortForwardingRule, createIpForwardingRule):

  1. Associate a public IP with a network;
  2. Create a rule with one of the APIs;
  3. Disassociate the IP (before) and create a rule again (after) at the same time.

To test the createRemoteAccessVpn API, I created a firewall rule in step 2 and called the createRemoteAccessVpn API in step 3.

Before: The disassociateIpAddress API threw an exception and still disassociated the IP from the VR. The APIs that kept the IP in the VR even after disassociation were createLoadBalancerRule and createRemoteAccessVpn.

Now: All APIs that create rules for the IP wait for the disassociateIpAddress API to finish executing and only then begin execution, meaning that the exception is now thrown when creating the rule and no longer when disassociating the IP. No API held the IP in the VR.

The tests were made in high and low performance environments. The low performance environments were more susceptible to race conditions due to the time it takes to process a request.

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 0% with 232 lines in your changes missing coverage. Please review.

Project coverage is 15.01%. Comparing base (edf7394) to head (30b32e9). Report is 106 commits behind head on 4.19.

Files Patch % Lines
...java/com/cloud/network/rules/RulesManagerImpl.java 0.00% 93 Missing :warning:
.../cloud/network/vpn/RemoteAccessVpnManagerImpl.java 0.00% 47 Missing :warning:
...n/java/com/cloud/network/IpAddressManagerImpl.java 0.00% 33 Missing :warning:
...om/cloud/network/firewall/FirewallManagerImpl.java 0.00% 32 Missing :warning:
...loud/network/lb/LoadBalancingRulesManagerImpl.java 0.00% 25 Missing :warning:
.../main/java/com/cloud/network/NetworkModelImpl.java 0.00% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## 4.19 #9234 +/- ## ============================================ + Coverage 14.96% 15.01% +0.05% - Complexity 11002 11090 +88 ============================================ Files 5373 5393 +20 Lines 469311 471535 +2224 Branches 57432 61434 +4002 ============================================ + Hits 70230 70805 +575 - Misses 391307 392839 +1532 - Partials 7774 7891 +117 ``` | [Flag](https://app.codecov.io/gh/apache/cloudstack/pull/9234/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | | |---|---|---| | [uitests](https://app.codecov.io/gh/apache/cloudstack/pull/9234/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `4.27% <ø> (-0.03%)` | :arrow_down: | | [unittests](https://app.codecov.io/gh/apache/cloudstack/pull/9234/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `15.73% <0.00%> (+0.05%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

GutoVeronezi commented 3 months ago

@weizhouapache, could you take a look at this one?

weizhouapache commented 3 months ago

@weizhouapache, could you take a look at this one?

Will do

@blueorangutan package

blueorangutan commented 3 months ago

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan commented 3 months ago

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9911

DaanHoogland commented 3 months ago

@blueorangutan test

blueorangutan commented 3 months ago

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan commented 3 months ago

[SF] Trillian test result (tid-10487) Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7 Total time taken: 46193 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9234-t10487-kvm-centos7.zip Smoke tests completed. 131 look OK, 0 have errors, 0 did not run Only failed and skipped tests results shown below:

Test Result Time (s) Test File
weizhouapache commented 3 months ago

codewise looks good. need some manual testing (for instance, run the scripts for 12 hours).

sureshanaparti commented 3 months ago

@blueorangutan package

blueorangutan commented 3 months ago

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan commented 3 months ago

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10092

rohityadavcloud commented 3 months ago

@blueorangutan test ol8 kvm-ol8

blueorangutan commented 3 months ago

@rohityadavcloud [SL] unsupported parameters provided. Supported mgmt server os are: centos7, centos6, alma9, suse15, alma8, ubuntu18, ubuntu22, ubuntu20, rocky8, ubuntu24. Supported hypervisors are: kvm-centos6, kvm-centos7, kvm-rocky8, kvm-alma8, kvm-alma9, kvm-ubuntu18, kvm-ubuntu20, kvm-ubuntu22, kvm-ubuntu24, kvm-suse15, vmware-55u3, vmware-60u2, vmware-65u2, vmware-67u3, vmware-70u1, vmware-70u2, vmware-70u3, vmware-80, vmware-80u1, xenserver-65sp1, xenserver-71, xenserver-74, xcpng74, xcpng76, xcpng80, xcpng81, xcpng82

sureshanaparti commented 3 months ago

@blueorangutan test alma9 kvm-alma8

blueorangutan commented 3 months ago

@sureshanaparti a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma8) has been kicked to run smoke tests

blueorangutan commented 3 months ago

[SF] Trillian test result (tid-10598) Environment: kvm-alma8 (x2), Advanced Networking with Mgmt server a9 Total time taken: 45243 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr9234-t10598-kvm-alma8.zip Smoke tests completed. 131 look OK, 0 have errors, 0 did not run Only failed and skipped tests results shown below:

Test Result Time (s) Test File
sureshanaparti commented 3 months ago

Hi @hsato03 Can you check & fix the NPEs mentioned here: https://github.com/apache/cloudstack/issues/8967#issuecomment-2195025830

hsato03 commented 3 months ago

@blueorangutan package

blueorangutan commented 3 months ago

@hsato03 a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan commented 3 months ago

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 10201