Open antikorol opened 2 weeks ago
It appears that the significant concurrency issue in the
Kube
class mentioned in the #2110 (comment) has been resolved, yet a genuine fixing solution for theRoundRobin
class is still required.Please note that our development process necessitates both unit and acceptance testing. It is advisable to replicate the issue with an acceptance test before implementing the bug-fixing solution. We need to replicate the bug using an acceptance test. I can draft a preliminary version of the test, and you can then refine it to create the final version.
@raman-m I'm not quite sure which fix you are referring to. In this case, there will no longer be a List of the form [service, null] because the issue with modifying the List of services from different threads has been fixed. Or do you mean adding a check that if the service or the HostAndPort property is null, then log the error and return a 404 status code for that request?
Or do you mean adding a check that if the service or the HostAndPort property is null, then log the error and return a 404 status code for that request?
Yes, I do. Could you include this check within the RoundRobin
class plz?
Thank you for adding the unit test. I'm assessing the efficiency of writing an acceptance test, which is expected to be quite complicated. Have you tested the bug fix in your environment?
Or do you mean adding a check that if the service or the HostAndPort property is null, then log the error and return a 404 status code for that request?
Yes, I do. Could you include this check within the
RoundRobin
class plz? Thank you for adding the unit test. I'm assessing the efficiency of writing an acceptance test, which is expected to be quite complicated. Have you tested the bug fix in your environment?
Added new error to handle this case when service or HostAndPort is null. The fix has been tested in dev/preprod environments, and the fix was checked with load testing at around 30 rps.
Thank you for conducting the tests! Is the official hotfix release package, version 23.3.4, needed to update your all environments and conduct the final bugfix test? Can you update environments with manually built DLLs?
Additionally, I'm going to review the code once more... Having unit/integration tests is likely sufficient to deem the code acceptable.
Thank you for conducting the tests! Is the official hotfix release package, version 23.3.4, needed to update your all environments and conduct the final bugfix test? Can you update environments with manually built DLLs?
At your discretion, I can release it with my fix for now and remove it in the future.
Great, Roman! I'd like to merge it into the develop branch, after which you can create a temporary build from your forked repository and deploy it to the Prod environment, okay?
Could you provide more details about your Production environment? Please include information on the hosting server (Docker, IIS, self-hosted), the number of Ocelot instances, CPU & memory resources per Ocelot instance, requests per second, and the technical stack (specific Ocelot features utilized), etc.
Is the Ocelot gateway deployed on a public or private network? Also, could you specify the industry that the Prod product is part of?
Is the Ocelot gateway deployed on a public or private network?
Unfortunately, the game is still in closed beta testing, and the loads on prod are much times lower than what we simulate during tests. Ocelot is deployed as a regular deployment in k8s. But If I find any issues, I will let you know :)
Roman, thank you for addressing the code review issues in the commit https://github.com/ThreeMammals/Ocelot/pull/2111/commits/bbed7d1aa195f425939cc5f3a69e245f6bdd565e! I will conduct the final code review shortly and may push some commits... Your patience is appreciated.
Fixes #2110
Proposed Changes
Move the creation of the services list from the class field to the method, to prevent list modification from different threads, which caused the partially modified list of services to be returned.