An investigation into backpressure issues in the router.
Most of the changes are in various plugins to implement backpressure. However, those fixes are not enough to provide useful functionality...
The current implementation of the router create a new pipeline for each connection. This has the unfortunate impact of discarding state which is required for various load impacting layers to work correctly.
This exploration modifies the router to hold a single master pipeline which is clone'd for each connection. This allows the various tower connection limiting layers to work correctly.
I've got a version which works with standard tower layers, commented out here, but I've also got a potentially more interesting version which uses a load shedded based on Little's Law, which is what is active in this code.
Notes:
Modifying the pipeline to be cloneable has generally worked fine, but it has caused issues for the Limit layer. This layer looks "generally problematic" since it appears to make a number of assumptions about what request rejection actually means. I've done some minimal modification to try and make it work win a cloned pipeline, but tests are still failing and I'm not sure it does what it should do.
I also noticed that when implementing backpressure, various mock tests needed to be modified since test rejection happened earlier in the pipeline and a map_result() somewhere isn't triggered. That needs some investigation, but I think it's a small problem to address.
I modified the bridge query planner pool to prevent excessive queueing in this layer. Since I now want to control this by load_shedding before this service is reached, I only want enough channels to support the number of planners.
Summary:
This PR provides a router which will operate with approximately the same performance of the base router, but which controls memory and rejects excess load to prevent "over-commit" by the router. This is a very desirable property.
More testing is required, but this is looking promising so far.
Description here
Fixes #issue_number
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
[ ] Changes are compatible[^1]
[ ] Documentation[^2] completed
[ ] Performance impact assessed and acceptable
Tests added and passing[^3]
[ ] Unit Tests
[ ] Integration Tests
[ ] Manual Tests
Exceptions
Note any exceptions here
Notes
[^1]: It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.
[^2]: Configuration is an important part of many changes. Where applicable please try to document configuration examples.
[^3]: Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.
An investigation into backpressure issues in the router.
Most of the changes are in various plugins to implement backpressure. However, those fixes are not enough to provide useful functionality...
The current implementation of the router create a new pipeline for each connection. This has the unfortunate impact of discarding state which is required for various load impacting layers to work correctly.
This exploration modifies the router to hold a single master pipeline which is clone'd for each connection. This allows the various tower connection limiting layers to work correctly.
I've got a version which works with standard tower layers, commented out here, but I've also got a potentially more interesting version which uses a load shedded based on Little's Law, which is what is active in this code.
Notes:
Modifying the pipeline to be cloneable has generally worked fine, but it has caused issues for the Limit layer. This layer looks "generally problematic" since it appears to make a number of assumptions about what request rejection actually means. I've done some minimal modification to try and make it work win a cloned pipeline, but tests are still failing and I'm not sure it does what it should do.
I also noticed that when implementing backpressure, various mock tests needed to be modified since test rejection happened earlier in the pipeline and a map_result() somewhere isn't triggered. That needs some investigation, but I think it's a small problem to address.
I modified the bridge query planner pool to prevent excessive queueing in this layer. Since I now want to control this by load_shedding before this service is reached, I only want enough channels to support the number of planners.
Summary:
This PR provides a router which will operate with approximately the same performance of the base router, but which controls memory and rejects excess load to prevent "over-commit" by the router. This is a very desirable property.
More testing is required, but this is looking promising so far.
Description here
Fixes #issue_number
Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
[^1]: It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. [^2]: Configuration is an important part of many changes. Where applicable please try to document configuration examples. [^3]: Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.