Open htuch opened 5 years ago
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
I've been talking with @htuch about implementing ORCA's in-band reporting, and adding its details to a particular stream. However, when bringing it up @htuch mentioned that the RFC had some debate around how in-band reporting should be implemented. Currently the RFC calls for stuffing JSON in `x-endpoint-load-metrics`. However, after talking with @PiotrSikora I think this is the incorrect choice, and would like to spur extra conversation here, since I shouldn't be the sole person to make this decision :stuck_out_tongue_winking_eye:
While the JSON Reporter is using a standard encoding (JSON), not all proxies currently support JSON out of the box, nor should they. Most of the time the bytes are just moving through them, and they only need to know how to parse HTTP. (Some can be extended: see NGINX, but those are always custom extensions/code written to do so).
If we want ORCA to become a true standard, we should lower the barrier to entry for proxies by not forcing them to add JSON parsing they don't otherwise need. That would increase adoption, which would help the total number of users and ORCA's usefulness.
Instead I recommend we implement parsing of both `x-endpoint-load-metrics-bin` (binary protobuf format) and `x-endpoint-load-metrics`. The real one we should focus on is `x-endpoint-load-metrics` (which maybe should even be called `endpoint-load-metrics`, since the IETF recommends not using `x-`). The reason for this is twofold:

- Key/value parameter lists are already a well-understood HTTP convention, `Cache-Control: max-age=seconds` for example. This is already the basis for parameter lists.
- `x-endpoint-load-metrics-bin` I think should be supported because it can be a nice optimization for those already integrating with the ORCA protobuf who don't want to juggle two separate encodings of what to send. I don't imagine it being a huge ask to do, so it seems worth the implementation effort.
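To make the parameter-list idea concrete, here is a hedged sketch of parsing a `Cache-Control`-style key/value header into metrics. The header name and `key=value` encoding are illustrative only; nothing here is the final ORCA wire format.

```python
def parse_load_metrics(header_value: str) -> dict[str, float]:
    """Parse a hypothetical "cpu=0.3, mem=0.8" style header into a dict."""
    metrics = {}
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments, e.g. trailing commas
        key, _, value = item.partition("=")
        metrics[key.strip()] = float(value)
    return metrics

print(parse_load_metrics("cpu=0.3, mem=0.8"))  # {'cpu': 0.3, 'mem': 0.8}
```

The point is that any proxy that can already parse HTTP headers can handle this shape without a JSON library.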
Happy to hear thoughts on this, and to come up with something official that isn't just my bemused thoughts :smile:
@SecurityInsanity yes, https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-11 is fine. Let's just make sure that this is a direct translation of the data model that exists in the proto, i.e. it should be equivalently expressive.
For sure 👍🏻. I think keeping the model is needed.
I’ll start working on a PR tomorrow. Can you update the doc? (I don’t think I have write access).
@SecurityInsanity sure. I think we should hash out the new representation here first. Looking at https://github.com/envoyproxy/envoy/blob/master/api/udpa/data/orca/v1/orca_load_report.proto, I think we may need multiple headers, e.g. `x-endpoints-load-metrics`, `x-endpoints-load-metrics-cost`, `x-endpoint-load-metrics-utilization`, to correctly distinguish the core fields and the distinct maps that exist in the data model. @PiotrSikora do you think this is correct?
FWIW, coincidentally I'm working on migrating the API tree in https://github.com/envoyproxy/envoy/tree/master/api/udpa (which includes the ORCA protos) to live in https://github.com/cncf/udpa-wg today. This shouldn't have a major impact on your work, but there might need to be some slight path or Bazel fixups needed once this lands.
@htuch That's good to know about the protos moving, thanks. Based on my understanding, yes, we'd need 3 headers (one for the core type, and two for the two maps).
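For illustration, splitting one report into those three headers could look like the following hedged sketch. The header names, field names, and `key=value` encoding are illustrative, not a settled format.

```python
def to_headers(report: dict) -> dict[str, str]:
    """Split a report dict into core-field and per-map headers (hypothetical)."""
    def fmt(mapping: dict) -> str:
        return ",".join(f"{k}={v}" for k, v in mapping.items())

    core = {k: v for k, v in report.items()
            if k not in ("request_cost", "utilization")}
    return {
        "x-endpoint-load-metrics": fmt(core),
        "x-endpoint-load-metrics-cost": fmt(report.get("request_cost", {})),
        "x-endpoint-load-metrics-utilization": fmt(report.get("utilization", {})),
    }

headers = to_headers({"cpu_utilization": 0.3,
                      "request_cost": {"db": 2.0},
                      "utilization": {"mem": 0.5}})
print(headers["x-endpoint-load-metrics-cost"])  # db=2.0
```

Keeping the maps in their own headers avoids inventing a nested encoding inside a single flat parameter list.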
After talking over this a bit the current plan for implementing ORCA is:
`map<custom_metric_key, pair<total_count, avg>>`
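That `map<custom_metric_key, pair<total_count, avg>>` shape can be maintained incrementally with a running average, so no individual samples need to be stored. A hedged sketch (function and key names are illustrative):

```python
def record_metric(agg: dict, key: str, value: float) -> None:
    """Update (total_count, avg) for one custom metric key in place."""
    count, avg = agg.get(key, (0, 0.0))
    count += 1
    avg += (value - avg) / count  # incremental mean; no sample history needed
    agg[key] = (count, avg)

agg = {}
for v in (10.0, 20.0, 30.0):
    record_metric(agg, "queue_depth", v)
print(agg["queue_depth"])  # (3, 20.0)
```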
There are a couple notes here:
I'm posting here in case anyone has any comments/questions/concerns.
@SecurityInsanity sounds like a plan from my side. Looking forward to seeing ORCA support land :)
@SecurityInsanity assigning issue to you for the ORCA implementation work planned. Feel free to assign back if there is any remaining future work once that lands.
Hello everyone, I have a question, is there any difference between orca_load_report and the original LRS? My understanding is that orca_load_report is the backend server passing load information to envoy, and LRS is passing information between envoy and management server?
Hey @CodingSinger,
ORCA for now is actually going to be integrated into the LRS when it is implemented. It will provide a richer set of information.
Right now the LRS only provides load info about the number of requests, who it's routing to, and when. ORCA complements that info by allowing services to report how much a request cost. For example, a service can say "processing this request took up 20% CPU".
There are two ways a service can report this back to envoy:
We’re targeting reading response headers first. Admittedly I’ve had a lot going on so this has slumped, however I hope to have something up in the coming weeks.
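To illustrate the response-header path, here is a hedged sketch of a backend attaching a per-request CPU cost to its response so the proxy can read it back. The header name and encoding are hypothetical, not the final wire format.

```python
def make_response(body: bytes, cpu_fraction: float) -> dict:
    """Build a toy response carrying a per-request cost header (illustrative)."""
    return {
        "status": 200,
        "headers": {
            # e.g. "processing this request took 20% of a CPU"
            "x-endpoint-load-metrics": f"cpu={cpu_fraction}",
        },
        "body": body,
    }

resp = make_response(b"ok", 0.2)
print(resp["headers"]["x-endpoint-load-metrics"])  # cpu=0.2
```

The proxy would strip or pass through this header after recording the cost against the upstream host.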
@SecurityInsanity
Thanks for your reply. But I think it now seems to be divided into `LoadReportingService` and `OrcaLoadReport`. According to your reply, are both ORCA and LRS acting between the backend server and envoy? But I found this in the comment in `LoadReportingService`:

// Independently, Envoy will initiate a StreamLoadStats bidi stream with a
// management server
@CodingSinger ,
ORCA metrics will be added to the `LoadReportingService` stats (not replacing them), since we believe they are useful there as well, but the actual ORCA stats are sent between Envoy and the upstream it is sending requests to.
@SecurityInsanity Thanks. I have got it.
Does anybody have any pointers to blogs / papers about considerations for multi-region or global load balancing algorithms? (Useful as input into designing a system that would leverage functionality like ORCA.)
That's a great question @erikbos. @alexburnos @antoniovicente are you folks aware of any public material that talks to how backend named costs would integrate with global LB?
I don't know of anything public that would be specifically focused on LB algorithms, but maybe the chapter on managing load in the SRE book could give some high-level ideas.
Thanks for the reference to the SRE book; it's always a good read, but I was looking for the next level of depth. On Slack @snowp mentioned https://netflixtechblog.com/netflix-edge-load-balancing-695308b5548c which contains some of that 👍
It is an amazing feature. Is there any news? :call_me_hand:
gRPC has adopted ORCA (and its xDS definitions) as the basis of load reporting for gRPC-LB v2 CC @markdroth. We still do not have any Envoy implementation though, very much open to any contribution PRs here.
Just for reference for anyone working on this, the ORCA support in gRPC is documented in gRFC A51: Custom Backend Metrics Support and gRFC A64: xDS LRS Custom Metrics Support.
See also gRFC A58: weighted_round_robin LB policy for how ORCA is used in load balancing.
@Mythra are you still working on this? If not, I'm a little interested in this issue.
@efimki is working on this from our side (Google). CC @markdroth @AndresGuedez
Is there any detailed info about this? Looking forward to this feature. Is there any way we can help?
Here is a draft outline of what we are trying to do:
More details are here.
Basically, I think there are two different part works:
I personally think we should only provide the simplest support for the common metrics first: cpu, mem, application_utilization, etc. These attributes could cover most cases.

The `named_metrics`, `utilization`, and `request_cost` fields may have more complex semantics and would bring heavier overhead. So I would prefer to provide only the simplest implementation first, until our users ask for more.
I agree with the two-part distinction.

We will start with using the ORCA report for LRS. I agree that common ORCA metrics cover many cases; however, we want to provide our users with the flexibility of using named metrics if necessary. The additional complexity of handling named metrics on top of processing the ORCA report is not that high.
Today in Envoy, simple load balancing decisions can be made by taking into account local or global knowledge of a backend’s load, for example CPU. More sophisticated load balancing decisions are possible with application specific knowledge, e.g. queue depth, or by combining multiple metrics.
This is useful for services that may be resource constrained along multiple dimensions (e.g. both CPU and memory may become bottlenecks, depending on the applied load and execution environment, it’s not possible to tell which upfront) and where these dimensions do not slot within predefined categories (e.g. the resource may be “number of free threads in a pool”, disk IOPS, etc.).
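As a toy illustration of load balancing across multiple utilization dimensions, one simple policy is to treat the most constrained resource as a backend's effective load and pick the backend minimizing it. This is a hedged sketch; the names and the max-dimension policy are illustrative, not part of the proposal.

```python
def effective_load(utilization: dict[str, float]) -> float:
    """The most constrained dimension (cpu, mem, threads, ...) dominates."""
    return max(utilization.values())

def pick_backend(backends: dict[str, dict[str, float]]) -> str:
    """Choose the backend whose worst dimension is least loaded."""
    return min(backends, key=lambda name: effective_load(backends[name]))

backends = {
    "a": {"cpu": 0.9, "mem": 0.2},   # CPU-bound
    "b": {"cpu": 0.5, "mem": 0.6},   # more balanced
}
print(pick_backend(backends))  # b  (worst dimension 0.6 < 0.9)
```

Real policies would weight dimensions and smooth over time, but even this shows why per-dimension reporting beats a single scalar.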
https://docs.google.com/document/d/1NSnK3346BkBo1JUU3I9I5NYYnaJZQPt8_Z_XCBCI3uA/edit# provides a design proposal for an Open Request Cost Aggregation (ORCA) standard for conveying this information between proxies like Envoy and upstreams. We propose that this become a standard part of UDPA and supported by Envoy.
The design document is in draft stage; from offline discussions I think the need for something like this is not very controversial, and we can iterate on aspects of the design here.