Open htuch opened 5 years ago
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.
I've been talking with @htuch about implementing ORCA's in-band reporting, and adding its details to a particular stream. However, when bringing it up @htuch mentioned that the RFC had some debate around how in-band reporting should be implemented. Currently the RFC calls for stuffing JSON in `x-endpoint-load-metrics`. However, after talking with @PiotrSikora I think this is the incorrect choice, and would like to spur extra conversation here, since I shouldn't be the sole person to make this decision :stuck_out_tongue_winking_eye:
While the JSON Reporter is using a standard encoding (JSON), not all proxies currently support JSON out of the box, nor should they. Most of the time the bytes are just moving through them, and they only need to know how to parse HTTP. (Some can be extended: see NGINX, but those are always custom extensions/code written to do so).
If we want ORCA to become a true standard, we should lower the barrier to entry for proxies by not forcing them to add JSON parsing they don't otherwise need. That would increase adoption, which would help the total number of users and ORCA's usefulness.
Instead I recommend we implement parsing of both `x-endpoint-load-metrics-bin` (binary protobuf format) and `x-endpoint-load-metrics`. The real one we should focus on is `x-endpoint-load-metrics` (which maybe should even be called `endpoint-load-metrics`, since the IETF recommends not using `x-`). The reason for this is twofold:

- Key/value parameter lists are already a well-understood HTTP convention, `Cache-Control: max-age=seconds` for example. This is already the basis for parameter lists.
- `x-endpoint-load-metrics-bin` I think should be supported because it can be a nice optimization for those already integrating with the ORCA protobuf who don't want to juggle two separate encodings of what to send. I don't imagine it being a huge ask to do, so it seems worth the implementation effort.
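To make the parameter-list idea concrete, here is a hedged sketch of parsing a `Cache-Control`-style key/value header into metrics. The header name and `key=value` encoding are illustrative only; nothing here is the final ORCA wire format.

```python
def parse_load_metrics(header_value: str) -> dict[str, float]:
    """Parse a hypothetical "cpu=0.3, mem=0.8" style header into a dict."""
    metrics = {}
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments, e.g. trailing commas
        key, _, value = item.partition("=")
        metrics[key.strip()] = float(value)
    return metrics

print(parse_load_metrics("cpu=0.3, mem=0.8"))  # {'cpu': 0.3, 'mem': 0.8}
```

The point is that any proxy that can already parse HTTP headers can handle this shape without a JSON library.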
Happy to hear thoughts on this, and to come up with something official that isn't just my bemused thoughts :smile:
@SecurityInsanity yes, https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-11 is fine. Let's just make sure that this is a direct translation of the data model that exists in the proto, i.e. it should be equivalently expressive.
For sure 👍🏻. I think keeping the model is needed.
I’ll start working on a PR tomorrow. Can you update the doc? (I don’t think I have write access).
@SecurityInsanity sure. I think we should hash out the new representation here first. Looking at https://github.com/envoyproxy/envoy/blob/master/api/udpa/data/orca/v1/orca_load_report.proto, I think we may need multiple headers, e.g. `x-endpoints-load-metrics`, `x-endpoints-load-metrics-cost`, `x-endpoint-load-metrics-utilization`, to correctly distinguish the core fields and the distinct maps that exist in the data model. @PiotrSikora do you think this is correct?
FWIW, coincidentally I'm working on migrating the API tree in https://github.com/envoyproxy/envoy/tree/master/api/udpa (which includes the ORCA protos) to live in https://github.com/cncf/udpa-wg today. This shouldn't have a major impact on your work, but there might need to be some slight path or Bazel fixups needed once this lands.
@htuch That's good to know about the protos moving, thanks. Based on my understanding, yes, we'd need 3 headers (one for the core type, and two for the two maps).
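For illustration, splitting one report into those three headers could look like the following hedged sketch. The header names, field names, and `key=value` encoding are illustrative, not a settled format.

```python
def to_headers(report: dict) -> dict[str, str]:
    """Split a report dict into core-field and per-map headers (hypothetical)."""
    def fmt(mapping: dict) -> str:
        return ",".join(f"{k}={v}" for k, v in mapping.items())

    core = {k: v for k, v in report.items()
            if k not in ("request_cost", "utilization")}
    return {
        "x-endpoint-load-metrics": fmt(core),
        "x-endpoint-load-metrics-cost": fmt(report.get("request_cost", {})),
        "x-endpoint-load-metrics-utilization": fmt(report.get("utilization", {})),
    }

headers = to_headers({"cpu_utilization": 0.3,
                      "request_cost": {"db": 2.0},
                      "utilization": {"mem": 0.5}})
print(headers["x-endpoint-load-metrics-cost"])  # db=2.0
```

Keeping the maps in their own headers avoids inventing a nested encoding inside a single flat parameter list.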
After talking over this a bit the current plan for implementing ORCA is:
`map<custom_metric_key, pair<total_count, avg>>`
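That `map<custom_metric_key, pair<total_count, avg>>` shape can be maintained incrementally with a running average, so no individual samples need to be stored. A hedged sketch (function and key names are illustrative):

```python
def record_metric(agg: dict, key: str, value: float) -> None:
    """Update (total_count, avg) for one custom metric key in place."""
    count, avg = agg.get(key, (0, 0.0))
    count += 1
    avg += (value - avg) / count  # incremental mean; no sample history needed
    agg[key] = (count, avg)

agg = {}
for v in (10.0, 20.0, 30.0):
    record_metric(agg, "queue_depth", v)
print(agg["queue_depth"])  # (3, 20.0)
```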
There are a couple notes here:
I'm posting here in case anyone has any comments/questions/concerns.
@SecurityInsanity sounds like a plan from my side. Looking forward to seeing ORCA support land :)
@SecurityInsanity assigning issue to you for the ORCA implementation work planned. Feel free to assign back if there is any remaining future work once that lands.
Hello everyone, I have a question, is there any difference between orca_load_report and the original LRS? My understanding is that orca_load_report is the backend server passing load information to envoy, and LRS is passing information between envoy and management server?
Hey @CodingSinger,
ORCA for now is actually going to be integrated into the LRS when it is implemented. It will provide a richer set of information.
Right now the LRS only provides load info about the number of requests, who it's routing to, and when. ORCA complements that info by allowing services to report how much a request cost. For example, a service can say "processing this request took up 20% CPU".
There are two ways a service can report this back to envoy:
We’re targeting reading response headers first. Admittedly I’ve had a lot going on so this has slumped, however I hope to have something up in the coming weeks.
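To illustrate the response-header path, here is a hedged sketch of a backend attaching a per-request CPU cost to its response so the proxy can read it back. The header name and encoding are hypothetical, not the final wire format.

```python
def make_response(body: bytes, cpu_fraction: float) -> dict:
    """Build a toy response carrying a per-request cost header (illustrative)."""
    return {
        "status": 200,
        "headers": {
            # e.g. "processing this request took 20% of a CPU"
            "x-endpoint-load-metrics": f"cpu={cpu_fraction}",
        },
        "body": body,
    }

resp = make_response(b"ok", 0.2)
print(resp["headers"]["x-endpoint-load-metrics"])  # cpu=0.2
```

The proxy would strip or pass through this header after recording the cost against the upstream host.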
@SecurityInsanity
Thanks for your reply. But I think it now seems to be divided into `LoadReportingService` and `OrcaLoadReport`. According to your reply, are both ORCA and LRS acting between the backend server and envoy? But I found this in the comment in `LoadReportingService`:

// Independently, Envoy will initiate a StreamLoadStats bidi stream with a
// management server
@CodingSinger ,
ORCA metrics will be added to the `LoadReportingService` stats (not replacing them), since we believe they are useful there as well, but the actual ORCA stats are sent between Envoy and the upstream it is sending requests to.
@SecurityInsanity Thanks. I have got it.
Does anybody have any pointers to blogs / papers about considerations for multi-region or global load balancing algorithms? (Useful as input into designing a system that would leverage functionality like ORCA.)
That's a great question @erikbos. @alexburnos @antoniovicente are you folks aware of any public material that talks to how backend named costs would integrate with global LB?
I don't know of anything public that would be specifically focused on LB algorithms, but maybe the chapter on managing load in the SRE book could give some high-level ideas.
Thanks for the reference to the SRE book; it's always a good read, but I was looking for the next level of depth. On Slack @snowp mentioned https://netflixtechblog.com/netflix-edge-load-balancing-695308b5548c which contains some of that 👍
It is an amazing feature. Is there any news? :call_me_hand:
gRPC has adopted ORCA (and its xDS definitions) as the basis of load reporting for gRPC-LB v2 CC @markdroth. We still do not have any Envoy implementation though, very much open to any contribution PRs here.
Just for reference for anyone working on this, the ORCA support in gRPC is documented in gRFC A51: Custom Backend Metrics Support and gRFC A64: xDS LRS Custom Metrics Support.
See also gRFC A58: weighted_round_robin LB policy for how ORCA is used in load balancing.
@Mythra are you still working on this? If not, I'm a little interested in this issue.
@efimki is working on this from our side (Google). CC @markdroth @AndresGuedez
Is there any detailed info about this? Looking forward to this feature. Is there any way we can help?
Here is a draft outline of what we are trying to do:
More details are here.
Basically, I think there are two different part works:
I personally think we should only provide the simplest support for the common metrics first: cpu, mem, application_utilization, etc. These attributes could cover most cases.

The `named_metrics`, `utilization`, and `request_cost` fields may have more complex semantics and would bring heavier overhead. So I would prefer to provide only the simplest implementation first, until our users ask for more.
I agree with the two-part distinction.

We will start with using the ORCA report for LRS. I agree that common ORCA metrics cover many cases; however, we want to provide our users with the flexibility of using named metrics if necessary. The additional complexity of handling named metrics on top of processing the ORCA report is not that high.
Today in Envoy, simple load balancing decisions can be made by taking into account local or global knowledge of a backend’s load, for example CPU. More sophisticated load balancing decisions are possible with application specific knowledge, e.g. queue depth, or by combining multiple metrics.
This is useful for services that may be resource constrained along multiple dimensions (e.g. both CPU and memory may become bottlenecks, depending on the applied load and execution environment, it’s not possible to tell which upfront) and where these dimensions do not slot within predefined categories (e.g. the resource may be “number of free threads in a pool”, disk IOPS, etc.).
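As a toy illustration of load balancing across multiple utilization dimensions, one simple policy is to treat the most constrained resource as a backend's effective load and pick the backend minimizing it. This is a hedged sketch; the names and the max-dimension policy are illustrative, not part of the proposal.

```python
def effective_load(utilization: dict[str, float]) -> float:
    """The most constrained dimension (cpu, mem, threads, ...) dominates."""
    return max(utilization.values())

def pick_backend(backends: dict[str, dict[str, float]]) -> str:
    """Choose the backend whose worst dimension is least loaded."""
    return min(backends, key=lambda name: effective_load(backends[name]))

backends = {
    "a": {"cpu": 0.9, "mem": 0.2},   # CPU-bound
    "b": {"cpu": 0.5, "mem": 0.6},   # more balanced
}
print(pick_backend(backends))  # b  (worst dimension 0.6 < 0.9)
```

Real policies would weight dimensions and smooth over time, but even this shows why per-dimension reporting beats a single scalar.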
https://docs.google.com/document/d/1NSnK3346BkBo1JUU3I9I5NYYnaJZQPt8_Z_XCBCI3uA/edit# provides a design proposal for an Open Request Cost Aggregation (ORCA) standard for conveying this information between proxies like Envoy and upstreams. We propose that this become a standard part of UDPA and supported by Envoy.
The design document is in draft stage; from offline discussions I think the need for something like this is not very controversial, and we can iterate on aspects of the design here.