aip-dev / google.aip.dev

API Improvement Proposals. https://aip.dev/

Location-bound resource naming #90

Open lukesneeringer opened 5 years ago

lukesneeringer commented 5 years ago

From @jgeewax:

We often see folks put in APIs that have no location in the resource name at all. Sometimes this makes sense (the resource is actually global), but often it's an oversight and teams make the decision without understanding the complete consequences of their choice.

We should clarify when it is OK to scope things at a global level, and how that should be done.

Some notes and questions:

  • If you are at the global level, you must ensure that you actually make global availability guarantees to customers. In other words, if your ID space is spread across multiple locations, and us-central1-a is down, you can't guarantee that ID 1234 is available since it could already exist in the location that is down.

  • Some people have used zones/* or regions/* when they should use locations/*. There are scenarios where the more specific option makes sense, but they are relatively rare and we should list those specific cases.

  • When you have a global resource, should you just leave "locations/*" out of the name? Or use "locations/global"? Or leave it out for now, planning to add it later via an additional binding if you decide to support location-bound resources?

  • Anything involving storage of customer data should very likely be in specific locations. This is due to data privacy and homing issues (e.g., data on EU people must be in the EU).

  • Anything involving transfer of significant amounts of data (perhaps 1 GB or more?) should be location-bound. This is because the data will likely be uploaded to GCS first and then imported into the service. If customers don't know where the data is going, they don't know where to create their bucket to minimize egress bandwidth costs.

  • One big goal of putting services in specific locations is to minimize failure domains. That is, if there is an outage in us-central1-a, and all of a customer's VMs and data are in asia-east1-a, it will be very frustrating for this customer, who only ever does things in Asia, to see an outage because something went wrong halfway around the world in Iowa. It's very important that we isolate far-away failures so that customers don't feel as though they're dragged through the muck of every mistake anywhere on the planet.

  • We should also consider latency for these services, particularly in far-away places like Sydney, Australia. If someone spins up VMs in Sydney to serve their customers, but we have a "global" service deployed in Taiwan to handle traffic for the APAC region, that's actually not all that close to Sydney. Asking folks to serve Sydney traffic out of Taipei (~140 ms) is a bit like asking them to serve SF traffic out of London (~150 ms), which is obviously not something we'd ask customers to do.
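
As a rough illustration of the "leave it out" versus "locations/global" question above, here is a minimal Python sketch. It assumes a hypothetical "widgets" collection that accepts two bindings, one with a location segment and one without, and treats a name with no location as if it were "locations/global". The resource type, the patterns, and the canonicalization rule are all assumptions for illustration, not an agreed convention.

```python
import re
from dataclasses import dataclass

# Hypothetical "widgets" resource with two accepted name bindings.
# Location-bound form: projects/{project}/locations/{location}/widgets/{widget}
_LOCATED = re.compile(r"projects/([^/]+)/locations/([^/]+)/widgets/([^/]+)")
# Legacy global form with no location segment: projects/{project}/widgets/{widget}
_UNLOCATED = re.compile(r"projects/([^/]+)/widgets/([^/]+)")


@dataclass(frozen=True)
class WidgetName:
    project: str
    location: str
    widget: str


def parse_widget_name(name: str) -> WidgetName:
    """Parse either binding; a name without a location is treated as 'global' (assumption)."""
    m = _LOCATED.fullmatch(name)
    if m:
        return WidgetName(*m.groups())
    m = _UNLOCATED.fullmatch(name)
    if m:
        project, widget = m.groups()
        return WidgetName(project, "global", widget)
    raise ValueError(f"not a widget resource name: {name!r}")


def format_widget_name(w: WidgetName) -> str:
    """Always emit the location-bound form, so both bindings map to one canonical name."""
    return f"projects/{w.project}/locations/{w.location}/widgets/{w.widget}"
```

With this approach, `parse_widget_name("projects/p1/widgets/w1")` yields a name whose location is `"global"`, and formatting it produces `projects/p1/locations/global/widgets/w1`. Canonicalizing to the location-bound form is one way to keep the door open for adding real locations later as just another binding, though it does not by itself provide the global availability guarantees discussed above.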

jgeewax commented 5 years ago

We're also looking at the details of naming things at larger-scoped locations (e.g., what does "us" mean? What about "us-east"? What about "global"?)

I think there's prior art here in Cloud Spanner and GCS.
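
One possible reading of larger-scoped names is a containment tree, where "global" contains "us", "us" contains "us-east", and so on, so that a request scoped to "us" can be checked against a resource's home location. The Python sketch below assumes exactly that model; the `PARENT` table is a hypothetical hierarchy for illustration only and is not an authoritative list of how any real region groups into multi-regions.

```python
# Hypothetical location hierarchy: child -> parent. Illustration only;
# the groupings (especially "us-east") are assumptions, not real definitions.
PARENT = {
    "us-east1": "us-east",
    "us-east": "us",
    "us-central1": "us",
    "us": "global",
    "asia-east1": "asia",
    "asia": "global",
}


def ancestors(location: str) -> list[str]:
    """Return the location followed by every broader scope that contains it."""
    chain = [location]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def contains(scope: str, location: str) -> bool:
    """True if `scope` is `location` itself or one of its ancestors."""
    return scope in ancestors(location)
```

Under this model, `ancestors("us-east1")` is `["us-east1", "us-east", "us", "global"]`, and a "us"-scoped request would not reach a resource homed in "asia-east1". Whether names like "us-east" should exist at all, and what they contain, is exactly the open question in this thread.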