aws / aws-application-networking-k8s

A Kubernetes controller for Amazon VPC Lattice
https://www.gateway-api-controller.eks.aws.dev/
Apache License 2.0

Multiple HTTPRoutes in same namespace results in single Lattice service (due to name truncation) #657

Open HannesBBR opened 3 months ago

HannesBBR commented 3 months ago

Hi,

We have some applications that have multiple k8s Service resources, for which traffic is handled differently. To onboard these services into the Lattice service network, we now want to create an HTTPRoute for each of the services. However, in some cases this leads to only one Lattice service being created.

I suspect the reason is that the Lattice service name is limited to 38 characters, built by truncating the HTTPRoute name (to 20 chars) and the namespace name (to 18 chars): https://github.com/aws/aws-application-networking-k8s/blob/15d0899bb4ccaf4327ab3cac68d058d7d159b39e/pkg/utils/common.go#L64

The result is that the controller tries to create multiple Lattice services with the same name (as the truncation leads to duplicates), and thus only one succeeds. Is there a specific reason to limit the name to 38 characters, instead of the limit imposed by Lattice (63 chars)?
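To illustrate the collision, here is a minimal sketch of the truncation as I understand it. It is not the controller's actual code: `truncate` and `latticeServiceName` are hypothetical stand-ins, and the hyphen join is an assumption based on the `abc-service-abc-service` pattern we see in practice. The route and namespace names are made up.

```go
package main

import "fmt"

// truncate returns at most n bytes of s (Kubernetes names are ASCII).
func truncate(s string, n int) string {
	if len(s) <= n {
		return s
	}
	return s[:n]
}

// latticeServiceName is a hypothetical stand-in for the controller's helper:
// route name cut to 20 chars, namespace cut to 18 chars, joined by "-".
func latticeServiceName(routeName, namespace string) string {
	return fmt.Sprintf("%s-%s", truncate(routeName, 20), truncate(namespace, 18))
}

func main() {
	ns := "payments"
	// Two distinct route names that share their first 20 characters.
	a := latticeServiceName("checkout-application-internal", ns)
	b := latticeServiceName("checkout-application-external", ns)
	fmt.Println(a)      // checkout-application-payments
	fmt.Println(b)      // checkout-application-payments
	fmt.Println(a == b) // true: both routes map to the same Lattice service name
}
```

Any two routes in the same namespace whose names agree on the first 20 characters end up with the same Lattice service name under this scheme.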

The workaround, of course, is to change the HTTPRoute names ourselves to prevent the truncation issue, but this leads to inconsistencies with the other resources in the namespace, so ideally this would not be needed. The current naming also leads to some 'weird' service names, as the namespace and HTTPRoute name will often be the same, leading to a Lattice service name of abc-service-abc-service.

Would it be possible to increase the limit of the name and/or allow us to define the name of the lattice service ourselves?

Thanks for any insights!

erikfuller commented 3 months ago

Hi @HannesBBR,

I believe the Lattice limit for service name length is 40, though I agree the current service naming scheme in the controller isn't ideal for the reasons you've described.

The tags on the Lattice service contain the full route namespace and name, so we could potentially offer an alternate (random or semi-random) naming scheme independent of the actual route name, then rely on tags for identifying the Lattice service for the Route. This would likely be the most robust option. Having said that, a new blanket naming scheme may also introduce complexity for upgrades or migrations, since the old naming scheme would likely need to coexist side-by-side. There are probably some things we could do here to make life easier, but it requires some thought.

Alternatively, an annotation on the HTTPRoute to use a specific name as an "override" might be an easier short-term option, but might get tedious to operate.
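As a rough sketch of what the override lookup could look like: the annotation key below is hypothetical (the controller does not define it today, as far as this thread establishes), `resolveServiceName` is an illustrative helper, and the 40-character limit is the one I believe the API enforces.

```go
package main

import "fmt"

// Hypothetical annotation key, used here only for illustration.
const nameOverrideAnnotation = "example.com/lattice-service-name"

// Assumed Lattice service name limit.
const maxLatticeNameLen = 40

// resolveServiceName returns the override from the route's annotations when
// one is set, and otherwise falls back to the default generated name.
func resolveServiceName(annotations map[string]string, defaultName string) (string, error) {
	name, ok := annotations[nameOverrideAnnotation]
	if !ok || name == "" {
		return defaultName, nil
	}
	if len(name) > maxLatticeNameLen {
		return "", fmt.Errorf("override %q is %d chars, limit is %d",
			name, len(name), maxLatticeNameLen)
	}
	return name, nil
}

func main() {
	annotations := map[string]string{nameOverrideAnnotation: "checkout-internal"}
	name, err := resolveServiceName(annotations, "checkout-application-payments")
	fmt.Println(name, err)
}
```

Rejecting an over-long override, rather than silently truncating it, avoids reintroducing the same collision problem.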

Any thoughts on these two options? Given your current configuration, what would ideal look like here?

HannesBBR commented 3 months ago

Hi Erik,

You're right about the service name limit. I was looking at the documentation at https://docs.aws.amazon.com/vpc-lattice/latest/ug/services.html, where step 1 mentions 63 characters, so that might need to be updated. I just tested to make sure, and the API does indeed restrict it to 40 chars.

I think your assessment is correct in that any change there will introduce some complexity to keep the existing services running, as recreating them with a new name would introduce new Lattice domains, requiring all clients and/or DNS to be updated.

On the first option, I think I'd prefer semi-random instead of fully random, as the service name also influences the Lattice domain that is created, and having the actual name of our application as part of the Lattice domain/service name is handy in a service network of 50+ applications. Fully relying on the tags might also not be great, as I believe these aren't visible to other members of the service network unless a service is explicitly shared via RAM. If the application name is still part of the Lattice service name, you don't need to explicitly share the service, and you'd still know which services are onboarded through the 'service network associations' tab in the console, without needing to access the tags of the service.

One issue I see there though is that having semi-randomness in the name does eat into the limit of the 40 chars, which might make it harder for a proper/readable name to be constructed? So in that sense I think I'd prefer the second option where application owners are able to control/override the name of the Lattice service.