go-spatial / tegola

Tegola is a Mapbox Vector Tile server written in Go
http://tegola.io/
MIT License

Support for S3 Accelerated Bucket Endpoint in Cache Configuration #991

Closed AISuhasDattatreya closed 1 month ago

AISuhasDattatreya commented 1 month ago

We are currently using Tegola and would like to leverage S3 Transfer Acceleration for our cache. However, the current documentation and configuration options do not seem to support specifying a custom S3 accelerated endpoint.

Could you please add support for specifying a custom S3 endpoint in the cache configuration? This feature would greatly enhance our caching performance.

iwpnd commented 1 month ago

This should suffice, does it not?

AISuhasDattatreya commented 1 month ago

Thank you for your response. Could you clarify where we'd add the accelerated endpoint? Presently, we've added the s3 cache configuration to the TOML and enabled transfer acceleration on S3, but in the browser's network tab we're still seeing the cache being served from the non-accelerated endpoint.

[cache]
type = "s3"
bucket = "${AWS_CACHE_BUCKET}"
region = "${AWS_REGION_1}"
aws_access_key_id = "${AWS_ACCESS_KEY_ID_1}"
aws_secret_access_key = "${AWS_SECRET_ACCESS_KEY_1}"

iwpnd commented 1 month ago

Disclaimer: I haven't worked with accelerators, so it's trial and error. But you could try adding it in endpoint. If this does not work, I'm happy to investigate. 👍

ARolek commented 1 month ago

I read up on the accelerators last night and it seems like you just need to enable the transfer acceleration capability on the bucket and then adjust the S3 endpoint you use. Here are the docs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration-getting-started.html

@AISuhasDattatreya your [cache] block could be adjusted to the following:

[cache]
type = "s3"
bucket = "${AWS_CACHE_BUCKET}"
region = "${AWS_REGION_1}"
# this will need to be an env var; hard-coded here only to show the format
# note: we might need to use just "s3-accelerate.amazonaws.com" without the bucket name here,
# which will take a bit of testing to prove out.
endpoint = "bucketname.s3-accelerate.amazonaws.com"
aws_access_key_id = "${AWS_ACCESS_KEY_ID_1}"
aws_secret_access_key = "${AWS_SECRET_ACCESS_KEY_1}"

Side note: you could probably remove aws_access_key_id and aws_secret_access_key from your config. The AWS Go SDK will infer credentials from your environment, so your infrastructure can manage access control. It's considered more secure to do it this way.
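
For reference, a minimal sketch of how that could look, combining the accelerated endpoint with environment-based credentials. The AWS_CACHE_ENDPOINT variable name is only an assumption for illustration; point it at your bucket's accelerated hostname.

[cache]
type = "s3"
bucket = "${AWS_CACHE_BUCKET}"
region = "${AWS_REGION_1}"
# hypothetical env var, e.g. set to "bucketname.s3-accelerate.amazonaws.com"
endpoint = "${AWS_CACHE_ENDPOINT}"
# no aws_access_key_id / aws_secret_access_key here: the AWS Go SDK falls back to its
# default credential chain (environment variables, shared config, or an IAM role)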

AISuhasDattatreya commented 1 month ago

This works very well, thanks! I'll raise a PR to update the documentation.

iwpnd commented 1 month ago

Nice, thank you for the feedback. 👍

ARolek commented 1 month ago

Glad that worked!

AiNikhilPatil commented 1 month ago

@iwpnd @ARolek, we have updated the accelerated endpoint changes in config.toml and deployed them. Is there any way we can verify this deployment? It seems that the cache received on the platform is being retrieved by the same Tegola endpoint from which the request was raised.

For reference, a sample .pbf URL looks something like this: https://.execute-api.eu-west-2.amazonaws.com/prod/maps/ukpn//11/1023/681.pbf.

Is there a method to verify this?

iwpnd commented 1 month ago

It seems that the cache received on the platform is being retrieved by the same Tegola endpoint from which the request was raised.

I'm afraid I can't follow you. You want to validate that the correct endpoint is being used? Should there not be metrics in your AWS console to verify those requests?

ARolek commented 1 month ago

@AiNikhilPatil I would need to know a bit more about your architecture. Based on your comment it seems like tegola is the proxy between the request and the cache layer, so tegola is responsible for routing all requests. Given that, I think S3 transfer acceleration will give a bit of a bump in performance, but the location of your tegola endpoint will also play into latency.

Another way to architect this is to use a CDN (e.g. CloudFront) in front of tegola and have it attempt to read from the S3 cache directly before routing to tegola. This approach would probably benefit a lot more from S3 transfer acceleration. I wrote a post about this design a while back: https://medium.com/@alexrolek/the-serverless-vector-map-stack-lives-22835d341d7d. I think this might be a better fit if you're looking for faster serving from the cache.