Closed: benjamin-awd closed this 6 months ago
This is great! Thanks for the contribution. Can you confirm that this addresses https://github.com/ArroyoSystems/arroyo/issues/621?
It's looking a lot better so far on our end, but will keep an eye on it -- I think our remaining issues are due to how the NATS connector handles checkpointing
Resolves https://github.com/ArroyoSystems/arroyo/issues/621
This PR:
- Converts the `construct_gcs` function to async, which should hopefully improve performance

Currently, every checkpoint instantiates an `ObjectStore`, leading to a large number of calls to the metadata server. This can cause a high number of concurrent DNS lookups, which may add network latency and have other undesirable effects. While the GCE metadata service endpoint has no official rate limit, we should avoid making unnecessary calls to it.
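To illustrate the problem described above, here is a minimal sketch of reusing a store across checkpoints instead of constructing one each time. It is not the code in this PR: `FakeGcsStore`, `construct_store`, `get_or_construct`, and `CONSTRUCTIONS` are hypothetical names, and the constructor is a stand-in for whatever actually contacts the metadata server. It uses only the standard library and is synchronous to stay self-contained.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for an object store client (not the real `ObjectStore`).
#[derive(Debug)]
pub struct FakeGcsStore {
    pub bucket: String,
}

/// Counts how many times the expensive constructor has run.
pub static CONSTRUCTIONS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the expensive step, e.g. a credential lookup
/// against the metadata server (assumption for illustration).
fn construct_store(bucket: &str) -> FakeGcsStore {
    CONSTRUCTIONS.fetch_add(1, Ordering::SeqCst);
    FakeGcsStore {
        bucket: bucket.to_string(),
    }
}

/// Process-wide cache, keyed by bucket name and initialized lazily.
fn cache() -> &'static Mutex<HashMap<String, Arc<FakeGcsStore>>> {
    static CACHE: OnceLock<Mutex<HashMap<String, Arc<FakeGcsStore>>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the cached store for `bucket`, constructing it only on first use.
pub fn get_or_construct(bucket: &str) -> Arc<FakeGcsStore> {
    let mut stores = cache().lock().unwrap();
    stores
        .entry(bucket.to_string())
        .or_insert_with(|| Arc::new(construct_store(bucket)))
        .clone()
}
```

With this shape, repeated calls to `get_or_construct` for the same bucket return the same `Arc` and the constructor runs once per bucket, so checkpoint frequency no longer drives the number of metadata lookups. An async version would need an async-aware lock or once-cell, which is left out here.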
Took reference from Polars on this: https://github.com/pola-rs/polars/issues/14384#issuecomment-1948991697, https://github.com/pola-rs/polars/blob/main/crates/polars-io/src/cloud/object_store_setup.rs#L4
Note: the AWS metadata endpoint has a rate limit of 1024 packets per second, so it might be worth implementing this for the `construct_s3` function as well at some point.