aws / aws-sdk-java-v2

The official AWS SDK for Java - Version 2
Apache License 2.0
2.18k stars 842 forks source link

Regression: Proxy Configuration from Environment Variables breaks custom proxy support #4728

Open abatkin opened 10 months ago

abatkin commented 10 months ago

Describe the bug

We have code to customize all AWS SDK Clients to set proxies appropriately. For example, we can use VPC Endpoints for some services (which means ensuring that a proxy server is not used), whereas for other services, we must use a proxy server. To further complicate matters, for S3, we can use a VPC Endpoint (no-proxy) from some network locations but not from others (because it is a Gateway endpoint). Until recently, this worked fine, but with #4467, this no longer works, since the default behavior is to use environment variables (which are always set, since Java isn't the only language in our environment).

Expected Behavior

The expected behavior would be for the AWS Java SDK to default to not using environment variables for configuring proxies, as anything else is a breaking change.

Current Behavior

The AWS Java SDK now defaults to using environment variables, which overrides our configuration (which explicitly does not set a proxy for some services).

Reproduction Steps

First, set HTTP_PROXY and HTTPS_PROXY with the expectation that they will not be used by Java (since they weren't used by Java up until a few weks ago).

Then, customize an AWS SDK client to explicitly not use a proxy:

ProxyConfiguration proxyConfiguration = ProxyConfiguration.builder()
  .useSystemPropertyValues(false)
  .build(); // This proxy configuration (previously) told the SDK to NOT use a proxy, no matter what
ApacheHttpClient.Builder httpClientBuilder = ...
httpClientBuilder.proxyConfiguration(proxyConfiguration);
...
// construct SDK Client using the httpClientBuilder

When that client is used, notice that it will use the proxy, even though it should not have. Now re-run with an older AWS SDK version and notice that it will not use the proxy.

Possible Solution

Change the default to not use environment variables, and allow users to opt-in.

Additional Information/Context

Note that this is a breaking change. But I think it is reasonable:

AWS Java SDK version used

2.21.23

JDK version used

Any

Operating System and version

Any

joviegas commented 10 months ago

@abatkin Looking at a generic side, the application running in an environment should automatically pick up environment settings for that environment if that ENV is supported. This behavior is in line with the JAVA V1 behaviour

Could you please set 'useEnvironmentVariableValues' to 'false' in order to disable this?"

ProxyConfiguration proxyConfiguration = ProxyConfiguration.builder()
  .useSystemPropertyValues(false)
  .useEnvironmentVariableValues(false)
  .build();

Disabling it by default may inconvenience users relying on environment variables. For instance, in a production scenario, users can effortlessly modify proxies for specific hosts by simply adding/Deleting environment settings to impacted or target host without requiring any code changes.

abatkin commented 10 months ago

I'm well aware of how to fix it (after all, we do the same thing with the system properties). My point is that for anyone that was trying to programmatically disable proxy usage, this change is a regression. Before the change, they would not use a proxy, after the change they would use a proxy.

And since the mechanism being used requires a new Java API, it also means that we can't write code that is forwards and backwards compatible between pre and post change. Our proxy configuration code lives in a shared library that is used by hundreds of applications.

In other words, we are forced to tell our users that they must pin to an old SDK version or update to our new library version together with the new AWS SDK version. Whereas (at least in our environment) we only required new AWS SDK versions when a new (end-user facing) feature was required (and also, people could update their AWS SDK versions at their leisure, without fear of it suddenly breaking).

This is a regression. It certainly makes the behavior more consistent with other programming languages (and maybe even AWS Java SDK V1) but it is still a regression.

abatkin commented 10 months ago

This is also a potential regression for existing applications that are not doing any custom proxy configuration and are relying on VPC Endpoints for security/compliance/cost (or whatever) reasons. Those applications might stop functioning. Or worse, an application that was previously tested and confirmed to be using a VPC Endpoint might suddenly start sending traffic through a proxy server (and out over the internet), without anyone even noticing.

joviegas commented 10 months ago

Thank you for the information, @abatkin. We do understand this is regression, will review this with the team and take appropriate action.

abatkin commented 10 months ago

We use the Apache HTTP Client.

I'm not sure what you mean in the second question. Due to the networking topology (i.e. Interface VPC Endpoints, Gateway VPC Endpoint for S3 and a dedicated proxy server just for S3, plus a "regular" proxy server for other AWS services/general internet traffic) it is impossible to use environment variables (for example, there is no mechanism to specify different proxy servers depending on destination). Therefore, we customize all AWS SDK Clients to configure proxy servers, as well as a few other things (including authenticating using our federated IdP). For ease-of-use we use the SdkBuilder.applyMutation() function and pass in a single "customizer" that can be used across any AWS SDK Client.

Proxy environment variables are mostly managed and pushed out systematically/automatically for all workstations, servers, docker images, etc.

joviegas commented 10 months ago

We talked as a team and we really want honoring the proxy environment variables to be the SDK’s default. That matches a lot of standards, including those of many other AWS SDKs and the AWS SDK for Java 1.x. Are there any other knobs or configuration we could provide in the SDK to make it an easier transition?

abatkin commented 10 months ago

I think the best knob (and the one that would prevent the least amount of surprise for everyone, including us) would be to at least fix #4703 and follow the standard for parsing NO_PROXY (since you can't argue that you want Java to "match a lot of standards" if you deliberately don't do that).

Something else (but a bigger change) would be if there was some way to express "don't use a proxy here" since as of now there is no way to do that (instead the current process consists of not setting a proxy, disabling the use of system properties, and now disabling the use of environment variables). This may not be worth it, unless you think there will be other things in the future that may override the user's intent to not use a proxy.

Thinking out loud: It might have been nice if the "use environment variables" feature could have defaulted to off for a period of time (6 months) before switching to "on". Then we could have a better transition window where we can provide semi-backward-compatible tools to our users. Given the current state of things, this may not be worth it either.

joviegas commented 10 months ago

Thanks @abatkin for the recommendation. Apologies for NO_PROXY bug , somehow this went completely out of my radar. Added the fix for NO_PROXY with https://github.com/aws/aws-sdk-java-v2/pull/4735.