aws / aws-dotnet-extensions-configuration

This repository hosts various libraries that help developers configure .NET applications using AWS services.
https://aws.amazon.com/developer/language/net/
Apache License 2.0
178 stars 56 forks source link

DuplicateParameterException can break a production application #184

Closed tysonstewart closed 2 months ago

tysonstewart commented 2 months ago

Describe the bug

The extension throws a DuplicateParameterException when a "duplicate" parameter is detected because the keys are case-insensitive in code but case-sensitive in Parameter Store. This means that an application that uses this extension on startup as describe in the documentation can be prevented from starting simply by adding a key to Parameter Store--no code changes.

Expected Behavior

The extension should gracefully handle a duplicate key, probably by letting the last value win since .NET configuration works that way anyway.

Current Behavior

The extension throws an exception. Handling the exception isn't really an option since the rest of the configuration won't be available.

Reproduction Steps

  1. Add a value to Parameter Store, e.g. /myApp/myKey
  2. Add the prefix to the configuration using the application extension method:
    builder.Configuration.AddSystemsManager("/myApp");
  3. Optionally start the app to show that it works
  4. Add a new value to Parameter Store having the same key as step 1 but with different casing, e.g. /myApp/mykey
  5. Restart the app and note that it throws a DuplicateParameterException on startup

Possible Solution

I don't consider the addition of a configuration parameter to be an action that could cause an application to fail to start. While the conflict in keys is something I would definitely want to know about, I don't want the application to not start at least in production. My suggested behavior is allowing the last value to win, though perhaps the exact behavior could be configurable itself.

Additional Information/Context

No response

AWS .NET SDK and/or Package version used

Amazon.Extensions.Configuration.SystemsManager 6.2.0

Targeted .NET Platform

.NET Core 7.0

Operating System and version

N/A

ashishdhingra commented 2 months ago

@tysonstewart It's the expected behavior since .NET Framework would not allows duplicate keys (it is case insensitive). We should not be picking up last value since it could lead to unpredictable results (please refer https://github.com/aws/aws-dotnet-extensions-configuration/issues/146#issuecomment-1540525422 for more details).

CC @normj for any additional inputs.

tysonstewart commented 2 months ago

I understand that argument, but there's an important difference between loading configuration from a JSON file and loading configuration from Parameter Store: the file is almost certainly static and deployed with the application while Parameter Store is a live system subject to change during the application's lifetime. Furthermore, because https://github.com/aws/aws-dotnet-extensions-configuration/pull/145 was closed, as developers, we don't have any way to mitigate this scenario. I can catch the exception but none of the configuration from Parameter Store is available now.

Perhaps I'm not using a recommended pattern, but my implementation uses the environment as part of the path, e.g. /MyApp/Staging/Key and /MyApp/Production/Key which means that it's possible to not catch a misconfiguration in a lower environment. I don't believe "be careful with what keys you add to Parameter Store" is a good answer here. I need a way to make it safe to make a mistake.

normj commented 2 months ago

@tysonstewart If you add the configuration as separate calls for staging and production are you seeing the duplicate exception? Something like this:

builder.Configuration.AddSystemsManager("/myApp/Staging");
builder.Configuration.AddSystemsManager("/myApp/Production");

Within a individual configuration source I feel pretty strongly that we should throw the duplicate exception because the order of what is last is not deterministic and applications could get surprising behavior that changes over time. As separate configurations sources I agree we should not throw a duplicate exception because in that case the order is deterministic based on the order of calls to add configuration sources. I think .NET itself handles later configuration sources overriding the previous sources so I'm not sure there is anything the library has to do to make that happen.

tysonstewart commented 2 months ago

We don't load it like that, but rather like this:

builder.Configuration.AddSystemsManager($"/myAppGlobal/{builder.Environment.EnvironmentName/");
builder.Configuration.AddSystemsManager($"/myApp/{builder.Environment.EnvironmentName/");

So only an environment's configuration is loaded. The trouble we ran into is that a dev was testing something in Staging and added duplicate parameters in /myApp/Stage, both brand new, without noticing that there was a single character with different casing. And even though the values weren't being used, when the stage application next started, it threw an exception and failed to start. This was a mystery to the responders because no code had changed.

Fortunately, this happened to us in the stage environment but it's terrifying that making a simple mistake like this could hit us in production and it wouldn't show up until a deployment and for a reason completely unrelated to changes being deployed.

normj commented 2 months ago

@tysonstewart In your use case if we didn't have the duplicate exception and when the dev inadvertently added the duplicate there would be no control which duplicate would get picked. So instead of a crashing behavior that brought the issue to your attention right away you could have a runtime bug that was possibly using the wrong database connection string and corrupting data or sending production data to staging. There are times when picking the last would win but the there would be terrible runtime times when the wrong duplicate was picked.

tysonstewart commented 2 months ago

Right, and I understand that, but again, in this specific case it wasn't even being used--no code was consuming the values that were created. Nonetheless it prevented the application from starting.

I'm not arguing why you added this safeguard. I'm saying that in my case, there is no danger of corrupting data or sending it to the wrong place. Maybe an API key is incorrect and that particular service is down or uses the wrong account, but that's preferable to the entire application being offline while Ops scrambles to figure out what just happened.

normj commented 2 months ago

In your case the duplicate config value wasn't used so it wouldn't be problem but this library has no idea what values are going to be used or not. We could add an opt-in flag to ignore duplicate and use one of the values at random but who would use that flag?

I'm assuming some of the config values being pulled are being used and it would have been bad if the wrong value was used. So if the dev caused a duplicate for a critical config setting and you turn that flag on your system could get corrupted. Would you really leave yourself open for that type of corruption by turning this opt-in flag on?

tysonstewart commented 2 months ago

Respectfully, you're arguing the position that the chance of a developer or operator overriding an existing value with something that could cause corruption is higher or worse than a dev adding two unused values that causes production downtime. For us, the former is extremely unlikely and the latter is our livelihood. So to answer your question, directly: yes, I would.

tysonstewart commented 2 months ago

I would also add that the scenario you're concerned with is still possible even with the duplicate key exception. In the example I added above, I'm adding two configurations with the idea that the second overwrites the first one so that a global value can be overwritten by an application-specific value. Someone can add a bad value to the app-specific and wind up in the same scenario. Or really, if we boil it down, they could just put a bad value in the original configuration. Such is the risk of configuration.

tysonstewart commented 2 months ago

For posterity for anyone in the same situation as me in the future, here's a custom parameter processor that does the same thing as the default one but takes an action rather than throwing:

public class SafeParameterProcessor : DefaultParameterProcessor
{
    private readonly Action<string> _onDuplicateKeyError;

    public SafeParameterProcessor(Action<string> onDuplicateKeyError)
    {
        _onDuplicateKeyError = onDuplicateKeyError;
    }

    private IEnumerable<KeyValuePair<string, string>> ParseStringList(
        Parameter parameter,
        string path
    )
    {
        // Items in a StringList must be separated by a comma (,).
        // You can't use other punctuation or special characters to escape items in the list.
        // If you have a parameter value that requires a comma, then use the String type.
        // https://docs.aws.amazon.com/systems-manager/latest/userguide/param-create-cli.html#param-create-cli-stringlist
        return parameter
            .Value.Split(',')
            .Select(
                (value, idx) =>
                    new KeyValuePair<string, string>($"{GetKey(parameter, path)}:{idx}", value)
            );
    }

    public override IDictionary<string, string> ProcessParameters(
        IEnumerable<Parameter> parameters,
        string path
    )
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (var parameter in parameters.Where(parameter => IncludeParameter(parameter, path)))
        {
            if (parameter.Type == ParameterType.StringList)
            {
                var parameterList = ParseStringList(parameter, path);

                // Check for duplicate parameter keys.
                var stringListKeys = parameterList.Select(p => p.Key);
                var duplicateKeys = result
                    .Where(r => stringListKeys.Contains(r.Key, StringComparer.OrdinalIgnoreCase))
                    .Select(r => r.Key);
                if (duplicateKeys.Any())
                {
                    _onDuplicateKeyError(parameter.Name);
                }
                else
                {
                    result.AddRange(parameterList);
                }
            }
            else
            {
                var parameterKey = GetKey(parameter, path);

                // Check for duplicate parameter key.
                if (
                    result.Any(r =>
                        string.Equals(r.Key, parameterKey, StringComparison.OrdinalIgnoreCase)
                    )
                )
                {
                    _onDuplicateKeyError(parameter.Name);
                }
                else
                {
                    result.Add(
                        new KeyValuePair<string, string>(parameterKey, GetValue(parameter, path))
                    );
                }
            }
        }

        return result.ToDictionary(
            parameter => parameter.Key,
            parameter => parameter.Value,
            StringComparer.OrdinalIgnoreCase
        );
    }
}

This takes the first value it comes across and returns that in the configuration, but it would be easy enough to change that behavior. You could, for example, just remove the value altogether, but in any case you get control over what happens.

The major disadvantage, of course, is similar to forking the repo: you'll have to monitor and pull in upstream changes yourself.

normj commented 2 months ago

@tysonstewart In your example above there is an order given by you with how configuration values should be applied. First global and then app specific. That is deterministic and follows the pattern of having later configuration sources overwrite previous configuration sources.

Duplicates within the same configuration have no order to decide the precedence. Imagine you are running across 2 EC2 instances, there is no rule that the SSM API will return the data in the same order and one instance could pick a different value then another instance. Then you will have a terrible time understanding why one instances is doing the expected and another is corrupting data. Or maybe just having them using different values across the instances causes corruption. What if the parameter is an encryption key and the 2 different instances are encrypted production data with different encryption keys.

I understand in your case you would have gotten lucky and there was no serious issue but there are so many ways that could have gone terribly wrong by ignoring the duplicate and essentially choosing a random value for the duplicate.

In your code how would Action<string> onDuplicateKeyError choose which value to use? There isn't enough information to make an intelligent decision.

If the root of your request is really when you call AddSystemsManager that you want to give a list of the expected keys and all other keys should be ignored and not loaded in the configuration source I can be convinced of that. That way if somebody adds unrelated bad data to your prefix in parameter store it would be ignored.

tysonstewart commented 2 months ago

@normj I understand what you're saying. I fully acknowledge that two values with different casing are not deterministic in their order. Further, I acknowledge that for some use cases, this is a show-stopper.

However, in my case, it is not a critical error and the code as it stands takes that decision out of my hands. It determines for me that such a condition is worse than production downtime or forces me into a position where I have to accommodate it rather than the other way around.

Throwing an exception in this condition is certainly a sensible default but I don't think it's a reasonable mandate since everyone's circumstances are not identical.

normj commented 2 months ago

If you truly don't care you set the optional parameter when you call AddSystemsManager. I view this as no different if you added duplicate values to your appsettings.json file. It is an invalid JSON document and the application will throw an exception at startup unless the configuration source was added with the optional.

tysonstewart commented 2 months ago

Yeah but it's the same catching the exception myself at the call of .AddSystemsManager()--the app doesn't get any of the configuration at all in that case. And again, in the case that caused our discovery of this, there was no error in consumed configuration. There was only a mistake by a dev creating two totally unused values in the namespace.

I can see that you aren't interested in budging on this. I'm going to ship the code I posted above, log the error condition, and alert on it.

github-actions[bot] commented 2 months ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.