chaostoolkit / chaostoolkit-lib

The Chaos Toolkit core library
https://chaostoolkit.org/
Apache License 2.0
77 stars 46 forks source link

Fix dynamic config and env type config when using them in extra_vars #252

Closed roeiK-wix closed 2 years ago

roeiK-wix commented 2 years ago

Describe the bug I found that when I'm trying to use the dynamic config or configuration with type env, it doesn't work when using extra_vars files.

Runtime versions Please run: 1.272

$ python --version
Python 3.8.13
$ chaos info core
NAME                VERSION   
CLI                 1.12.0    
Core library        1.25.0    
$ chaos info extensions
NAME                                    VERSION   LICENSE                       DESCRIPTION                                       
chaostoolkit-addons                     0.2.1.dev4+ge555e73UNKNOWN                       Addons for your Chaos Toolkit experiments    

To Reproduce

$ chaos run --var-file=<some_path>

Expected behavior I want to use dynamic config in the extra vars files or to use type env configuration in the extra_vars.

Additional context So let's say I have the following example in the extra_vars file:

{
  "configuration": {
    "some_config_1": {
      "key": "CONF1",
      "type": "env"
    },
    "some_config_2": {
      "name": "some config 2",
      "type": "probe",
      "provider": {
        "type": "python",
        "module": "src.probes",
        "func": "some config 2",
        "arguments": {
          "arg": "${some_config_1}"
        }
      }
    }
  }
}

First of all, some_config_1 and some_config_2 are stays the same (raw data, before setting it as static config). Second, after fixing the first issue in my local chaostoolkit-lib repo in the load_configuration function, I found another issue in load_dynamic_configuration function that some_config_2 stays the same (as raw data).

I found a fix for those two issues: - First fix: (load_configuration)

     logger.debug("Loading configuration...")
    env = os.environ
    extra_vars = extra_vars or {}
    conf = {}

    for (key, value) in config_info.items():
# ----------------FIX----------------
        if extra_vars.get(key): 
            value = extra_vars[key]
            del extra_vars[key]
# ------------------------------------
        if isinstance(value, dict) and "type" in value:
            if value["type"] == "env":
                env_key = value["key"]
                env_default = value.get("default")
                if (
                    (env_key not in env)
                    and (env_default is None)
                    and (key not in extra_vars)
                ):
                    raise InvalidExperiment(
                        "Configuration makes reference to an environment key"
                        " that does not exist: {}".format(env_key)
                    )
                env_var_type = value.get("env_var_type")
                env_var_value = convert_to_type(
                    env_var_type, env.get(env_key, env_default)
                )
                conf[key] = extra_vars.get(key, env_var_value)
            else:
                conf[key] = extra_vars.get(key, value)

        else:
            conf[key] = extra_vars.get(key, value)

    return conf

- Second fix: (load_dynamic_configuration)

    conf = {}
    secrets = secrets or {}

    had_errors = False
    logger.debug("Loading dynamic configuration...")
    for (key, value) in config.items():
        if not (isinstance(value, dict) and value.get("type") == "probe"):
            conf[key] = config.get(key, value)
            continue

        # we have a dynamic config
        name = value.get("name")
        provider_type = value["provider"]["type"]
        value["provider"]["secrets"] = deepcopy(secrets)
        try:
            output = run_activity(value, config, secrets)
        except Exception:
            had_errors = True
            logger.debug(f"Failed to load configuration '{name}'", exc_info=True)
            continue
        if provider_type == "python":
            conf[key] = config[key] = output  # <---------------FIX-----------
        elif provider_type == "process":
            if output["status"] != 0:
                had_errors = True
                logger.debug(
                    f"Failed to load configuration dynamically "
                    f"from probe '{name}': {output['stderr']}"
                )
            else:
                conf[key] = output.get("stdout", "").strip()
        elif provider_type == "http":
            conf[key] = output.get("body")

    if had_errors:
        logger.warning(
            "Some of the dynamic configuration failed to be loaded."
            "Please review the log file for understanding what happened."
        )

    return conf
Lawouach commented 2 years ago

I am trying to understand this issue. But I'm not sure what is the probelm in the first case.

Can you show me an example of what you have and what you expect please?

Lawouach commented 2 years ago

I tghink we are really turning the config loading into a big complex system. I wonder if you shouldn't be having a process out of band that generates the right config before the experiment because that is becoming really hard to understand the code and therefore maintain it. It's almost getting circular.

I need to ponder this.

Lawouach commented 2 years ago

My understanding is that:

  1. you want the merge file to be an external config section when it's actually literal config values. It is not loaded as a config file but as-is. It's not a big because it wasn't never intended to be an import/include mechanism.
  2. The second problem is that you expect substitution to happen at loading time when it happens at running activity time. This is tricky because we could substitute early on (at loading times) but that's a change that is worrying in terms of impact. Substitution is not specified and I wonder if people are betting on the current implementation.

I think we probably could come up with a different approach, we may need a new command line subcommand chaos config generate? that could be much smarter and generate the right end result we expect.

Lawouach commented 2 years ago

What works as-is:

experiment.json contains this block:

{
     "configuration": {
          "some_config_1": {
      "key": "CONF1",
      "type": "env"
    },
     }
}

then your env_var file can contain:

{
"configuration": {
"some_config_2": {
      "name": "some config 2",
      "type": "probe",
      "provider": {
        "type": "python",
        "module": "src.probes",
        "func": "some config 2",
        "arguments": {
          "arg": "${some_config_1}"
        }
      }
    }
}

This will do what you want: dynamic loading on an extra var file.

Lawouach commented 2 years ago

To foster a larger discussion on configuration https://github.com/chaostoolkit/chaostoolkit/discussions/261

roeiK-wix commented 2 years ago

I am trying to understand this issue. But I'm not sure what is the probelm in the first case.

Can you show me an example of what you have and what you expect please?

For example: I have two extra_vars files - sandbox_configurations.json and prod_configurations.json (different configurations with the same keys) sandbox_configurations.json

{
  "configuration": {
    "port": "3000"
    "url": "http://localhost:3000/"
}

prod_configurations.json

{
  "configuration": {
    "port": {
      "key": "PORT",
      "type": "env"
    },
    "url": {
      "name": "Get the url accordion to the port",
      "type": "probe",
      "provider": {
        "type": "python",
        "module": "src.probes",
        "func": "get_url",
        "arguments": {
          "port": "${port}"
        }
      }
    }
  }
}

exp.json

{
  "configuration": {
    "port":"",
    "url":"",
    "some_other_dynamic_config": {
      "name": "Some other dynamic config",
      "type": "probe",
      "provider": {
        "type": "python",
        "module": "src.probes",
        "func": "some_probe",
        "arguments": {
          "port": "${port}",
          "url": "${url}"
        }
      }
    }
  }
}

So the expected result is: If for example, we run chaos run --var-file=sandbox_configurations.json we expect to have

    "port": "3000"
    "url": "http://localhost:3000/"

And if we run chaos run --var-file=prod_configurations.json we expect to have

    "port": "8080"
    "url": "http://some_production_url:8080/"

So the expected from the prod configuration is to be merged to the exp configuration and then execute the dynamic config and set it as static, same for the type: env config.

roeiK-wix commented 2 years ago

I tghink we are really turning the config loading into a big complex system. I wonder if you shouldn't be having a process out of band that generates the right config before the experiment because that is becoming really hard to understand the code and therefore maintain it. It's almost getting circular.

I need to ponder this.

I agree with you that this is not an elegant solution and perhaps we should think of a better solution, but the logic we want is not a bad thing for all of the users, because if you have a way of getting the configuration as an env variable, and you have a way of getting the configuration as a dynamic config (on run time) and you have a way of getting the configuration from some other files at run time, why won't you want to be able to get all of those configurations when you combine them together?

What we want is just to have the ability to merge the extra_vars file first and then check what type it is and then parsed it and set it as static config.

roeiK-wix commented 2 years ago

Let me simplify it: The first bug that I found is in load_configuration function, the problem is that type: env doesn't work if I put it in the extra_vars file.

The second bug that I found is in load_dynamic_configuration function, the problem is that whenever I have two dynamic_config in the extra_vars only the first one is been set to the conf[key], but in the second iteration (according to the code) we send to run_activity(value, config, secrets) the config and not conf and config is not been updated as well. Therefore the fix we need to do in this function is very small, just don't use conf, and just set directly the config.

I think the two fixes are not so wrong to do, but in any case, you can refactor the load_configuration if it does not look elegant to you.

Lawouach commented 2 years ago

The first bug that I found is in load_configuration function, the problem is that type: env doesn't work if I put it in the extra_vars file.

But this was never designed for this. I don't agree with the bug label. The extra var value is named "extra var" not "external configuration". It should contain values and should be generated out of band of the ctk by you.

The second issue follows as not being a bug because it's not designed to work this way.

I'm not saying it can't be improved but until we clarify that discussion around configuration, I will remain prudent with these small changes that make the entire thing more and more difficult to reason about.

Lawouach commented 2 years ago

We need to have a community discussion https://github.com/chaostoolkit/chaostoolkit/discussions/261

Lawouach commented 2 years ago

What I mean by increasing complexity. The code currently does this:

But, one scenario we can't cover, what if a secrets require the value only available after dynamic config? This is not possible unless we now also have load_dynamic_secrets. But we always end up with a potential use case that goes in circle.

Ideally we should not have the dynamic config as a different function but that should be fully part of the load_configuration. IN that case, we could consider what you want here, that extra vars also contain definitions of config blocks.

So basically, probably the best approach would be to have a single call that performs the loading of everything with a forward/backward approach so we can go to peek the value when needed.

roeiK-wix commented 2 years ago

Yep, I agree completely with the idea of creating a new function for all of them, just get all the configurations and handle all of those cases.

The problem is that we need a quick fix right now, and this may take some time to implement. We are thinking of forking chaostoolkit-lib until this proper fix is implemented.

Lawouach commented 2 years ago

I think it may be a good idea to fork for a short while. I will only have the time to work on this next week :(

Lawouach commented 2 years ago

The goal will be to keep backward compatibility but support more flexible scenarios like this one.

roeiK-wix commented 2 years ago

Sure, tnx for the effort :)

Lawouach commented 2 years ago

I think it shows how much it's needed but I don't want to rush it. I hope to get it right :p

Lawouach commented 2 years ago

Right I'm gonna try to work on this one again

Lawouach commented 2 years ago

Hey @roeiK-wix I have a PR with your first fix. But from the test I have created for it, I can't see the second fix as needed

https://github.com/chaostoolkit/chaostoolkit-lib/pull/254/files