elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

Add support for environment variable injection in logstash plugin configuration #3944

Closed fbaligand closed 8 years ago

fbaligand commented 9 years ago

It would be great to support environment variable injection in logstash configuration, like this:

tcp {
  port => "${TCP_PORT}"
}

It would be very useful to have a logstash configuration that is independent of its environment, so that the same configuration can be used across different environments (dev, test, prod, ...).

Numerous frameworks, such as spring and log4j, support such a feature.

jordansissel commented 9 years ago

As a workaround, you can use a template tool (m4, sed, etc) to achieve this. Run it on your config before starting logstash. This is what I've done in the past with good success. Hope this helps :)
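
The pre-processing workaround described above could be sketched roughly as follows, here in Python rather than m4 or sed. The template text, the `TCP_PORT` variable and the `render_config` helper are illustrative assumptions, not part of logstash.

```python
import os
from string import Template

# Hypothetical config template; ${TCP_PORT} is filled in before logstash starts.
CONFIG_TEMPLATE = """input {
  tcp {
    port => ${TCP_PORT}
  }
}
"""


def render_config(template_text, env=None):
    """Replace ${NAME} placeholders with values from env (default: os.environ).

    Raises KeyError when a placeholder has no matching variable, so a missing
    setting is noticed before logstash is started with a broken config.
    """
    values = os.environ if env is None else env
    return Template(template_text).substitute(values)


if __name__ == "__main__":
    # In a start script, the result would be written to the file that
    # bin/logstash is pointed at.
    print(render_config(CONFIG_TEMPLATE, {"TCP_PORT": "5000"}))
```

Note that `string.Template` treats every `$` as special, so a config containing literal dollar signs would need them escaped as `$$` in the template.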


fbaligand commented 9 years ago

Thanks for your reply @jordansissel. I know this workaround because I have read your answer in some forums or issues ;) But it would be really great to support it dynamically in logstash! It avoids having to generate an instance configuration from a template configuration each time we start logstash.

cstockton commented 9 years ago

Given the robustness of logstash configuration files, I am a bit surprised there is not a cleaner way to do this. All of our logstash instances are containerized within Docker, and getting our various containers to talk to each other is done via Docker's container linking, which provides environment variables so that containers can find each other.

That said, even if something like M4 was suitable for configuration preprocessing in a grammar as expressive as logstash's (it isn't) .. it would be difficult (messy at best) to hook the preprocessor into the various container deployment and orchestration systems. Your containers will be expected to be built ahead of time and be environment agnostic. Typically getting all the context they need at runtime from the environment variables.

@fbaligand The solution I have today uses the environment filter, but it is not ideal because it is including all of the environment vars in the message. Maybe you found a better way?

filter {
  ...
  environment {
    add_field_from_env => {
      "MY_ENV_ADDR" => "MY_ENV_PROD_PORT_12285_TCP_ADDR"
      "MY_ENV_PORT" => "MY_ENV_PROD_PORT_12285_TCP_PORT"
    }
  }
}
...
output {
  http {
    codec => "json"
    http_method => "post"
    url => "http://%{MY_ENV_ADDR}:%{MY_ENV_PORT}/..."
  }
}

Is there a way to specify local fields i.e.: not to be included in final output, but used for transit through the pipeline? Or maybe there is a syntax for defining fields to namespace them? There might be a way to do what I am trying to do in the logstash world, just haven't found it.

I did notice in grok there is a more robust declaration grammar for property access, i.e. %{SYNTAX:SEMANTIC:CAST}. I could see something like %{[NAMESPACE :]FIELD_NAME} being pretty useful, with backwards compatibility maintained. It could provide a few default namespaces like ENV, LOCAL, and anything else useful.

Example below:

filter {
  ...
  mutate {
    add_field => {
      "LOCAL:SCHEMA" => "https" # Not included in output
      "LOCAL:VAR_NAME" => "LOCAL_VALUE" # Not included in output
      "GLOBAL_ENV_ADDR" => "%{ENV:ADD_TO_ALL}" # Included in output
      "GLOBAL_NAME" => "GLOBAL_VALUE" # Included in output
    }
  }
}
...
output {
  if [LOCAL:VAR_NAME] ... {

    http {
      codec => "json"
      http_method => "post"
      url => "%{LOCAL:SCHEMA}://%{ENV:MY_ENV_PROD_PORT_12285_TCP_ADDR}:%{ENV:MY_ENV_PROD_PORT_12285_TCP_PORT}/..."
    }
  }
}

Just food for thought. Thanks.

-Chris

untergeek commented 9 years ago

@cstockton this is an argument in favor of having the environment filter put all of those variables into the @metadata field by default. Then they wouldn't show up in the output.

untergeek commented 9 years ago

Here: https://github.com/logstash-plugins/logstash-filter-environment/issues/4

jordansissel commented 9 years ago

@cstockton and @fbaligand - Could you review https://github.com/logstash-plugins/logstash-filter-environment/issues/5 and let us know what you think?

untergeek commented 9 years ago

@cstockton and @fbaligand - I think @jordansissel meant https://github.com/logstash-plugins/logstash-filter-environment/pull/5 (the pull request)

jordansissel commented 9 years ago

lol, failure on my part. Good catch, @untergeek !

fbaligand commented 9 years ago

The environment plugin is useful for some cases, but not for all. In the sample in my issue, I inject an env variable into an input plugin. This can't be done using the environment plugin. Nor can the environment plugin be used to assign int config properties or static config properties (those which do not process %{...}).

That's why a native env variable injection pre-processing in logstash would be very welcome !

fbaligand commented 9 years ago

Regarding the environment plugin enhancement (using @metadata), this sounds like an excellent idea to me! In most cases, the environment variables provided by this plugin are used to configure output config properties, and not to be part of the event itself.

cstockton commented 9 years ago

thanks a lot @jordansissel this certainly fixes my use case, I responded with a little feedback in addition at https://github.com/logstash-plugins/logstash-filter-environment/pull/5#issuecomment-142334792

fbaligand commented 9 years ago

Thanks for the improvement in environment plugin @untergeek !

Regarding this issue now, it's still relevant to address all cases that are not covered by environment plugin (input plugin properties, int plugin properties, and all plugin properties that do not support dynamic field injection).

cstockton commented 9 years ago

@fbaligand That is a good point, I could see input plugins needing access to environment variables for setting up listeners. I think it would be pretty clean to denote @ 'identifier' as a special kind of proxy field (seems like what metadata sort of is) and to add @env, i.e.:

input {
  tcp {
    port => [@env][DOCKER_PROVIDED_PORT_12285_TCP_PORT]
  }
}

fbaligand commented 9 years ago

That's an interesting option! @jordansissel would it be simpler to implement [@env][MYVAR] or "${MYVAR}"?


jordansissel commented 9 years ago

the [@env] syntax feels weird because it uses what is called field reference syntax and events are the only thing in logstash that have fields, and events don't really have "environment variables", further, there's no event available for inputs, so having field reference syntax there would be really confusing ;P

I'm still not really in favor of this yet since m4/puppet/etc seems so simple to me. I don't want to dump this as something we force ops folks to solve, but I'm also not sure about the added burdens of additional syntax in the config file.

Carry this knowing that we are working on clustering and other concepts for logstash that will outlive the lifetime of a single logstash-process, so in the clustered world, configuring via environment variables feels quite weird. I think it's weird because I really want configuration with one interface, not multiple, and with clustering, the configuration comes from some central authority, and using environment variables complicates that - who evaluates the env vars? Each node? Just the central authority? If each node, now you have two sources of configuration instead of just one.

Thoughts?

fbaligand commented 9 years ago

@jordansissel In a cluster world, the concept of ${MYVAR} can be extended to support various sources : environment variables, but also cluster variables which are defined in your management console (for example).

To take example from spring or log4j 2, both of them support multiple sources when resolving ${MYVAR} : env variables, java system properties, java JNDI variables, ...

cstockton commented 9 years ago

I see where you are going with that @jordansissel .. I am not sure what kind of architecture would stay up to speed as logstash grows. The first thing to pop into my head would be some sort of "variable/field" providers. Then you could create etcd, environment, or even a swagger api provider. Given the fact that variable providers would be a form of input, maybe it could be done via inputs.. I don't know anything about the plugin systems API but maybe it would be robust enough already to do this through plugins today.

Below is an example of something that would be pretty nice for me with the disclaimer that I've only been working with logstash for a few weeks.. so it may not be idiomatic or align at all with logstash's long term goals/vision so sorry in advance if it's some sort of butchery :p

I.E.

input {
  env { }
  etcd { prefix => "_ETCD", url => "http://%{ETCD_ADDR}:%{ETCD_PORT}/v2/keys", ttl => 60 } # ETCD_ADDR is from environment, no prefix
  syslog { port => "%{_ETCD_SYSLOG_LISTEN_PORT}" }
  tcp { port => "%{_ETCD_FOO_LISTEN_PORT}" }
}

The main thing is that "properties / variables" throughout logstash's configuration seem to always be bound to the event. However, users like me (incorrectly, perhaps?) are trying to get non-event-bound static attributes resolved at compile time, while foreseeing cases where attributes resolved at call time feel inevitable. There are of course other ways to achieve both of those things. It all depends on whether you think it is dirty and wrong, or glorious, for someone to do this:

output {
  if [@metadata][product] in [_ETCD_ENABLED_PRODUCTS] {
    elasticsearch { ... }
  }
}

untergeek commented 9 years ago

I like the idea of having an env { } block within inputs (rather than as its own input):

input {
  tcp {
    env => true
    port => @env["_ETCD_FOO_LISTEN_PORT"]
  }
}

Not sure how easy it would be to add this, but to me it's either this, or make the @env ivar available across all of Logstash (I know, ivars and having to extend each plugin to support a given ivar...).

fbaligand commented 9 years ago

In my opinion, it is much simpler for logstash core itself to pre-process ${MYVAR} tokens just before injecting the resulting value into the plugin property (using the example configuration format I put in the issue). And it is definitely not the responsibility of the plugin to interpret environment variable references.

fbaligand commented 8 years ago

@suyograo @jordansissel @acchen97 @cstockton @untergeek It would be great to support the same mechanism as Beats: https://github.com/elastic/beats/pull/715

I very much like its ability to have a default value and to support array values.
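
To make the default-value idea concrete, here is a minimal sketch of how a `${NAME:default}` reference could be expanded. The exact syntax, the regular expression and the `expand` helper are assumptions made for illustration; they are not taken from the Beats or logstash implementation.

```python
import os
import re

# Matches ${NAME} and ${NAME:default}; this syntax is assumed for the sketch.
_REFERENCE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def expand(value, env=None):
    """Expand ${NAME} / ${NAME:default} references in one setting value."""
    values = os.environ if env is None else env

    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in values:
            return values[name]
        if default is not None:
            # The variable is unset, but the reference carries a default.
            return default
        raise KeyError(f"no value and no default for {name}")

    return _REFERENCE.sub(replace, value)
```

With this sketch, `expand("${TCP_PORT:5000}", {})` falls back to the default, while a `TCP_PORT` present in the environment takes precedence over it.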

mikeholczer commented 8 years ago

Chiming in to support the idea of implementing this similar to beats.

fbaligand commented 8 years ago

@mikeholczer I totally agree with you. If you look at my pull request, it is exactly what I did.

jordansissel commented 8 years ago

It avoids having to generate an instance configuration from a template configuration each time we start logstash.

I understand, but it feels like this assumes your method for starting logstash is immutable, which isn't true. Your "start logstash" procedure can include generating the config before executing bin/logstash.

My general concern here is that I don't think environment variables are a good solution for this, especially when I consider the Logstash roadmap.

Environment variables can only be communicated once to a given process, at the start. They can never be altered outside the process. This immutability of runtime-configuration is counter to one existing feature as well as one future feature.

The existing feature is automatic configuration reloads - with reloading, you cannot ever change the environment variable even though the configuration files themselves can be changed. If you try to use environment variables for configuration settings, you will be required to do a full restart of Logstash in order to change these values.

The future feature is centralized configuration (with the ability to change the configuration after startup). For the same reason as with the config reloading that we have today, I feel environment variable immutability will make this not valuable, or at the very least, not something most users would use (I won't speak for all users, though).

Logstash is intended to be a long-running process, and with our progress towards making Logstash more configurable while running, introducing partial immutability (environment variables) feels like a step backwards.

ip2k commented 8 years ago

+1; in my use-case I'm baking an AMI separately from where the instance will actually run, so the env I need to inject ( EC2_INSTANCE_ID ) would be different in my bake phase (which generates 1 AMI) vs where I'm running Logstash (multiple instances). Because of this, I currently use a script that runs once when the instance first boots. Being able to reference env vars from within the LS config itself would be massively helpful in this case :)

fbaligand commented 8 years ago

Hi @jordansissel,

First of all, environment variables are a really common and standard way to parameterize servers, tools, ... Numerous people use them and need them. You can see that in this issue.

Secondly, numerous frameworks process environment variable injection; this is a common feature. I can cite log4j 2, logback and spring. These 3 frameworks also support hot configuration reload, and they don't consider these features to be in conflict.

Thirdly, elasticsearch itself has had this feature for a long time (https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html#node-name). And Beats will have this feature in its release 2.0.0 (https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html#node-name). I don't understand why it would not be relevant for logstash.

Fourth, as a first step, we could have only environment variable injection. But when logstash central configuration becomes available, we can add "logstash central variable injection", and why not "java system property injection", all using the same mechanism. For example, spring supports such a mechanism and I can guarantee you that it is very helpful! Concretely, when a user references a variable, logstash-core searches first in the "logstash central configuration", then, if not found, in environment variables, and, if still not found, replaces the reference with an empty string. And it is really easy to add this in mixin.rb.

Finally, in my opinion, for all these reasons, this is really a key feature and absolutely not a step backwards.
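
The lookup order described in the fourth point (central configuration first, then environment variables, then an empty string) could be sketched as below. This is Python for illustration only, not the Ruby of mixin.rb; `resolve`, `inject` and the `central_settings` mapping are hypothetical names.

```python
import os
import re

# Plain ${NAME} references only; the pattern is an assumption of this sketch.
_REFERENCE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(name, sources):
    """Return the value from the first source defining name, else ''."""
    for source in sources:
        if name in source:
            return source[name]
    return ""


def inject(value, sources):
    """Replace every ${NAME} in value using the ordered list of sources."""
    return _REFERENCE.sub(lambda match: resolve(match.group(1), sources), value)


if __name__ == "__main__":
    # Hypothetical central settings take priority over the process environment.
    central_settings = {"ES_HOST": "es.internal"}
    print(inject("http://${ES_HOST}:${ES_PORT}/", [central_settings, os.environ]))
```

Adding another source, such as system properties, would then only mean adding one more mapping to the list.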

samcday commented 8 years ago

I stumbled on this issue because I need to do exactly what @ip2k mentioned - I want to use the EC2 instance ID as a parameter for a Logstash plugin. It would be awesome if the config file supported environment var interpolations.

jordansissel commented 8 years ago

#4710 is merged and supports this feature.

nvtkaszpir commented 8 years ago

works with jdbc input plugin like a charm :D

fbaligand commented 8 years ago

Nice :)