Conditions in input section are ignored

filex commented 8 years ago

I was trying to work with environment variables in my logstash config. Basically, I would like to wrap my redis input in ifs that test environment variables. Reading #5115, I understand that it's not possible at the moment.

However, there is a strange behaviour in the input section: conditions seem to be ignored altogether.

The following config/command line illustrates the problem:

input {
  if "a" == "b" {
    stdin{}
  }
}

output {
  stdout{}
}

$ echo 'foo' | docker run -i --rm logstash:2.4 -e 'input{ if "a" == "b" {stdin{}}} output{stdout{}}'
{:timestamp=>"2016-10-18T09:24:40.559000+0000", :message=>"Pipeline main started"}
2016-10-18T09:24:40.544Z 4bb236b25ce5 foo
{:timestamp=>"2016-10-18T09:24:40.681000+0000", :message=>"Pipeline main has been shutdown"}

As the condition "a" == "b" around stdin{} is always false, I would expect the stdin input not to be started at all. However, the event foo is accepted and processed.

On the other hand, the same condition works on output directives (and filters, too):

input {
  stdin{}
}

output {
  if "a" == "b" {
    stdout{}
  }
}

$ echo 'foo' | docker run -i --rm logstash:2.4 -e 'input{stdin{}} output{if "a" == "b"{stdout{}}}'
{:timestamp=>"2016-10-18T09:28:57.491000+0000", :message=>"Pipeline main started"}
{:timestamp=>"2016-10-18T09:28:57.622000+0000", :message=>"Pipeline main has been shutdown"}

Here, as expected, no output is made.

In general, I would expect conditions to work the same way throughout the config. If some operands are not supported in the input stage (e.g. because no event-data is present yet), the conditions should either evaluate according to nil comparisons, or a syntax error should be thrown for conditions in the input section.

guyboertje commented 8 years ago

@filex - I agree that a syntax error should probably be raised if conditionals are seen in input sections.

You understand correctly, we can't use the conditionals in an input as is because they operate on an event.

filex commented 8 years ago

I must admit that the documentation has an "important" note saying that conditionals do not work with inputs. Obviously not important enough for me to read it :)

On the other hand, I don't think that the absence of events in the input stage is a reason to dismiss these powerful configuration features (field references, sprintf and conditionals). Metadata like environment variables, the current (or startup) time and even static text comparisons (like the one in my example) could work even before an input has generated an event. There are scenarios where each of these possibilities can be useful.

In the env var docs there is an example to configure the tcp input like this:

  tcp {
    port => "${TCP_PORT}"
  }

(I couldn't get this to work). But it seems quite useful to me.

I also like the idea (from #5115) to use env var conditions to disable and configure certain input and output plugins. This would be very useful to containerize a logstash setup for testing and production deployment.

berglh commented 7 years ago

@guyboertje @filex Not sure if I should add a new issue for this, but I think it's on topic.

With the ability of logstash to read in environment variables, I would have thought it would be possible to do environment based conditional inputs:

LS_DEV_MODE=false ./logstash --debug -e '
input { 
  if "${LS_DEV_MODE}" == "false" { 
    file { path => "/tmp/file1" sincedb_path => "/dev/null" start_position => "beginning" } 
  } 
  if "${LS_DEV_MODE}" == "true" { 
    file { path => "/tmp/file2" sincedb_path => "/dev/null" start_position => "beginning" } 
  }
}
output { 
  stdout { codec => "rubydebug" } 
}'

In the current version of logstash (5.0.2), the conditionals are ignored and both inputs are used and appear in the output. One problem is that there is currently no support for environment variables in conditionals.

Imagine I have a configuration stack with inputs for developing configuration (file|tcp) and another for production (redis). By simply specifying an environment variable and comparing it in a conditional, I can have development and production inputs in the one filter stack without having to modify anything other than the environment passed to logstash.

This is primarily an issue because I have a lot of people contributing to our logstash configuration, and every time a new person comes along I need to provide a list of instructions on how to prepare a development environment. I could swap files in and out with a script to change the inputs to launch the dev environment, but the elements for this feature are there and I think this is a great case for conditional inputs.

I do understand the confusion that might occur if you introduced this feature: if you try to compare fields in a conditional input, the document doesn't exist in the pipeline yet, because it can't come before an input. However, some appropriate syntax checking and error messages would explain why trying to do a field value comparison here is invalid.

guyboertje commented 7 years ago

@filex @berglh We do understand the advantages of supporting environment variables in predicate clauses.

Firstly, currently, deep in the code that builds an executable pipeline from the config text, the left hand side of the predicate is an implicit call on an event. e.g.:

if "${LS_DEV_MODE}" == "false" { 
  file { path => "/tmp/file1" sincedb_path => "/dev/null" start_position => "beginning" } 
}

is sort of interpreted as the following Ruby:

lhs = some_event_value_getter(current_event, "${LS_DEV_MODE}")
rhs = some_value_value_getter("false")
if lhs == rhs
  compiled_action
end

To support this feature we would need to rewrite significant parts of the compiler. That said, we are actually rewriting the compiler and execution model now and it's targeted for 6.0. Sometimes software designs (like cities) cannot sustain continual expansion without consequences.
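As a rough, runnable illustration of the point above (not the actual Logstash compiler): the sketch below treats a compiled conditional as a function of an event, so with no event there is nothing to evaluate. The `Event` class, `compile_equals` and `run_predicate` are hypothetical names made up for this example.

```ruby
# Hypothetical stand-in for a Logstash event: a thin wrapper around a Hash.
class Event
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end
end

# Assumed simplification: a compiled conditional is a function of an event.
# The left-hand side is resolved against the event; the right-hand side is a literal.
def compile_equals(field_name, literal)
  ->(event) { event.get(field_name) == literal }
end

def run_predicate(predicate, event)
  # Without an event there is nothing to resolve the left-hand side against.
  raise ArgumentError, "no event available (input stage)" if event.nil?

  predicate.call(event)
end

is_syslog = compile_equals("type", "syslog")

# Filter/output stage: an event exists, so the predicate can run.
puts run_predicate(is_syslog, Event.new({ "type" => "syslog" }))

# Input stage: no event has been produced yet.
begin
  run_predicate(is_syslog, nil)
rescue ArgumentError => e
  puts "cannot evaluate: #{e.message}"
end
```

In this toy model, a comparison against an environment variable would need a second kind of predicate that takes no event at all, which is roughly the compiler change described above.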

Secondly, we have resisted some of these changes in the past because we did not have an error destination for when a directive in the config cannot be met by the event being processed - we call this error destination the Dead Letter Queue or DLQ. Why is this important? One example is the sprintf feature when used in output destination interpolation. If one specifies that the doc_id of an event in ES should be interpolated from a value in an event, and the value is missing, we have had to use the actual string - this means that all events having no interpolated value will have the same doc_id and overwrite each other. If we had a DLQ, the correct behaviour would be to dump those events to it and allow them to be fixed before retrying. We would want to do the same if an ENV value is missing or a % is used in place of a $ or vice-versa. However, if such an error were to occur in the proposed feature (conditionals in input sections) then we would not have an event to put in the DLQ - we would have to log an error and halt LS.
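The doc_id collision described above can be reproduced with a small, self-contained sketch. This is a simplified imitation of %{field} interpolation, not Logstash's own sprintf; `interpolate`, `index_all`, the `order_id` field and the Hash standing in for ES are all assumptions made for illustration.

```ruby
# Simplified imitation of %{field} interpolation (an assumption, not the real
# implementation): a reference to a missing field is left as literal text.
def interpolate(template, fields)
  template.gsub(/%\{([^}]+)\}/) do |match|
    key = Regexp.last_match(1)
    fields.key?(key) ? fields[key].to_s : match
  end
end

# Hypothetical document store keyed by doc_id; a later write with the same id
# overwrites the earlier one.
def index_all(events, id_template)
  store = {}
  events.each { |fields| store[interpolate(id_template, fields)] = fields["message"] }
  store
end

events = [
  { "order_id" => "1001", "message" => "created" },
  { "message" => "no id here" },
  { "message" => "no id either" }
]

index_all(events, "%{order_id}").each do |doc_id, message|
  puts "#{doc_id} => #{message}"
end
# prints:
#   1001 => created
#   %{order_id} => no id either
```

Three events go in, but only two documents survive: both events without an order_id end up under the literal id "%{order_id}", and the second silently replaces the first.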

Thirdly, I suspect a very important reason why people want conditionals in input sections, and conditionals in general, is to direct flow through a sub-pipeline, as in: if the event is like A then use route X, else use route Y. We are building support for multiple pipelines in the core first and then in the config later. However, this request/feature raises an interesting point about routing or sub-pipeline selection being based on ENV and event properties. /cc @jsvd

Lastly, you can get ENV-based config evaluation outside of LS by splitting your config up into files per section, with symlinks for common pieces, across three folders (e.g. dev, prod and common) and then using bash interpolation to specify whether the dev or prod config should be used.

berglh commented 7 years ago

@guyboertje Thank you for this thorough explanation on the current design limitation of the logstash execution model with respect to conditionals being tied to events.

I am currently using bash, via the docker container entrypoint script, to link up the required input config based upon the environment the logstash instance is being executed in. Considering that other configuration elements, such as TLS certs, are being built into these containers, it really isn't a great trouble, just another thing to consider.

For plugins called in the pipeline after the inputs, assigning the environment variables to metadata fields allows the conditional filter and output behaviour I am seeking.

Having ENV based conditionals would be a nicety and not a necessity, but it would remove the need for some mutate add_fields. Having conditional inputs based on ENV would then remove the need to adjust the config prior to launch and lead to a marginally more elegant configuration management solution.

On the other hand, having a block of added metadata fields with ENV values directly after the inputs is a nice way to get an overview of what ENVs are being used in the pipeline. So swings and roundabouts with both approaches.

Cheers

jsvd commented 5 years ago

Another interesting side effect of conditionals not having an effect in the input section is that a configuration like:

input {
  if "${TEST_VAR}" == "false" {
    generator { message => "it's false" count => 5 }
  } else {
    generator { message => "it's true" count => 5}
  }
}
output { stdout { } }

will produce both the true and the false messages.

elastic / logstash

Conditions in input section are ignored #6080