elastic / elastic-package

elastic-package - Command line tool for developing Elastic Integrations

elastic-package test pipeline fails when reroute processors are defined in both pipeline and routing_rules.yml #1387

Open kaiyan-sheng opened 1 year ago

kaiyan-sheng commented 1 year ago

When routing_rules.yml looks like this:

- source_dataset: awsfirehose.log
  rules:
    - target_dataset: aws.cloudtrail
      if: ctx['aws.cloudwatch.log_stream'].contains('CloudTrail')
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

and ingest_pipeline/default.yml looks like this:

---
description: Pipeline for rerouting logs streams from Amazon Kinesis Data Firehose.
processors:
  - set:
      field: ecs.version
      value: 8.0.0
  - set:
      field: cloud.provider
      value: aws
  - reroute:
      if: ctx['aws.cloudwatch.log_stream'].contains('CloudTrail')
      dataset: aws.cloudtrail
      namespace: default
on_failure:
  - set:
      field: error.message
      value: "{{ _ingest.on_failure_message }}"

I get this error when I run elastic-package test pipeline -d log --generate:

kaiyansheng ~/go/src/github.com/elastic/integrations/packages/awsfirehose [firehose_integration_package] $ elastic-package test pipeline -d log --generate
2023/08/09 16:25:45  WARN CommitHash is undefined, in both /Users/kaiyansheng/.elastic-package/version and the compiled binary, config may be out of date.
Run pipeline tests for the package
Error: error running package pipeline tests: could not complete test run: unmarshalling ingest pipeline content failed: yaml: unmarshal errors:
  line 14: cannot unmarshal !!str `aws.clo...` into []string
  line 16: cannot unmarshal !!str `default` into []string
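
The two `cannot unmarshal !!str ... into []string` messages suggest that the decoder expects `dataset` and `namespace` on a reroute processor to be lists, while the pipeline above writes them as single strings. Elasticsearch itself accepts either form for these two options. A minimal sketch of the kind of normalization that would tolerate both shapes follows; `as_string_list` and the `reroute` dict are hypothetical names used only for illustration, not part of elastic-package.

```python
def as_string_list(value):
    """Return value as a list of strings, wrapping a lone string in a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(item) for item in value]


# Hypothetical dict mirroring the reroute processor options from default.yml above.
reroute = {"dataset": "aws.cloudtrail", "namespace": "default"}

# Both options end up as lists, the shape the error message asks for.
normalized = {key: as_string_list(val) for key, val in reroute.items()}
```

Under the same assumption, writing `dataset` and `namespace` as YAML lists in the pipeline might also sidestep the error, though that has not been verified here.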

jsoriano commented 1 year ago

@kaiyan-sheng is this use case expected? Would it be possible to use only routing_rules.yml in this case?

kaiyan-sheng commented 1 year ago

@jsoriano Good point!! I don't think we should allow this use case. Two options are:

  1. only allow routing_rules.yml and report an error when a reroute processor shows up in the pipeline
  2. allow either routing_rules.yml alone (with no reroute processor) or reroute processors alone (without routing_rules.yml)

Since we want routing_rules.yml to be the place to define rerouting, should we choose option 1 here?

jsoriano commented 1 year ago

I think we want to use routing_rules.yml, yes, so the routing rules are always installed in the @custom pipeline and users can customize them.
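
Option 1 from the discussion above could be sketched roughly as the check below. It assumes the pipeline has already been parsed into a list of processor dicts; `find_reroute_processors`, `check_pipeline`, and the error wording are hypothetical and do not reflect how elastic-package actually implements validation. The sketch only inspects top-level processors, not ones nested under `on_failure` or inside other processors.

```python
def find_reroute_processors(processors):
    """Return the indices of top-level processors whose type is 'reroute'."""
    return [
        index
        for index, processor in enumerate(processors)
        if isinstance(processor, dict) and "reroute" in processor
    ]


def check_pipeline(processors, pipeline_name="default.yml"):
    """Return one error message per reroute processor found in the pipeline."""
    return [
        f"{pipeline_name}: processor #{index} is a reroute processor; "
        "define it in routing_rules.yml instead"
        for index in find_reroute_processors(processors)
    ]


# Processors from the default.yml shown in the issue, as parsed data.
processors = [
    {"set": {"field": "ecs.version", "value": "8.0.0"}},
    {"set": {"field": "cloud.provider", "value": "aws"}},
    {
        "reroute": {
            "if": "ctx['aws.cloudwatch.log_stream'].contains('CloudTrail')",
            "dataset": "aws.cloudtrail",
            "namespace": "default",
        }
    },
]

errors = check_pipeline(processors)
for message in errors:
    print(message)
```

Indices are zero-based, so the reroute processor in this pipeline is reported as processor #2.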