elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations
Other
23 stars 435 forks source link

sei: multiple packages with duplicated field definitions #4398

Closed efd6 closed 1 year ago

efd6 commented 2 years ago

While preparing packages for the ECS updates I found that duplicated field definitions now caused indeterminate outcomes from elastic-package build. This prevented tests from passing while doing the updates and so I fixed the packages that were failing making use of new elastic-package behaviour for vetting v2 packages for duplicate field definitions. At the time, elastic-package did not continue to vet packages beyond failures in the manifest checks, which meant that it was necessary to fix those before being able to investigate duplications.

elastic-package now proceeds to check duplications even when there are manifest vet failures (here and here), and it has become clear that the packages that I found in the original pass based on test failures did not find all cases (for example google_workspace fixed here).

So here is a complete list of non-deprecated SEI packages that have duplicated field definitions found using the script at the footer of this issue. The script requires elastic-package v0.65.0 and was run on the tree at 28fecbd88c90b1e3ae92d50940fabc80b94ff2d7 from the packages directory.

for p in *; do
    grep 'elastic/security-external-integrations' ${p}/manifest.yml >/dev/null || continue
    grep '^description: Deprecated' ${p}/manifest.yml >/dev/null && continue
    gsed -i -e 's/^format_version: 1.0.0/format_version: 2.0.0/' -e '/^license: .*/d' ${p}/manifest.yml;
    (
        cd $p
        m="$(elastic-package build 2>&1 | grep 'defined multiple')"
        if [ "$m" != "" ]; then
            echo "- [ ] $p"
            echo $m \
            | gsed -r 's|^ +[0-9]+\. field "(.*)" is defined multiple times for data stream (.*), found in:|    \2: `\1`|g' \
            | gsed -r 's|/[^ ]+/data_stream/[^ ]+/fields/||g' \
            | sort
        fi
    )
    git reset --hard >/dev/null
done
elasticmachine commented 2 years ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

efd6 commented 2 years ago

/cc @endorama for the script if you want to adapt it for your team.

andrewkroh commented 2 years ago

Adding to the tools for this one. I have fields-yml-recommend that I sometimes use for finding duplicates or things that should be using an ECS definition.

$ go run github.com/andrewkroh/go-examples/fields-yml-recommend@main \
    -f=list packages/auditd/data_stream/log/fields/*

packages/auditd/data_stream/log/fields/base-fields.yml:18 - @timestamp : [Use 'external: ecs' to import the ECS definition.]
...
packages/auditd/data_stream/log/fields/ecs.yml:75 - user.effective.name : [Duplicate field (2 times).]
packages/auditd/data_stream/log/fields/package-fields.yml:34 - user.effective.name : [Duplicate field (2 times).]
narph commented 1 year ago

@efd6 , all PR's are merged, do we still need to do something for this issue?

efd6 commented 1 year ago

obs-cloud-monitoring added aws to the list which they own. I think given that we have completed the list either it should be removed and placed in a new issue or they should take ownership of this issue (I'd prefer the former).

narph commented 1 year ago

@vinit-chauhan you've linked https://github.com/elastic/integrations/pull/4604 to this issue, does this cover the list of duplicate fields for aws in the description of the field? we are looking to close this issue on the security side

vinit-chauhan commented 1 year ago

Hey @narph - As far as I remember, @kaiyan-sheng raised a PR to remove the duplicate fields. And all the duplicates were removed. Here's the PR for reference, #4657.

@efd6 would you mind updating the status for aws?