elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations

Negative performance impact on wildcard field #568

Closed by ph 3 years ago

ph commented 3 years ago

Quoted from https://github.com/elastic/beats/issues/23671:

As wildcard field changes were due to be released in ECS 1.8, we updated a ton of keyword fields to be wildcard instead from 7.11 onward. However, we've identified that there are significant performance implications that we need time to review: specifically, we've seen storage increases of up to 33% and indexing throughput impacted by 25%. ECS is rolling back the wildcard field changes and will not be releasing them in 1.8, so we should effectively roll back the addition of wildcard fields in Beats modules for 7.11 as well.

Looking at the integration fields, we seem to have a few wildcard references:

winterfell~/src/integrations(master|✔) % ag wildcard -l     
packages/checkpoint/docs/README.md
packages/checkpoint/data_stream/firewall/fields/beats.yml
packages/checkpoint/data_stream/firewall/fields/agent.yml
packages/checkpoint/data_stream/firewall/fields/ecs.yml
packages/fortinet/docs/README.md
packages/fortinet/data_stream/firewall/fields/beats.yml
packages/fortinet/data_stream/firewall/fields/ecs.yml
packages/fortinet/data_stream/firewall/fields/agent.yml
packages/iis/docs/README.md
packages/iis/data_stream/access/fields/ecs.yml
packages/iis/data_stream/error/fields/ecs.yml
packages/panw/docs/README.md
packages/panw/data_stream/panos/fields/beats.yml
packages/panw/data_stream/panos/fields/agent.yml
packages/panw/data_stream/panos/fields/ecs.yml
packages/redis/docs/README.md
packages/redis/_dev/build/docs/README.md
packages/suricata/docs/README.md
packages/suricata/data_stream/eve/fields/agent.yml
packages/suricata/data_stream/eve/fields/fields-epr.yml
packages/zeek/data_stream/capture_loss/fields/agent.yml
packages/zeek/data_stream/capture_loss/fields/beats.yml
packages/zeek/data_stream/connection/fields/agent.yml
packages/zeek/data_stream/connection/fields/beats.yml
packages/zeek/data_stream/connection/fields/ecs.yml
packages/zeek/docs/README.md
packages/zeek/data_stream/dce_rpc/fields/ecs.yml
packages/zeek/data_stream/dce_rpc/fields/agent.yml
packages/zeek/data_stream/dhcp/fields/agent.yml
packages/zeek/data_stream/dnp3/fields/ecs.yml
packages/zeek/data_stream/dnp3/fields/agent.yml
packages/zeek/data_stream/dns/fields/agent.yml
packages/zeek/data_stream/dns/fields/ecs.yml
packages/zeek/data_stream/dpd/fields/agent.yml
packages/zeek/data_stream/dpd/fields/ecs.yml
packages/zeek/data_stream/files/fields/agent.yml
packages/zeek/data_stream/ftp/fields/agent.yml
packages/zeek/data_stream/ftp/fields/ecs.yml
packages/zeek/data_stream/http/fields/agent.yml
packages/zeek/data_stream/http/fields/ecs.yml
packages/zeek/data_stream/intel/fields/agent.yml
packages/zeek/data_stream/intel/fields/ecs.yml
packages/zeek/data_stream/irc/fields/agent.yml
packages/zeek/data_stream/irc/fields/ecs.yml
packages/zeek/data_stream/kerberos/fields/agent.yml
packages/zeek/data_stream/kerberos/fields/ecs.yml
packages/zeek/data_stream/modbus/fields/agent.yml
packages/zeek/data_stream/modbus/fields/ecs.yml
packages/zeek/data_stream/mysql/fields/agent.yml
packages/zeek/data_stream/mysql/fields/ecs.yml
packages/zeek/data_stream/notice/fields/ecs.yml
packages/zeek/data_stream/notice/fields/agent.yml
packages/zeek/data_stream/ntlm/fields/ecs.yml
packages/zeek/data_stream/ntlm/fields/agent.yml
packages/zeek/data_stream/ocsp/fields/agent.yml
packages/zeek/data_stream/pe/fields/agent.yml
packages/zeek/data_stream/radius/fields/agent.yml
packages/zeek/data_stream/radius/fields/ecs.yml
packages/zeek/data_stream/rdp/fields/agent.yml
packages/zeek/data_stream/rdp/fields/ecs.yml
packages/zeek/data_stream/rfb/fields/agent.yml
packages/zeek/data_stream/rfb/fields/ecs.yml
packages/zeek/data_stream/sip/fields/ecs.yml
packages/zeek/data_stream/sip/fields/agent.yml
packages/zeek/data_stream/smb_cmd/fields/ecs.yml
packages/zeek/data_stream/smb_cmd/fields/agent.yml
packages/zeek/data_stream/smb_files/fields/ecs.yml
packages/zeek/data_stream/smb_files/fields/agent.yml
packages/zeek/data_stream/smb_mapping/fields/ecs.yml
packages/zeek/data_stream/smb_mapping/fields/agent.yml
packages/zeek/data_stream/smtp/fields/agent.yml
packages/zeek/data_stream/smtp/fields/ecs.yml
packages/zeek/data_stream/snmp/fields/agent.yml
packages/zeek/data_stream/snmp/fields/ecs.yml
packages/zeek/data_stream/socks/fields/ecs.yml
packages/zeek/data_stream/socks/fields/agent.yml
packages/zeek/data_stream/ssh/fields/ecs.yml
packages/zeek/data_stream/ssh/fields/agent.yml
packages/zeek/data_stream/ssl/fields/ecs.yml
packages/zeek/data_stream/ssl/fields/agent.yml
packages/zeek/data_stream/stats/fields/agent.yml
packages/zeek/data_stream/syslog/fields/agent.yml
packages/zeek/data_stream/syslog/fields/ecs.yml
packages/zeek/data_stream/traceroute/fields/agent.yml
packages/zeek/data_stream/traceroute/fields/ecs.yml
packages/zeek/data_stream/tunnel/fields/ecs.yml
packages/zeek/data_stream/tunnel/fields/agent.yml
packages/zeek/data_stream/weird/fields/ecs.yml
packages/zeek/data_stream/weird/fields/agent.yml
packages/zeek/data_stream/x509/fields/agent.yml
packages/zeek/data_stream/x509/fields/ecs.yml
packages/juniper/data_stream/srx/fields/ecs.yml
packages/juniper/docs/README.md
packages/zoom/data_stream/webhook/fields/ecs.yml
packages/zoom/docs/README.md
packages/cef/kibana/search/cef-5cede2d3-20fe-4140-add4-4c4f841b71a2.json
packages/cef/kibana/search/cef-e6cf2383-71f4-4db1-a791-1a7d4f110194.json
packages/cef/kibana/search/cef-f85a3444-8a43-4e46-b872-4e44bc25d0f3.json
packages/google_workspace/data_stream/admin/fields/ecs.yml
packages/google_workspace/data_stream/drive/fields/ecs.yml
packages/google_workspace/data_stream/groups/fields/ecs.yml
packages/google_workspace/data_stream/login/fields/ecs.yml
packages/google_workspace/data_stream/saml/fields/ecs.yml
packages/google_workspace/data_stream/user_accounts/fields/ecs.yml
packages/google_workspace/docs/README.md
packages/osquery/data_stream/result/fields/ecs.yml
packages/osquery/docs/README.md
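The `ag` listing above also matches docs and Kibana saved searches, which only mention the word and do not define mappings. As a minimal sketch for the audit, assuming field types are declared as a `type: wildcard` line inside `packages/*/data_stream/*/fields/*.yml` (the layout visible in the listing), the Python below reports only field definitions that actually use the wildcard type. The helper name `find_wildcard_fields` is hypothetical, and the field label is a heuristic (the nearest preceding `name:` line), not a full YAML parse.

```python
import re
from pathlib import Path

# Assumption: a wildcard mapping appears as its own "type: wildcard" line.
TYPE_RE = re.compile(r"^\s*type:\s*wildcard\s*$")
# Nearest preceding "name:" entry (list item or plain key), used as a best-effort label.
NAME_RE = re.compile(r"^\s*(?:-\s*)?name:\s*(\S+)\s*$")


def find_wildcard_fields(repo_root):
    """Return (relative path, line number, field name) for each wildcard-typed field.

    Only field definition files are scanned, so READMEs and Kibana saved
    objects that merely mention the word "wildcard" are skipped.
    """
    root = Path(repo_root)
    hits = []
    for path in sorted(root.glob("packages/*/data_stream/*/fields/*.yml")):
        last_name = None
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            name_match = NAME_RE.match(line)
            if name_match:
                last_name = name_match.group(1)
            elif TYPE_RE.match(line):
                hits.append((path.relative_to(root).as_posix(), lineno, last_name))
    return hits


if __name__ == "__main__":
    # Run from the root of the integrations repository.
    for rel_path, lineno, field in find_wildcard_fields("."):
        print(f"{rel_path}:{lineno}: {field}")
```

Restricting the glob to `fields/*.yml` is what filters out the README and `kibana/search` matches; nested YAML groups would need a real parser to report fully qualified field names.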

elasticmachine commented 3 years ago

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

ph commented 3 years ago

@ruflin I saw your later message in the benchmark discussion. I see quite a few uses of wildcard in the integration packages. I agree we are not affected as drastically as Beats, but we should probably audit the files above; there are a lot of wildcard references, especially in the zeek integration. Are we using it correctly?

andrewstucki commented 3 years ago

@ph: these fields were changed to bring the packages to parity with the Beats modules. We're using them "correctly" in the sense that these fields are all marked as wildcard fields in the experimental/1.8 ECS schema.

Personally, I'd be fine keeping these as wildcard since, as @ruflin mentioned, the changes are slightly more constrained than in Beats. But if we want to revert them the same way the 7.11 Beats modules are being reverted, I'm fine with that too.

Regarding zeek, the integration has so many wildcard fields because the package itself is huge: both the Filebeat module and the package contain ~30 data streams IIRC, so they have a lot of fields in general.

ph commented 3 years ago

Thanks for the details, @andrewstucki. I am fine with keeping them as is and will close this.

andrewkroh commented 3 years ago

We are going to revert the wildcard changes in packages to be consistent with Beats for the time being. We'll switch to using the non-experimental ECS 1.7 fields. After ECS promotes the wildcard fields to non-experimental status, we'll adopt them.
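Since the wildcard fields were previously `keyword` (per the quoted Beats issue), one way to picture the revert is a textual rewrite of each `type: wildcard` line back to `type: keyword`. This is a minimal sketch under that assumption; the function names are hypothetical, and it does not restore any other mapping attributes that may differ between the ECS 1.7 and 1.8 definitions, so regenerating the fields from ECS 1.7 is likely the more robust route.

```python
import re
from pathlib import Path

# Matches a whole "type: wildcard" line, capturing the indentation/"type:" prefix
# and any trailing spaces so they are preserved. [ \t] avoids spanning newlines.
WILDCARD_TYPE_RE = re.compile(r"^([ \t]*type:[ \t]*)wildcard([ \t]*)$", re.MULTILINE)


def revert_wildcard_to_keyword(text):
    """Rewrite every `type: wildcard` line to `type: keyword`, keeping indentation.

    Returns a tuple of (new text, number of lines changed). Other attributes
    of the field are left untouched.
    """
    return WILDCARD_TYPE_RE.subn(r"\1keyword\2", text)


def revert_file(path):
    """Apply the rewrite to one fields file in place; return the number of changes."""
    p = Path(path)
    new_text, count = revert_wildcard_to_keyword(p.read_text(encoding="utf-8"))
    if count:
        p.write_text(new_text, encoding="utf-8")
    return count
```

Applying `revert_file` to each `fields/*.yml` file in a package would cover the field definitions; any generated documentation listing the field types would still need to be rebuilt.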