elastic / kibana


[Fleet] Potential breaking change with APM data streams (maybe others) and Fleet ingest pipeline customization hooks #175254

Closed kpollich closed 7 months ago

kpollich commented 7 months ago

Summary

In 8.12.0, Fleet introduced new extension points for ingest pipeline customization in the form of additional pipeline processors in Fleet-managed ingest pipelines:

global@custom (called for all data streams)
${type}@custom, e.g. traces@custom (called for all data streams of a given type)
${type}-${package}@custom, e.g. traces-apm@custom (called for all data streams defined by a given integration)

These new extension points allow for more granular customization of ingestion for various use cases, for instance applying global processing across all logs data streams.

The existing extension point of the pattern ${type}-${dataset}@custom, e.g. logs-apache.access@custom, is preserved, and is called as the last pipeline processor in each Fleet-managed ingest pipeline.
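For illustration, here is a minimal sketch of how these hooks compose into pipeline processors (the function and type names are hypothetical, not Fleet's actual implementation):

```typescript
// Hypothetical sketch of how the 8.12.0 customization hooks compose into
// pipeline processors; names are illustrative, not Fleet's actual code.
interface PipelineProcessor {
  pipeline: { name: string; ignore_missing_pipeline: boolean };
}

function buildCustomPipelineProcessors(
  type: string,        // e.g. "traces"
  packageName: string, // e.g. "apm"
  dataset: string      // e.g. "apm.sampled"
): PipelineProcessor[] {
  const names = [
    'global@custom',                 // all data streams (new in 8.12.0)
    `${type}@custom`,                // all data streams of this type (new in 8.12.0)
    `${type}-${packageName}@custom`, // all data streams of this package (new in 8.12.0)
    `${type}-${dataset}@custom`,     // this dataset only (pre-existing hook)
  ];
  return names.map((name) => ({
    pipeline: { name, ignore_missing_pipeline: true },
  }));
}

// buildCustomPipelineProcessors('traces', 'apm', 'apm') yields the duplicate
// `traces-apm@custom` entry described in Problem 1 below.
```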

Problem 1 - Duplicate pipeline processors

APM defines a traces-apm data stream here

Because the package name apm is the same as the dataset apm, Fleet creates a duplicate pipeline processor in the final ingest pipeline for this data stream, e.g.

[
    {
        "pipeline": {
          "name": "global@custom",
          "ignore_missing_pipeline": true
        }
    },
    {
        "pipeline": {
          "name": "traces@custom",
          "ignore_missing_pipeline": true
        }
    },
    {
        "pipeline": {
          "name": "traces-apm@custom",
          "ignore_missing_pipeline": true
        }
    },
    {
        "pipeline": {
          "name": "traces-apm@custom",
          "ignore_missing_pipeline": true
        }
    }
]

In the example above, the first traces-apm@custom processor is of the form ${type}-${package}@custom while the second is of the form ${type}-${dataset}@custom. This duplication should be avoided.
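A minimal sketch of one way to avoid emitting the duplicate entry, deduplicating by pipeline name (illustrative only; the fix that eventually shipped renamed the package-level hook instead):

```typescript
// Sketch: drop processors whose pipeline name was already added. Illustrative only;
// the shipped fix renamed the package-level hook rather than deduplicating.
type Processor = { pipeline: { name: string; ignore_missing_pipeline: boolean } };

function dedupeByPipelineName(processors: Processor[]): Processor[] {
  const seen = new Set<string>();
  return processors.filter(({ pipeline }) => {
    if (seen.has(pipeline.name)) {
      return false;
    }
    seen.add(pipeline.name);
    return true;
  });
}
```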

Problem 2 - Breaking change for traces-apm.sampled data stream

APM also defines a traces-apm.sampled data stream here. Because this data stream's name extends the traces-apm data stream's name, Fleet's customization hooks introduce a breaking change to its processing scheme.

For example, prior to 8.12.0, the traces-apm.sampled-X.Y.Z ingest pipeline would have the following pipeline processor defined:

{
    "pipeline": {
      "name": "traces-apm.sampled@custom",
      "ignore_missing_pipeline": true
    }
}

Following 8.12.0, this pipeline will now have these processors defined:

{
    "pipeline": {
      "name": "global@custom",
      "ignore_missing_pipeline": true
    }
},
{
    "pipeline": {
      "name": "traces@custom",
      "ignore_missing_pipeline": true
    }
},
{
    "pipeline": {
      "name": "traces-apm@custom", // <---- Collides with `traces-apm@custom` pipeline defined above for another data stream
      "ignore_missing_pipeline": true
    }
},
{
    "pipeline": {
      "name": "traces-apm.sampled@custom",
      "ignore_missing_pipeline": true
    }
}

The problem (highlighted in the code block above) is that the traces-apm@custom processor, which is intended to be of the form ${type}-${package}@custom, overlaps with the traces-apm@custom pipeline defined for the traces-apm data stream above, which is intended to be of the form ${type}-${dataset}@custom. This is technically the same problem as Problem 1 above, but it manifests as a potential breaking change for APM users who have customized their ingest scheme.

If an APM user is relying on customizations they made to the traces-apm@custom ingest pipeline (which was set up by default in releases prior to 8.12.0), they will now unexpectedly see that pipeline firing on data ingested to the traces-apm.sampled data stream. This is a breaking change and should be communicated as such.
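For affected users, here is a sketch of checking exposure with the Elasticsearch JS client (the pipeline id's version suffix is only an example; adjust node/auth and versions to your cluster):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // adjust connection/auth

// If a user-defined traces-apm@custom pipeline exists, then on 8.12.0 it will also
// run for documents ingested into traces-apm.sampled.
async function checkExposure() {
  try {
    await client.ingest.getPipeline({ id: 'traces-apm@custom' });
  } catch {
    console.log('No traces-apm@custom pipeline defined; not affected.');
    return;
  }
  // Inspect the Fleet-managed pipeline for the sampled data stream
  // (the version suffix is an example; match your installed APM package version).
  const managed = await client.ingest.getPipeline({ id: 'traces-apm.sampled-8.12.0' });
  console.log(JSON.stringify(managed, null, 2));
}

checkExposure().catch(console.error);
```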


With both problems above, we likely need some kind of additional specificity to avoid the case where a dataset name overlaps with a package name, as is the case with APM. It'd be great to query the integrations repo to see if we can detect other places where this may be the case and alert those teams.

In the immediate term, we need to communicate this as a known issue + breaking change to our users by adding documentation and updating our 8.12.0 release notes. Following that, let's try to come to a decision quickly on how we can fix the root issue with the duplication/lack of specificity.

cc @simitt @lucabelluccini @nchaulet @kilfoyle

### Tasks
- [ ] https://github.com/elastic/ingest-docs/issues/841
- [x] Determine scope of breaking change, e.g. how many collisions are possible, are other integrations affected
- [x] Determine if a workaround is possible on 8.12.0 and document it if so
- [x] Fix the root cause issue with the collision/duplication, aim to land in 8.12.1
- [x] Make sure this scenario is tested going forward - either unit test or manual test case
elasticmachine commented 7 months ago

Pinging @elastic/fleet (Team:Fleet)

kpollich commented 7 months ago

Our priorities for this are as follows right now:

  1. Document the broken behavior (See https://github.com/elastic/ingest-docs/pull/840 - nearly done)
  2. Determine if there's a workaround available on 8.12.0
  3. Fix the root cause in 8.12.1 (Feature freeze on Jan 30) if possible

I'm updating the task list in the description to reflect these next steps.

kpollich commented 7 months ago

I pulled all of the APM datasets from the integration source:

logs-apm.app
logs-apm.error
metrics-apm.app
metrics-apm.internal
metrics-apm.service_destination.10m
metrics-apm.service_destination.1m
metrics-apm.service_destination.60m
metrics-apm.service_summary.10m
metrics-apm.service_summary.1m
metrics-apm.service_summary.60m
metrics-apm.service_transaction.10m
metrics-apm.service_transaction.1m
metrics-apm.service_transaction.60m
metrics-apm.transaction.10m
metrics-apm.transaction.1m
metrics-apm.transaction.60m
traces-apm
traces-apm.rum
traces-apm.sampled

I believe the only collisions possible here are

traces-apm
traces-apm.rum
traces-apm.sampled

As traces-apm is its own data stream, we'll have the collision case described in the issue description above.

Next, I expanded my search to all integration data streams defined in https://github.com/elastic/integrations. Here's the full set of all integration data streams:

Show list ``` logs-1password.audit_events logs-1password.item_usages logs-1password.signin_attempts logs-activemq.audit logs-activemq.log logs-akamai.siem logs-amazon_security_lake.application_activity logs-amazon_security_lake.discovery logs-amazon_security_lake.event logs-amazon_security_lake.findings logs-amazon_security_lake.iam logs-amazon_security_lake.network_activity logs-amazon_security_lake.system_activity logs-apache.access logs-apache.error logs-apache_tomcat.access logs-apache_tomcat.catalina logs-apache_tomcat.localhost logs-apm.app logs-apm.error logs-arista_ngfw.log logs-atlassian_bitbucket.audit logs-atlassian_confluence.audit logs-atlassian_jira.audit logs-auditd.log logs-auditd_manager.auditd logs-auth0.logs logs-aws.apigateway_logs logs-aws.cloudfront_logs logs-aws.cloudtrail logs-aws.cloudwatch_logs logs-aws.ec2_logs logs-aws.elb_logs logs-aws.emr_logs logs-aws.firewall_logs logs-aws.guardduty logs-aws.inspector logs-aws.route53_public_logs logs-aws.route53_resolver_logs logs-aws.s3access logs-aws.securityhub_findings logs-aws.securityhub_insights logs-aws.vpcflow logs-aws.waf logs-aws_logs.generic logs-awsfirehose logs-azure.activitylogs logs-azure.application_gateway logs-azure.auditlogs logs-azure.eventhub logs-azure.firewall_logs logs-azure.identity_protection logs-azure.platformlogs logs-azure.provisioning logs-azure.signinlogs logs-azure.springcloudlogs logs-azure_app_service.app_service_logs logs-azure_blob_storage.generic logs-azure_frontdoor.access logs-azure_frontdoor.waf logs-azure_functions.functionapplogs logs-barracuda.waf logs-barracuda_cloudgen_firewall.log logs-bitdefender.push_configuration logs-bitdefender.push_notifications logs-bitdefender.push_statistics logs-bitwarden.collection logs-bitwarden.event logs-bitwarden.group logs-bitwarden.member logs-bitwarden.policy logs-bluecoat.director logs-box_events.events logs-carbon_black_cloud.alert logs-carbon_black_cloud.asset_vulnerability_summary logs-carbon_black_cloud.audit logs-carbon_black_cloud.endpoint_event logs-carbon_black_cloud.watchlist_hit logs-carbonblack_edr.log logs-cassandra.log logs-cef.log logs-ceph.cluster_disk logs-ceph.cluster_health logs-ceph.cluster_status logs-ceph.osd_performance logs-ceph.osd_pool_stats logs-ceph.osd_tree logs-ceph.pool_disk logs-checkpoint.firewall logs-cisco_aironet.log logs-cisco_asa.log logs-cisco_duo.admin logs-cisco_duo.auth logs-cisco_duo.offline_enrollment logs-cisco_duo.summary logs-cisco_duo.telephony logs-cisco_ftd.log logs-cisco_ios.log logs-cisco_ise.log logs-cisco_meraki.events logs-cisco_meraki.log logs-cisco_nexus.log logs-cisco_secure_email_gateway.log logs-cisco_secure_endpoint.event logs-cisco_umbrella.log logs-citrix_adc.interface logs-citrix_adc.lbvserver logs-citrix_adc.service logs-citrix_adc.system logs-citrix_adc.vpn logs-citrix_waf.log logs-cloud_defend.alerts logs-cloud_defend.file logs-cloud_defend.process logs-cloud_security_posture.findings logs-cloud_security_posture.vulnerabilities logs-cloudflare.audit logs-cloudflare.logpull logs-cloudflare_logpush.access_request logs-cloudflare_logpush.audit logs-cloudflare_logpush.casb logs-cloudflare_logpush.device_posture logs-cloudflare_logpush.dns logs-cloudflare_logpush.dns_firewall logs-cloudflare_logpush.firewall_event logs-cloudflare_logpush.gateway_dns logs-cloudflare_logpush.gateway_http logs-cloudflare_logpush.gateway_network logs-cloudflare_logpush.http_request logs-cloudflare_logpush.magic_ids logs-cloudflare_logpush.nel_report logs-cloudflare_logpush.network_analytics 
logs-cloudflare_logpush.network_session logs-cloudflare_logpush.sinkhole_http logs-cloudflare_logpush.spectrum_event logs-cloudflare_logpush.workers_trace logs-coredns.log logs-couchbase.node logs-cribl logs-crowdstrike.falcon logs-crowdstrike.fdr logs-cyberark_pta.events logs-cyberarkpas.audit logs-cylance.protect logs-darktrace.ai_analyst_alert logs-darktrace.model_breach_alert logs-darktrace.system_status_alert logs-docker.container_logs logs-elastic_agent logs-elastic_agent.apm_server logs-elastic_agent.auditbeat logs-elastic_agent.cloud_defend logs-elastic_agent.cloudbeat logs-elastic_agent.endpoint_security logs-elastic_agent.filebeat logs-elastic_agent.filebeat_input logs-elastic_agent.fleet_server logs-elastic_agent.heartbeat logs-elastic_agent.metricbeat logs-elastic_agent.osquerybeat logs-elastic_agent.packetbeat logs-elastic_agent.pf_elastic_collector logs-elastic_agent.pf_elastic_symbolizer logs-elastic_agent.pf_host_agent logs-elasticsearch.audit logs-elasticsearch.deprecation logs-elasticsearch.gc logs-elasticsearch.server logs-elasticsearch.slowlog logs-entityanalytics_entra_id.device logs-entityanalytics_entra_id.entity logs-entityanalytics_entra_id.user logs-entityanalytics_okta.user logs-eset_protect.detection logs-eset_protect.device_task logs-eset_protect.event logs-f5.bigipafm logs-f5.bigipapm logs-f5_bigip.log logs-fim.event logs-fireeye.nx logs-fleet_server.output_health logs-forcepoint_web.logs logs-forgerock.am_access logs-forgerock.am_activity logs-forgerock.am_authentication logs-forgerock.am_config logs-forgerock.am_core logs-forgerock.idm_access logs-forgerock.idm_activity logs-forgerock.idm_authentication logs-forgerock.idm_config logs-forgerock.idm_core logs-forgerock.idm_sync logs-fortinet_forticlient.log logs-fortinet_fortiedr.log logs-fortinet_fortigate.log logs-fortinet_fortimail.log logs-fortinet_fortimanager.log logs-gcp.audit logs-gcp.dns logs-gcp.firewall logs-gcp.loadbalancing_logs logs-gcp.vpcflow logs-gcp_pubsub.generic logs-github.audit logs-github.code_scanning logs-github.dependabot logs-github.issues logs-github.secret_scanning logs-golang.expvar logs-golang.heap logs-google_cloud_storage.generic logs-google_scc.asset logs-google_scc.audit logs-google_scc.finding logs-google_scc.source logs-google_workspace.access_transparency logs-google_workspace.admin logs-google_workspace.alert logs-google_workspace.context_aware_access logs-google_workspace.device logs-google_workspace.drive logs-google_workspace.gcp logs-google_workspace.group_enterprise logs-google_workspace.groups logs-google_workspace.login logs-google_workspace.rules logs-google_workspace.saml logs-google_workspace.token logs-google_workspace.user_accounts logs-hadoop.application logs-haproxy.log logs-hashicorp_vault.audit logs-hashicorp_vault.log logs-hid_bravura_monitor.log logs-hid_bravura_monitor.winlog logs-http_endpoint.generic logs-httpjson.generic logs-ibmmq.errorlog logs-iis.access logs-iis.error logs-imperva.securesphere logs-infoblox_bloxone_ddi.dhcp_lease logs-infoblox_bloxone_ddi.dns_config logs-infoblox_bloxone_ddi.dns_data logs-infoblox_nios.log logs-iptables.log logs-istio.access_logs logs-jamf_compliance_reporter.log logs-jumpcloud.events logs-juniper_junos.log logs-juniper_netscreen.log logs-juniper_srx.log logs-kafka.log logs-kafka_log.generic logs-keycloak.log logs-kibana.audit logs-kibana.log logs-kubernetes.audit_logs logs-kubernetes.container_logs logs-lastpass.detailed_shared_folder logs-lastpass.event_report logs-lastpass.user logs-logstash.log 
logs-logstash.slowlog logs-lyve_cloud.audit logs-m365_defender.event logs-m365_defender.incident logs-m365_defender.log logs-mattermost.audit logs-microsoft_defender_cloud.event logs-microsoft_defender_endpoint.log logs-microsoft_dhcp.log logs-microsoft_exchange_online_message_trace.log logs-microsoft_sqlserver.audit logs-microsoft_sqlserver.log logs-mimecast.archive_search_logs logs-mimecast.audit_events logs-mimecast.dlp_logs logs-mimecast.siem_logs logs-mimecast.threat_intel_malware_customer logs-mimecast.threat_intel_malware_grid logs-mimecast.ttp_ap_logs logs-mimecast.ttp_ip_logs logs-mimecast.ttp_url_logs logs-modsecurity.auditlog logs-mongodb.log logs-mysql.error logs-mysql.slowlog logs-mysql_enterprise.audit logs-nagios_xi.events logs-nagios_xi.host logs-nagios_xi.service logs-nats.log logs-netflow.log logs-netscout.sightline logs-netskope.alerts logs-netskope.events logs-network_traffic.amqp logs-network_traffic.cassandra logs-network_traffic.dhcpv4 logs-network_traffic.dns logs-network_traffic.flow logs-network_traffic.http logs-network_traffic.icmp logs-network_traffic.memcached logs-network_traffic.mongodb logs-network_traffic.mysql logs-network_traffic.nfs logs-network_traffic.pgsql logs-network_traffic.redis logs-network_traffic.sip logs-network_traffic.thrift logs-network_traffic.tls logs-nginx.access logs-nginx.error logs-nginx_ingress_controller.access logs-nginx_ingress_controller.error logs-o365.audit logs-okta.system logs-oracle.database_audit logs-oracle_weblogic.access logs-oracle_weblogic.admin_server logs-oracle_weblogic.domain logs-oracle_weblogic.managed_server logs-osquery.result logs-osquery_manager.result logs-panw.panos logs-panw_cortex_xdr.alerts logs-panw_cortex_xdr.incidents logs-pfsense.log logs-php_fpm.pool logs-php_fpm.process logs-ping_one.audit logs-platform_observability.kibana_audit logs-platform_observability.kibana_log logs-postgresql.log logs-prisma_cloud.alert logs-prisma_cloud.audit logs-prisma_cloud.host logs-prisma_cloud.host_profile logs-prisma_cloud.incident_audit logs-proofpoint_tap.clicks_blocked logs-proofpoint_tap.clicks_permitted logs-proofpoint_tap.message_blocked logs-proofpoint_tap.message_delivered logs-pulse_connect_secure.log logs-qnap_nas.log logs-qualys_vmdr.asset_host_detection logs-qualys_vmdr.knowledge_base logs-rabbitmq.log logs-radware.defensepro logs-rapid7_insightvm.asset logs-rapid7_insightvm.vulnerability logs-redis.log logs-redis.slowlog logs-salesforce.apex logs-salesforce.login_rest logs-salesforce.login_stream logs-salesforce.logout_rest logs-salesforce.logout_stream logs-salesforce.setupaudittrail logs-santa.log logs-sentinel_one.activity logs-sentinel_one.agent logs-sentinel_one.alert logs-sentinel_one.group logs-sentinel_one.threat logs-sentinel_one_cloud_funnel.event logs-slack.audit logs-snort.log logs-snyk.audit logs-snyk.vulnerabilities logs-sonicwall_firewall.log logs-sophos.utm logs-sophos.xg logs-sophos_central.alert logs-sophos_central.event logs-spring_boot.audit_events logs-spring_boot.http_trace logs-squid.log logs-stan.log logs-suricata.eve logs-symantec_edr_cloud.incident logs-symantec_endpoint.log logs-sysmon_linux.log logs-system.application logs-system.auth logs-system.security logs-system.syslog logs-system.system logs-system_audit.package logs-tanium.action_history logs-tanium.client_status logs-tanium.discover logs-tanium.endpoint_config logs-tanium.reporting logs-tanium.threat_response logs-tcp.generic logs-tenable_io.asset logs-tenable_io.plugin logs-tenable_io.scan 
logs-tenable_io.vulnerability logs-tenable_sc.asset logs-tenable_sc.plugin logs-tenable_sc.vulnerability logs-thycotic_ss.logs logs-ti_abusech.malware logs-ti_abusech.malwarebazaar logs-ti_abusech.threatfox logs-ti_abusech.url logs-ti_anomali.threatstream logs-ti_cif3.feed logs-ti_crowdstrike.intel logs-ti_crowdstrike.ioc logs-ti_cybersixgill.threat logs-ti_eclecticiq.threat logs-ti_maltiverse.indicator logs-ti_mandiant_advantage.threat_intelligence logs-ti_misp.threat logs-ti_misp.threat_attributes logs-ti_opencti.indicator logs-ti_otx.pulses_subscribed logs-ti_otx.threat logs-ti_rapid7_threat_command.alert logs-ti_rapid7_threat_command.ioc logs-ti_rapid7_threat_command.vulnerability logs-ti_recordedfuture.threat logs-ti_threatq.threat logs-tines.audit_logs logs-tines.time_saved logs-tomcat.log logs-traefik.access logs-trellix_edr_cloud.event logs-trellix_epo_cloud.device logs-trellix_epo_cloud.event logs-trellix_epo_cloud.group logs-trend_micro_vision_one.alert logs-trend_micro_vision_one.audit logs-trend_micro_vision_one.detection logs-trendmicro.deep_security logs-udp.generic logs-vectra_detect.log logs-vsphere.log logs-windows.applocker_exe_and_dll logs-windows.applocker_msi_and_script logs-windows.applocker_packaged_app_deployment logs-windows.applocker_packaged_app_execution logs-windows.forwarded logs-windows.powershell logs-windows.powershell_operational logs-windows.sysmon_operational logs-wiz.audit logs-wiz.issue logs-wiz.vulnerability logs-zeek.capture_loss logs-zeek.connection logs-zeek.dce_rpc logs-zeek.dhcp logs-zeek.dnp3 logs-zeek.dns logs-zeek.dpd logs-zeek.files logs-zeek.ftp logs-zeek.http logs-zeek.intel logs-zeek.irc logs-zeek.kerberos logs-zeek.known_certs logs-zeek.known_hosts logs-zeek.known_services logs-zeek.modbus logs-zeek.mysql logs-zeek.notice logs-zeek.ntlm logs-zeek.ntp logs-zeek.ocsp logs-zeek.pe logs-zeek.radius logs-zeek.rdp logs-zeek.rfb logs-zeek.signature logs-zeek.sip logs-zeek.smb_cmd logs-zeek.smb_files logs-zeek.smb_mapping logs-zeek.smtp logs-zeek.snmp logs-zeek.socks logs-zeek.software logs-zeek.ssh logs-zeek.ssl logs-zeek.stats logs-zeek.syslog logs-zeek.traceroute logs-zeek.tunnel logs-zeek.weird logs-zeek.x509 logs-zerofox.alerts logs-zeronetworks.audit logs-zoom.webhook logs-zscaler_zia.alerts logs-zscaler_zia.dns logs-zscaler_zia.firewall logs-zscaler_zia.tunnel logs-zscaler_zia.web logs-zscaler_zpa.app_connector_status logs-zscaler_zpa.audit logs-zscaler_zpa.browser_access logs-zscaler_zpa.user_activity logs-zscaler_zpa.user_status metrics-activemq.broker metrics-activemq.queue metrics-activemq.topic metrics-airflow.statsd metrics-apache.status metrics-apache_spark.application metrics-apache_spark.driver metrics-apache_spark.executor metrics-apache_spark.node metrics-apache_tomcat.cache metrics-apache_tomcat.connection_pool metrics-apache_tomcat.memory metrics-apache_tomcat.request metrics-apache_tomcat.session metrics-apache_tomcat.thread_pool metrics-apm.app metrics-apm.internal metrics-apm.service_destination.10m metrics-apm.service_destination.1m metrics-apm.service_destination.60m metrics-apm.service_summary.10m metrics-apm.service_summary.1m metrics-apm.service_summary.60m metrics-apm.service_transaction.10m metrics-apm.service_transaction.1m metrics-apm.service_transaction.60m metrics-apm.transaction.10m metrics-apm.transaction.1m metrics-apm.transaction.60m metrics-aws.apigateway_metrics metrics-aws.billing metrics-aws.cloudwatch_metrics metrics-aws.dynamodb metrics-aws.ebs metrics-aws.ec2_metrics metrics-aws.ecs_metrics 
metrics-aws.elb_metrics metrics-aws.emr_metrics metrics-aws.firewall_metrics metrics-aws.kinesis metrics-aws.lambda metrics-aws.natgateway metrics-aws.rds metrics-aws.redshift metrics-aws.s3_daily_storage metrics-aws.s3_request metrics-aws.s3_storage_lens metrics-aws.sns metrics-aws.sqs metrics-aws.transitgateway metrics-aws.usage metrics-aws.vpn metrics-awsfargate.task_stats metrics-azure.app_insights metrics-azure.app_state metrics-azure.billing metrics-azure.compute_vm metrics-azure.compute_vm_scaleset metrics-azure.container_instance metrics-azure.container_registry metrics-azure.container_service metrics-azure.database_account metrics-azure.function metrics-azure.monitor metrics-azure.storage_account metrics-cassandra.metrics metrics-cloud_defend.heartbeat metrics-cloud_defend.metrics metrics-cockroachdb.status metrics-containerd.blkio metrics-containerd.cpu metrics-containerd.memory metrics-couchbase.bucket metrics-couchbase.cache metrics-couchbase.cbl_replication metrics-couchbase.cluster metrics-couchbase.database_stats metrics-couchbase.miscellaneous metrics-couchbase.query_index metrics-couchbase.resource metrics-couchbase.xdcr metrics-couchdb.server metrics-docker.container metrics-docker.cpu metrics-docker.diskio metrics-docker.event metrics-docker.healthcheck metrics-docker.image metrics-docker.info metrics-docker.memory metrics-docker.network metrics-elastic_agent.apm_server metrics-elastic_agent.auditbeat metrics-elastic_agent.cloudbeat metrics-elastic_agent.elastic_agent metrics-elastic_agent.endpoint_security metrics-elastic_agent.filebeat metrics-elastic_agent.filebeat_input metrics-elastic_agent.fleet_server metrics-elastic_agent.heartbeat metrics-elastic_agent.metricbeat metrics-elastic_agent.osquerybeat metrics-elastic_agent.packetbeat metrics-elastic_package_registry.metrics metrics-elasticsearch.ingest_pipeline metrics-elasticsearch.stack_monitoring.ccr metrics-elasticsearch.stack_monitoring.cluster_stats metrics-elasticsearch.stack_monitoring.enrich metrics-elasticsearch.stack_monitoring.index metrics-elasticsearch.stack_monitoring.index_recovery metrics-elasticsearch.stack_monitoring.index_summary metrics-elasticsearch.stack_monitoring.ml_job metrics-elasticsearch.stack_monitoring.node metrics-elasticsearch.stack_monitoring.node_stats metrics-elasticsearch.stack_monitoring.pending_tasks metrics-elasticsearch.stack_monitoring.shard metrics-enterprisesearch.stack_monitoring.health metrics-enterprisesearch.stack_monitoring.stats metrics-etcd.leader metrics-etcd.metrics metrics-etcd.self metrics-etcd.store metrics-fleet_server.agent_status metrics-fleet_server.agent_versions metrics-gcp.billing metrics-gcp.cloudrun_metrics metrics-gcp.cloudsql_mysql metrics-gcp.cloudsql_postgresql metrics-gcp.cloudsql_sqlserver metrics-gcp.compute metrics-gcp.dataproc metrics-gcp.firestore metrics-gcp.gke metrics-gcp.loadbalancing_metrics metrics-gcp.pubsub metrics-gcp.redis metrics-gcp.storage metrics-hadoop.cluster metrics-hadoop.datanode metrics-hadoop.namenode metrics-hadoop.node_manager metrics-haproxy.info metrics-haproxy.stat metrics-hashicorp_vault.metrics metrics-ibmmq.qmgr metrics-iis.application_pool metrics-iis.webserver metrics-iis.website metrics-influxdb.advstatus metrics-influxdb.status metrics-istio.istiod_metrics metrics-istio.proxy_metrics metrics-kafka.broker metrics-kafka.consumergroup metrics-kafka.partition metrics-kibana.background_task_utilization metrics-kibana.stack_monitoring.cluster_actions metrics-kibana.stack_monitoring.cluster_rules 
metrics-kibana.stack_monitoring.node_actions metrics-kibana.stack_monitoring.node_rules metrics-kibana.stack_monitoring.stats metrics-kibana.stack_monitoring.status metrics-kibana.task_manager_metrics metrics-kubernetes.apiserver metrics-kubernetes.container metrics-kubernetes.controllermanager metrics-kubernetes.event metrics-kubernetes.node metrics-kubernetes.pod metrics-kubernetes.proxy metrics-kubernetes.scheduler metrics-kubernetes.state_container metrics-kubernetes.state_cronjob metrics-kubernetes.state_daemonset metrics-kubernetes.state_deployment metrics-kubernetes.state_job metrics-kubernetes.state_namespace metrics-kubernetes.state_node metrics-kubernetes.state_persistentvolume metrics-kubernetes.state_persistentvolumeclaim metrics-kubernetes.state_pod metrics-kubernetes.state_replicaset metrics-kubernetes.state_resourcequota metrics-kubernetes.state_service metrics-kubernetes.state_statefulset metrics-kubernetes.state_storageclass metrics-kubernetes.system metrics-kubernetes.volume metrics-linux.conntrack metrics-linux.entropy metrics-linux.iostat metrics-linux.ksm metrics-linux.memory metrics-linux.network_summary metrics-linux.pageinfo metrics-linux.raid metrics-linux.service metrics-linux.socket metrics-linux.users metrics-logstash.node metrics-logstash.pipeline metrics-logstash.plugins metrics-logstash.stack_monitoring.node metrics-logstash.stack_monitoring.node_stats metrics-memcached.stats metrics-microsoft_sqlserver.performance metrics-microsoft_sqlserver.transaction_log metrics-mongodb.collstats metrics-mongodb.dbstats metrics-mongodb.metrics metrics-mongodb.replstatus metrics-mongodb.status metrics-mysql.galera_status metrics-mysql.performance metrics-mysql.status metrics-nats.connection metrics-nats.connections metrics-nats.route metrics-nats.routes metrics-nats.stats metrics-nats.subscriptions metrics-nginx.stubstatus metrics-oracle.memory metrics-oracle.performance metrics-oracle.sysmetric metrics-oracle.system_statistics metrics-oracle.tablespace metrics-oracle_weblogic.deployed_application metrics-oracle_weblogic.threadpool metrics-postgresql.activity metrics-postgresql.bgwriter metrics-postgresql.database metrics-postgresql.statement metrics-prometheus.collector metrics-prometheus.query metrics-prometheus.remote_write metrics-rabbitmq.connection metrics-rabbitmq.exchange metrics-rabbitmq.node metrics-rabbitmq.queue metrics-redis.info metrics-redis.key metrics-redis.keyspace metrics-redisenterprise.node metrics-redisenterprise.proxy metrics-spring_boot.gc metrics-spring_boot.memory metrics-spring_boot.threading metrics-stan.channels metrics-stan.stats metrics-stan.subscriptions metrics-system.core metrics-system.cpu metrics-system.diskio metrics-system.filesystem metrics-system.fsstat metrics-system.load metrics-system.memory metrics-system.network metrics-system.process metrics-system.process.summary metrics-system.socket_summary metrics-system.uptime metrics-traefik.health metrics-vsphere.datastore metrics-vsphere.host metrics-vsphere.virtualmachine metrics-websphere_application_server.jdbc metrics-websphere_application_server.servlet metrics-websphere_application_server.session_manager metrics-websphere_application_server.threadpool metrics-windows.perfmon metrics-windows.service metrics-zookeeper.connection metrics-zookeeper.mntr metrics-zookeeper.server synthetics-browser synthetics-browser.network synthetics-browser.screenshot synthetics-http synthetics-icmp synthetics-tcp traces-apm traces-apm.rum traces-apm.sampled ```

From what I understand, the only way a collision like this is possible is when an integration data stream defines a custom dataset value. Otherwise, the data stream will receive the default name of the form ${type}-${packageName}.${dataStreamDirectory}, which prevents collisions because directory names are unique within a package.
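Here is a sketch of that default naming rule versus an explicit dataset override (the manifest shape and function name are illustrative, not the package-spec schema):

```typescript
// Illustrative only: how a data stream's dataset (and thus its @custom pipeline
// name) is derived when the manifest does not override `dataset`.
interface DataStreamManifest {
  type: 'logs' | 'metrics' | 'traces' | 'synthetics';
  dataset?: string; // optional override in manifest.yml
}

function resolveDataset(
  packageName: string,
  dataStreamDirectory: string,
  manifest: DataStreamManifest
): string {
  // Default: `${packageName}.${dataStreamDirectory}`, which is unique because
  // directory names are unique within a package.
  return manifest.dataset ?? `${packageName}.${dataStreamDirectory}`;
}

// Default naming cannot collide with the package-level hook:
resolveDataset('nginx', 'error', { type: 'logs' }); // "nginx.error"
// An explicit override can, e.g. APM's traces data stream (directory name assumed):
resolveDataset('apm', 'traces', { type: 'traces', dataset: 'apm' }); // "apm"
```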

The list of data streams that explicitly define dataset in their manifest.yml is a bit shorter:

Show list ``` logs-amazon_security_lake.application_activity logs-amazon_security_lake.discovery logs-amazon_security_lake.event logs-amazon_security_lake.findings logs-amazon_security_lake.iam logs-amazon_security_lake.network_activity logs-amazon_security_lake.system_activity logs-apm.app logs-apm.error logs-awsfirehose logs-cloud_security_posture.findings logs-cloud_security_posture.vulnerabilities logs-cribl logs-elastic_agent logs-elastic_agent.apm_server logs-elastic_agent.auditbeat logs-elastic_agent.cloud_defend logs-elastic_agent.cloudbeat logs-elastic_agent.endpoint_security logs-elastic_agent.filebeat logs-elastic_agent.filebeat_input logs-elastic_agent.fleet_server logs-elastic_agent.heartbeat logs-elastic_agent.metricbeat logs-elastic_agent.osquerybeat logs-elastic_agent.packetbeat logs-elastic_agent.pf_elastic_collector logs-elastic_agent.pf_elastic_symbolizer logs-elastic_agent.pf_host_agent logs-entityanalytics_entra_id.device logs-entityanalytics_entra_id.entity logs-entityanalytics_entra_id.user logs-fleet_server.output_health logs-kubernetes.container_logs logs-osquery_manager.result metrics-apm.app metrics-apm.internal metrics-apm.service_destination.10m metrics-apm.service_destination.1m metrics-apm.service_destination.60m metrics-apm.service_summary.10m metrics-apm.service_summary.1m metrics-apm.service_summary.60m metrics-apm.service_transaction.10m metrics-apm.service_transaction.1m metrics-apm.service_transaction.60m metrics-apm.transaction.10m metrics-apm.transaction.1m metrics-apm.transaction.60m metrics-azure.app_insights metrics-azure.app_state metrics-azure.billing metrics-azure.compute_vm metrics-azure.compute_vm_scaleset metrics-azure.container_instance metrics-azure.container_registry metrics-azure.container_service metrics-azure.database_account metrics-azure.function metrics-azure.monitor metrics-azure.storage_account metrics-elastic_agent.apm_server metrics-elastic_agent.auditbeat metrics-elastic_agent.cloudbeat metrics-elastic_agent.elastic_agent metrics-elastic_agent.endpoint_security metrics-elastic_agent.filebeat metrics-elastic_agent.filebeat_input metrics-elastic_agent.fleet_server metrics-elastic_agent.heartbeat metrics-elastic_agent.metricbeat metrics-elastic_agent.osquerybeat metrics-elastic_agent.packetbeat metrics-elasticsearch.ingest_pipeline metrics-elasticsearch.stack_monitoring.ccr metrics-elasticsearch.stack_monitoring.cluster_stats metrics-elasticsearch.stack_monitoring.enrich metrics-elasticsearch.stack_monitoring.index metrics-elasticsearch.stack_monitoring.index_recovery metrics-elasticsearch.stack_monitoring.index_summary metrics-elasticsearch.stack_monitoring.ml_job metrics-elasticsearch.stack_monitoring.node metrics-elasticsearch.stack_monitoring.node_stats metrics-elasticsearch.stack_monitoring.pending_tasks metrics-elasticsearch.stack_monitoring.shard metrics-enterprisesearch.stack_monitoring.health metrics-enterprisesearch.stack_monitoring.stats metrics-fleet_server.agent_status metrics-fleet_server.agent_versions metrics-kibana.background_task_utilization metrics-kibana.stack_monitoring.cluster_actions metrics-kibana.stack_monitoring.cluster_rules metrics-kibana.stack_monitoring.node_actions metrics-kibana.stack_monitoring.node_rules metrics-kibana.stack_monitoring.stats metrics-kibana.stack_monitoring.status metrics-kibana.task_manager_metrics metrics-logstash.node metrics-logstash.stack_monitoring.node metrics-logstash.stack_monitoring.node_stats metrics-system.process.summary synthetics-browser synthetics-browser.network 
synthetics-browser.screenshot synthetics-http synthetics-icmp synthetics-tcp traces-apm traces-apm.rum traces-apm.sampled ```

A quick manual glance through these data streams reveals some additional cases where there's a collision issue (a rough detection sketch follows the lists below):

# Synthetics
synthetics-browser
synthetics-browser.network
synthetics-browser.screenshot

# Elastic agent
logs-elastic_agent
logs-elastic_agent.apm_server
logs-elastic_agent.auditbeat
logs-elastic_agent.cloud_defend
logs-elastic_agent.cloudbeat
logs-elastic_agent.endpoint_security
logs-elastic_agent.filebeat
logs-elastic_agent.filebeat_input
logs-elastic_agent.fleet_server
logs-elastic_agent.heartbeat
logs-elastic_agent.metricbeat
logs-elastic_agent.osquerybeat
logs-elastic_agent.packetbeat
logs-elastic_agent.pf_elastic_collector
logs-elastic_agent.pf_elastic_symbolizer
logs-elastic_agent.pf_host_agent
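A rough sketch of automating this collision check over data stream metadata extracted from the integrations repo (the StreamInfo shape is an assumption, not an existing Fleet type):

```typescript
// Flag data streams whose package-level hook `${type}-${package}@custom`
// collides with some data stream's dataset-level hook `${type}-${dataset}@custom`.
interface StreamInfo {
  type: string;        // e.g. "traces"
  packageName: string; // e.g. "apm"
  dataset: string;     // e.g. "apm.sampled"
}

function findCollisions(streams: StreamInfo[]): string[] {
  const datasetLevel = new Set(streams.map((s) => `${s.type}-${s.dataset}@custom`));
  const collisions = new Set<string>();
  for (const s of streams) {
    const packageLevel = `${s.type}-${s.packageName}@custom`;
    if (datasetLevel.has(packageLevel)) {
      collisions.add(packageLevel);
    }
  }
  return [...collisions];
}

// With the APM and Elastic Agent streams listed above, this reports
// "traces-apm@custom" and "logs-elastic_agent@custom".
```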
nchaulet commented 7 months ago

@kpollich I did something similar: I tried to install all the packages and logged every package whose dataset is the same as the package name. I got the following list:

{package}:{dataset}
awsfirehose:awsfirehose
apm:apm
cribl:cribl
apm:apm
elastic_agent:elastic_agent

I think the ideal solution here would have been to have something like this for the dataset-level pipeline (but we introduced the package-level @custom after the dataset-level one, and we probably did not want to introduce a breaking change at that point):

name: `${pipeline.dataStream.type}-${pipeline.dataStream.package}-${pipeline.dataStream.dataset}@custom`,
kpollich commented 7 months ago

Thanks, Nicolas - that list aligns with my findings, but now I realize that only cases where the dataset begins with the package name value will result in collisions - so I think the actual conflicting data sets are limited to the APM traces data streams and the Elastic Agent logs data streams above.

For instance, here's what the synthetics ingest pipelines in question look like on 8.12.0:

// synthetics-browser-1.1.1
[
  {
    "pipeline": {
      "name": "global@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "synthetics@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "synthetics-synthetics@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "synthetics-browser@custom",
      "ignore_missing_pipeline": true
    }
  }
]

// synthetics-browser.network-1.1.1
[
  {
    "pipeline": {
      "name": "global@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "synthetics@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "synthetics-synthetics@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "synthetics-browser.network@custom",
      "ignore_missing_pipeline": true
    }
  }
]

// synthetics-browser.screenshot-1.1.1
[
  {
    "pipeline": {
      "name": "global@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "synthetics@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "synthetics-synthetics@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "synthetics-browser.screenshot@custom",
      "ignore_missing_pipeline": true
    }
  }
]

There are no collisions here, as synthetics-synthetics is not a pre-existing dataset like traces-apm is. This is clear in the ingest pipelines stack management UI:

(screenshot: ingest pipelines list in Stack Management)

However, I think the elastic_agent datasets do have collisions, e.g.

// logs-elastic_agent-1.16.0
[
  {
    "pipeline": {
      "name": "global@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "logs@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "logs-elastic_agent@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "logs-elastic_agent@custom", <-------------
      "ignore_missing_pipeline": true
    }
  }
]

// logs-elastic_agent.filebeat-1.16.0
[
  {
    "pipeline": {
      "name": "global@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "logs@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "logs-elastic_agent@custom", <-------------
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "logs-elastic_agent.filebeat@custom",
      "ignore_missing_pipeline": true
    }
  }
]

If a user had added a custom ingest pipeline for logs-elastic_agent@custom in 8.11, after upgrading to 8.12 they would find that events in the logs-elastic_agent.filebeat data stream (and all other logs-elastic_agent.* data streams) would also be running through that pipeline unexpectedly.

I think the ideal solution here would have been to have something like this for the dataset-level pipeline (but we introduced the package-level @custom after the dataset-level one, and we probably did not want to introduce a breaking change at that point)

name: `${pipeline.dataStream.type}-${pipeline.dataStream.package}-${pipeline.dataStream.dataset}@custom`,

This makes sense to me, so for the APM data streams in question we'd have the following instead of what we have now. I think it's a little confusing just because the naming is not super clear here on the integration side, but perhaps we can correct that by adding a dynamic description value to each of these processors, e.g.

// traces-apm.rum-8.12.0
{
  "pipeline": {
    "name": "global@custom",
    "ignore_missing_pipeline": true,
    "description": "Call a global custom pipeline for all data streams"
  }
},
{
  "pipeline": {
    "name": "traces@custom",
    "ignore_missing_pipeline": true,
    "description": "Call a custom pipeline for all data streams of type `traces`"
  }
},
{
  "pipeline": {
    "name": "traces-apm@custom",
    "ignore_missing_pipeline": true,
    "description": "Call a custom pipeline for all data streams of type `traces` defined by the `apm` integration
  }
},
{
  "pipeline": {
    "name": "traces-apm-apm.rum@custom",
    "ignore_missing_pipeline": true,
    "description": "Call a custom pipeline for only the `apm.rum` dataset"
  }
}
nchaulet commented 7 months ago

Adding the package name to the dataset-level custom pipeline will be a breaking change, so we may want to have a way to opt in, e.g. with a config flag?

kpollich commented 7 months ago

That's a good point, @nchaulet. What if we leave the traces-apm.rum@custom dataset-level processors in place, add a description that marks them as deprecated, then update the names of the new, more granular ones introduced in 8.12.0?

Technically there is still room for the same kind of breaking change between 8.12.0 and 8.12.1 if we take this path, but I think the scope is narrow enough that it would be okay. The remediation will simply be to rename your pipelines, which I think should be manageable for users.
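A sketch of that remediation using the Elasticsearch JS client, copying a custom pipeline to a new name and removing the old one (the ids in the example call are placeholders; which names apply depends on the pattern that ultimately ships):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // adjust connection/auth

// Copy a user-defined custom pipeline to a new name, then remove the old one.
// Placeholder ids; substitute the old and new hook names that apply to you.
async function renamePipeline(oldId: string, newId: string) {
  const response = await client.ingest.getPipeline({ id: oldId });
  const definition = response[oldId];
  if (!definition) {
    throw new Error(`Pipeline ${oldId} not found`);
  }
  await client.ingest.putPipeline({ id: newId, ...definition });
  await client.ingest.deletePipeline({ id: oldId });
}

renamePipeline('OLD-HOOK@custom', 'NEW-HOOK@custom').catch(console.error);
```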

nchaulet commented 7 months ago

That's a good point, @nchaulet. What if we leave the traces-apm.rum@custom dataset-level processors in place, add a description that marks them as deprecated, then update the names of the new, more granular ones introduced in 8.12.0?

Yes, I think that could work.

So for apm.rum you would have:

traces-apm@custom
traces-apm.rum@custom // deprecated
traces-apm-apm.rum@custom

and for apm (we still need some deduplication implemented, right?):

traces-apm@custom
traces-apm-apm@custom 
kpollich commented 7 months ago

Playing around with an implementation for a fix here and making good progress. The naming is a bit wonky and sort of diverges from the data stream naming convention, which I feel is not totally ideal, e.g.

{
  "pipeline": {
    "name": "global@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Global pipeline for all data streams"
  }
},
{
  "pipeline": {
    "name": "logs@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `logs`"
  }
},
{
  "pipeline": {
    "name": "logs-nginx@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `logs` defined by the `nginx` integration"
  }
},
{
  "pipeline": {
    "name": "logs-nginx-nginx.error@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for the `logs-nginx.error` dataset"
  }
},
{
  "pipeline": {
    "name": "logs-nginx.error@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] (deprecated) Use the `logs-nginx-nginx.error` pipeline instead"
  }
}

Or, for some more pertinent APM ingest pipelines:

// apm-traces
{
  "pipeline": {
    "name": "global@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Global pipeline for all data streams"
  }
},
{
  "pipeline": {
    "name": "traces@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces`"
  }
},
{
  "pipeline": {
    "name": "traces-apm@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces` defined by the `apm` integration"
  }
},
{
  "pipeline": {
    "name": "traces-apm-apm@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for the `traces-apm` dataset"
  }
}
// apm-traces.rum
{
  "pipeline": {
    "name": "global@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Global pipeline for all data streams"
  }
},
{
  "pipeline": {
    "name": "traces@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces`"
  }
},
{
  "pipeline": {
    "name": "traces-apm@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces` defined by the `apm` integration"
  }
},
{
  "pipeline": {
    "name": "traces-apm-apm.rum@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for the `traces-apm.rum` dataset"
  }
},
{
  "pipeline": {
    "name": "traces-apm.rum@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] (deprecated) Use the `traces-apm-apm.rum` pipeline instead"
  }
}

I think the naming is a little clunky, but hopefully it's not too confusing with the description in place. Adding the package name as part of the expected pipeline name seems to be our only path to preventing collisions.
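For clarity, here is a sketch of what this iteration's naming amounts to (illustrative only; it was later superseded by the .integration suffix discussed below):

```typescript
// Illustrative only: the naming tried in this iteration, where the dataset-level
// hook carries the package name and the old dataset-level name is kept as deprecated.
// This was later superseded by the `.integration` suffix approach.
function customPipelineNames(type: string, packageName: string, dataset: string) {
  return {
    global: 'global@custom',
    typeLevel: `${type}@custom`,
    packageLevel: `${type}-${packageName}@custom`,
    datasetLevel: `${type}-${packageName}-${dataset}@custom`,
    deprecatedDatasetLevel: `${type}-${dataset}@custom`,
  };
}

// customPipelineNames('traces', 'apm', 'apm.rum').datasetLevel === 'traces-apm-apm.rum@custom'
```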

nchaulet commented 7 months ago

Yes, the name is a little off from the naming discussion that happened here, and a little different from what we have for component templates too (discussion here); it may be confusing for users (maybe worth getting @felixbarny's thoughts here)

Thinking out loud here: could we have a suffix like .* for the package one instead?

apm.rum

traces-apm.*@custom // pkg
traces-apm@custom  // pkg one deprecated
traces-apm.rum@custom 

apm

traces-apm.*@custom // pkg 
traces-apm@custom 

Not sure this could happen without a breaking change

simitt commented 7 months ago

from https://github.com/elastic/kibana/issues/175254#issuecomment-1906638732

traces-apm@custom
traces-apm.rum@custom // deprecated
traces-apm-apm.rum@custom

@kpollich @nchaulet why would you deprecate traces-apm.rum@custom? This is not the conflicting pipeline - the conflicting one is that traces-apm@custom is applied to the traces-apm.rum-<namespace> data stream - in the shared example, this would still be the case.

felixbarny commented 7 months ago

The naming is a bit wonky and sort of diverges from the data stream naming convention which I feel is not totally ideal

Yes, the name is a little off from the naming discussion that happened in https://github.com/elastic/elasticsearch/issues/96267, and a little different from what we have for component templates too (discussion in https://github.com/elastic/elasticsearch/issues/97664); it may be confusing for users (maybe worth getting @felixbarny's thoughts here)

Thinking out loud here: could we have a suffix like .* for the package one instead?

I agree that this wouldn't comply with the new naming conventions we've established in https://github.com/elastic/elasticsearch/issues/96267 and it'll probably also be confusing to the user as to which data streams these custom pipelines apply to. Therefore, and because they have been around for longer, I'd bias towards not renaming the custom pipelines for a dataset.

We could declare the names of the new extension points bogus and rename them in a breaking manner.

For example:

apm.rum

traces-apm.package@custom
traces-apm.rum@custom

apm

traces-apm.package@custom
traces-apm@custom

I don't think that a suffix like .* is a good idea because it may be perceived as "applies to all data streams that match traces-apm.*".

Or we can keep around as deprecated those that don't cause a conflict. There's also a new deprecated flag for ingest pipelines that we can leverage: https://www.elastic.co/guide/en/elasticsearch/reference/current/put-pipeline-api.html#put-pipeline-api-request-body
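For reference, a sketch of setting that deprecated flag on a pipeline, sent as a raw request via the Elasticsearch JS client (in practice you would re-PUT the pipeline's existing definition rather than the empty processor list used here):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // adjust connection/auth

// Mark a pipeline as deprecated using the `deprecated` flag from the put-pipeline API.
// Raw request sketch; a real migration would re-PUT the existing definition instead
// of the empty processor list shown here.
async function markDeprecated(id: string) {
  await client.transport.request({
    method: 'PUT',
    path: `/_ingest/pipeline/${encodeURIComponent(id)}`,
    body: {
      description: '(deprecated) kept for backwards compatibility',
      deprecated: true,
      processors: [],
    },
  });
}

markDeprecated('traces-apm.rum@custom').catch(console.error);
```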

kpollich commented 7 months ago

why would you deprecate traces-apm.rum@custom? This is not the conflicting pipeline - the conflicting one is that traces-apm@custom is applied to the traces-apm.rum-<namespace> data stream - in the shared example, this would still be the case.

@simitt - I don't think I agree with this assessment as far as which pipeline pattern is intended to appear here.

We want to support a pattern like traces-apm@custom that allows users to customize all documents ingested to data streams of type traces in the APM integration. This is the intent laid out in https://github.com/elastic/kibana/issues/168019 by the requested {type}-{integration}@custom pattern, e.g. "type" = traces and "integration" = apm.

So, as far as the Fleet implementation is concerned, the expected behavior is that traces-apm@custom appears on any data stream with type traces defined by the APM integration.

There are real world use cases for this customization, e.g. decorating all logs produced by a given integration (regardless of dataset) with a custom field or deriving a custom metric for another integration.

To be clear, the list of pipeline processors that appears for all integrations as of 8.12.0, aligned with their "patterns", is as follows:

global@custom
${type}@custom
${type}-${integration}@custom
${type}-${dataset}@custom

We would deprecate traces-apm.rum@custom in this example because that's the pattern we'd need to rename to something that can never collide with the ${type}-${integration}@custom pattern. The root issue here is that, for the traces-apm data stream, the integration part of the ${type}-${integration}@custom pattern is the same as the dataset part of the ${type}-${dataset}@custom pattern. That's why we see the duplication in the traces-apm ingest pipeline as well.

So, the fix we're proposing above is to deprecate the ${type}-${dataset}@custom pattern and replace it with ${type}-${integration}-${dataset}@custom, which will never collide with the ${type}-${integration}@custom pattern.

However, @felixbarny's suggestion is more feasible, e.g. this point rings true:

Therefore, and because they have been around for longer, I'd bias towards not renaming the custom pipelines for a dataset.

I'm in agreement with this, so a path forward would be to rename the newer ${type}-${integration}@custom pattern instead. It's also more defensible to me to simply do away with this new pattern entirely in 8.12.1 with a breaking change notice in the release notes, as they'll only have been available for a few weeks anyway and will likely have near-zero adoption. Therefore, the impact of the breaking change will be massively lower compared to renaming + deprecating the dataset-level custom pipelines.

With this in mind, I'm proposing we do the following: replace the ${type}-${integration}@custom pattern introduced in 8.12.0 with ${type}-${integration}.package@custom.

The deprecated flag on processors is great to know about, but because the collision case here can be potentially highly impactful and less-than-obvious, I'm in favor of just shipping a breaking change to correct the collision instead. Leaving the colliding processor behind as deprecated doesn't actually fix anything for impacted users.

kpollich commented 7 months ago

There's still technically an edge case with the new ${type}-${integration}.package@custom pattern, where a dataset name ends up being the package name followed by .package, e.g. given a data stream defined as follows:

# my_integration/data_streams/package/manifest.yml
type: logs

We'd have a data stream pattern of logs-my_integration.package-*, and both the package-level hook and the dataset-level hook would resolve to the same pipeline name, logs-my_integration.package@custom.

You could also footgun yourself by providing a custom dataset for any data stream that forces the collision case, e.g.

# my_integration/data_streams/foo/manifest.yml
type: logs
dataset: my_integration.package

So maybe we have no choice but to add a restriction on dataset naming to the package spec? The collision case here is, I think, less likely than with the current implementation, but it still exists.
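A sketch of what such a package-spec restriction might check (hypothetical rule and function name; the actual proposal is tracked in the package-spec issue filed below):

```typescript
// Hypothetical validation: reject custom `dataset` values that would collide with
// the reserved package-level pipeline name. Not the actual package-spec rule.
function validateCustomDataset(packageName: string, dataset: string): string[] {
  const errors: string[] = [];
  const reservedSuffixes = ['package', 'integration']; // whichever suffix Fleet reserves
  for (const suffix of reservedSuffixes) {
    if (dataset === `${packageName}.${suffix}`) {
      errors.push(
        `dataset "${dataset}" collides with the reserved "${packageName}.${suffix}@custom" pipeline name`
      );
    }
  }
  return errors;
}

// validateCustomDataset('my_integration', 'my_integration.package') -> one error
// validateCustomDataset('nginx', 'nginx.error')                     -> []
```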

felixbarny commented 7 months ago

So maybe we have no choice but to add a restriction on dataset naming to the package spec?

That sounds reasonable to me.

kpollich commented 7 months ago

I filed https://github.com/elastic/package-spec/issues/699 to capture the package spec change proposed above.

simitt commented 7 months ago

We want to support a pattern like traces-apm@custom that allows users to customize all documents ingested to data streams of type traces in the APM integration. This is the intent laid out in https://github.com/elastic/kibana/issues/168019 by the requested {type}-{integration}@custom pattern, e.g. "type" = traces and "integration" = apm.

So, as far as the Fleet implementation is concerned, the expected behavior is that traces-apm@custom appears on any data stream with type traces defined by the APM integration.

That is exactly what I see as the problem and what is breaking the apm use case: {type}-{integration} was introduced with these pipeline changes, not in alignment with the previously agreed-upon and already established data stream (and derived ingest pipeline) naming pattern of {type}-{dataset}-{namespace}.


+1 on finding a solution where no deprecation of pre-8.12.0 pipelines would be necessary.

Regarding the proposed solution by @felixbarny and @kpollich, can you clarify how that would look for the apm case?

Replace the ${type}-${integration}@custom pattern with ${type}-${integration}.package@custom

Would that ultimately lead to the following?

// apm-traces
{
  "pipeline": {
    "name": "global@custom", //newly introduced
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Global pipeline for all data streams"
  }
},
{
  "pipeline": {
    "name": "traces@custom", //newly introduced
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces`"
  }
},
{
  "pipeline": {
    "name": "traces-apm.package@custom", //newly introduced
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces` defined by the `apm` integration"
  }
},
{
  "pipeline": {
    "name": "traces-apm@custom", // as it pre-existed before 8.12.0
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for the `traces-apm` dataset"
  }
}

and

// apm-traces.rum
{
  "pipeline": {
    "name": "global@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Global pipeline for all data streams"
  }
},
{
  "pipeline": {
    "name": "traces@custom", //newly introduced
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces`"
  }
},
{
  "pipeline": {
    "name": "traces-apm.rum.package@custom", //newly introduced
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for the `traces-apm.rum` dataset"
  }
},
{
  "pipeline": {
    "name": "traces-apm.rum@custom",  // as it pre-existed before 8.12.0
    "ignore_missing_pipeline": true,
    "description": "[Fleet] (deprecated) Use the `traces-apm-apm.rum` pipeline instead"
  }
}
kpollich commented 7 months ago

@simitt - This is close, but the traces-apm.rum.package@custom you have in your example would actually be traces-apm.package@custom. The intent is to allow users to customize all documents of type traces ingested by the apm package, regardless of dataset.

Here's what the pipeline processors on these data streams look like on my PR branch - https://github.com/elastic/kibana/pull/175448

// traces-apm
{
  "pipeline": {
    "name": "global@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Global pipeline for all data streams"
  }
},
{
  "pipeline": {
    "name": "traces@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces`"
  }
},
{
  "pipeline": {
    "name": "traces-apm.package@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces` defined by the `apm` integration"
  }
},
{
  "pipeline": {
    "name": "traces-apm@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for the `apm` dataset"
  }
}
// traces-apm.rum
{
  "pipeline": {
    "name": "global@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Global pipeline for all data streams"
  }
},
{
  "pipeline": {
    "name": "traces@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces`"
  }
},
{
  "pipeline": {
    "name": "traces-apm.package@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces` defined by the `apm` integration"
  }
},
{
  "pipeline": {
    "name": "traces-apm.rum@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for the `apm.rum` dataset"
  }
}
// traces-apm.sampled
{
  "pipeline": {
    "name": "global@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Global pipeline for all data streams"
  }
},
{
  "pipeline": {
    "name": "traces@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces`"
  }
},
{
  "pipeline": {
    "name": "traces-apm.package@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for all data streams of type `traces` defined by the `apm` integration"
  }
},
{
  "pipeline": {
    "name": "traces-apm.sampled@custom",
    "ignore_missing_pipeline": true,
    "description": "[Fleet] Pipeline for the `apm.sampled` dataset"
  }
}
kpollich commented 7 months ago

FYI with the .package suffix we do incur one new collision with the system_audit integration 😞: https://github.com/elastic/package-spec/issues/699#issuecomment-1908696989.

We could use .integration instead; it incurs the same opportunity for collision, though we won't have any collision cases today. Maybe that's the best option to just unblock.

kpollich commented 7 months ago

https://github.com/elastic/kibana/pull/175448 has been updated to use .integration as a suffix instead of .package. See https://github.com/elastic/kibana/pull/175448#issuecomment-1908719348 for a copy/paste of relevant pipelines.
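For clarity, here is a sketch summarizing the naming scheme as updated in the PR (illustrative; the concrete pipelines are pasted in the linked PR comment):

```typescript
// Illustrative summary of the naming as updated in the PR above: the package-level
// hook gains an `.integration` suffix so it can no longer collide with any
// dataset-level hook.
function customPipelineNamesWithIntegrationSuffix(
  type: string,
  packageName: string,
  dataset: string
): string[] {
  return [
    'global@custom',
    `${type}@custom`,
    `${type}-${packageName}.integration@custom`,
    `${type}-${dataset}@custom`,
  ];
}

// customPipelineNamesWithIntegrationSuffix('traces', 'apm', 'apm.sampled')
// -> ['global@custom', 'traces@custom', 'traces-apm.integration@custom', 'traces-apm.sampled@custom']
```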

@simitt - I'll hold off on merging until you can take a look at the above and verify this is acceptable from the APM side.

axw commented 7 months ago

In the new apm-data Elasticsearch plugin we have the following logic: https://github.com/elastic/elasticsearch/blob/9b4647cfc6d39987cc3fd4f44514bca403d4808f/x-pack/plugin/apm-data/src/main/resources/ingest-pipelines/apm%40default-pipeline.yaml#L35-L56

That is, we invoke:

So IIUC we should replace that third one with {data_stream.type}-apm.integration@custom to be consistent. Is that right?

axw commented 7 months ago

So IIUC we should replace that third one with {data_stream.type}-apm.integration@custom to be consistent. Is that right?

@simitt and I discussed this just now, and rather than making it consistent I'm going to remove that custom pipeline from the apm-data plugin. The reason is that "integrations" and "packages" no longer make sense, conceptually, when taking Fleet or integrations out of the picture.

simitt commented 7 months ago

@kpollich your proposal looks good from an apm perspective - thanks for finding a non-breaking solution. Going forward, when moving to the apm plugin we will simply not make use of the {data_stream.type}-apm.integration@custom pipeline for apm.

kpollich commented 7 months ago

@kilfoyle - FYI now that this has landed, I'm going to open a docs issue later today with a draft of what we should include under the breaking change section of the 8.12.1 release notes.

kilfoyle commented 7 months ago

@kpollich Sounds good. Thanks so much for writing that up!

kpollich commented 7 months ago

Docs issue: https://github.com/elastic/ingest-docs/issues/861

kpollich commented 7 months ago

@amolnater-qasource - FYI we updated the names of these pipelines. Not sure if this impacts existing test cases but I wanted to flag this issue to you. See relevant PR + documentation issue above as well.

harshitgupta-qasource commented 7 months ago

Hi @kpollich

Thank you for the update.

We have updated 02 testcases for this feature under testrail at links:

We have validated this issue on the 8.13.0-SNAPSHOT Kibana build and had the below observations:

Observations:

Build details: VERSION: 8.13.0 SNAPSHOT BUILD: 71179 COMMIT: b4d93fc145c3c09eb1096c610b7cd736f19f6a3a

Screen-Cast:

Further, we will revalidate this once the latest 8.12.1 BC build is available.

Please let us know if we are missing anything here. Thanks!

amolnater-qasource commented 7 months ago

Hi Team,

We have revalidated these changes on the latest 8.12.1 BC1 Kibana cloud environment and found it working fine now.

Observations:

Screenshot: (attached)

Build details: VERSION: 8.12.1 BC1 BUILD: 70228 COMMIT: 3457f326b763887d154c9da00bd4e489221a2ff3

Hence we are marking this as QA:Validated.

Please let us know if anything else is required from our end. Thanks!

carsonip commented 7 months ago

Testing notes (from APM)

The 8.12.1 fix is working as expected. I confirm the following ingest pipelines no longer exhibit the bug that was present in 8.12.0.

8.12.0

traces-apm.sampled-8.12.0 ingest pipeline

[
  {
    "rename": {
      "field": "observer.id",
      "target_field": "agent.ephemeral_id",
      "ignore_missing": true
    }
  },
  {
    "date": {
      "field": "_ingest.timestamp",
      "formats": [
        "ISO8601"
      ],
      "ignore_failure": true,
      "output_format": "date_time_no_millis",
      "target_field": "event.ingested"
    }
  },
  {
    "pipeline": {
      "name": "global@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "traces@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "traces-apm@custom",
      "ignore_missing_pipeline": true
    }
  },
  {
    "pipeline": {
      "name": "traces-apm.sampled@custom",
      "ignore_missing_pipeline": true
    }
  }
]

8.12.1

traces-apm.sampled-8.12.1 ingest pipeline

[
  {
    "rename": {
      "field": "observer.id",
      "target_field": "agent.ephemeral_id",
      "ignore_missing": true
    }
  },
  {
    "date": {
      "field": "_ingest.timestamp",
      "formats": [
        "ISO8601"
      ],
      "ignore_failure": true,
      "output_format": "date_time_no_millis",
      "target_field": "event.ingested"
    }
  },
  {
    "pipeline": {
      "name": "global@custom",
      "ignore_missing_pipeline": true,
      "description": "[Fleet] Global pipeline for all data streams"
    }
  },
  {
    "pipeline": {
      "name": "traces@custom",
      "ignore_missing_pipeline": true,
      "description": "[Fleet] Pipeline for all data streams of type `traces`"
    }
  },
  {
    "pipeline": {
      "name": "traces-apm.integration@custom",
      "ignore_missing_pipeline": true,
      "description": "[Fleet] Pipeline for all data streams of type `traces` defined by the `apm` integration"
    }
  },
  {
    "pipeline": {
      "name": "traces-apm.sampled@custom",
      "ignore_missing_pipeline": true,
      "description": "[Fleet] Pipeline for the `apm.sampled` dataset"
    }
  }
]
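A minimal sketch of automating this verification with the Elasticsearch JS client (the pipeline id and expectations mirror the 8.12.1 output above; node/auth details are assumptions):

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' }); // adjust connection/auth

// Verify the fixed pipeline references the renamed package-level hook and no longer
// contains the colliding traces-apm@custom reference. Pipeline id version is an example.
async function verifySampledPipeline(id = 'traces-apm.sampled-8.12.1') {
  const response = await client.ingest.getPipeline({ id });
  const names = (response[id]?.processors ?? [])
    .map((processor) => processor.pipeline?.name)
    .filter((name): name is string => Boolean(name));

  console.log(names);
  if (names.includes('traces-apm@custom')) {
    throw new Error('Still references the colliding traces-apm@custom pipeline');
  }
  if (!names.includes('traces-apm.integration@custom')) {
    throw new Error('Expected the renamed traces-apm.integration@custom hook');
  }
}

verifySampledPipeline().catch(console.error);
```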