elastic / kibana

Add support for template construction compatible with kubernetes hints #135624

Closed: ChrsMark closed this issue 2 years ago

ChrsMark commented 2 years ago

As discussed at https://github.com/elastic/elastic-agent/issues/613#issuecomment-1165373934, Fleet UI can be enhanced to provide specific input templates that can be enabled and populated by the hints-based autodiscovery implemented in the kubernetes provider.

Fleet UI should be capable of producing an inputs.d ConfigMap like the following:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-agent-standalone-inputs
data:
  redis.yml: |-
    inputs:
     - name: templates.d/redis/0.3.6
       type: redis/metrics
       data_stream.namespace: default
       use_output: default
       streams:
         - data_stream:
             dataset: redis.info
             type: metrics
           metricsets:
           - info
           hosts:
           - "${kubernetes.hints.redis.info.host|'127.0.0.1:6379'}"
           idle_timeout: 20s
           maxconn: 10
           network: tcp
           period: "${kubernetes.hints.redis.info.period|'10s'}"
           condition: ${kubernetes.hints.redis.info.enabled} == true
         - data_stream:
             dataset: redis.key
             type: metrics
           metricsets:
           - key
           hosts:
           - "${kubernetes.hints.redis.key.host|'127.0.0.1'}:${kubernetes.hints.redis.info.port|'6379'}"
           idle_timeout: 20s
           key.patterns:
             - limit: 20
               pattern: '*'
           maxconn: 10
           network: tcp
           period: "${kubernetes.hints.redis.key.period|'10s'}"
            condition: ${kubernetes.hints.redis.key.enabled} == true
```

So the flow for creating this new ConfigMap is as follows:

  1. Fleet retrieves all the available packages/integrations from the Registry.
  2. One by one, it constructs the config blocks using the default values defined in the package spec.
     a. For every setting that is a known "hint", it populates the value with the hint placeholder/variable, e.g. ${kubernetes.hints.redis.info.host}. The fallback should be the default value, so the final value of the setting looks like ${kubernetes.hints.redis.info.host|'127.0.0.1:6379'}.
     b. For every data_stream in the config block, it adds the proper condition so that the stream is enabled only by the hint mechanism: condition: ${kubernetes.hints.redis.key.enabled} == true
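For illustration, here is a minimal Python sketch of step 2, assuming a simplified package-spec shape; KNOWN_HINTS, hint_value, and build_stream are hypothetical names for this sketch, not Fleet's actual code:

```python
# Hypothetical sketch of the template construction described above.
# KNOWN_HINTS and the input shapes are assumptions for illustration.
KNOWN_HINTS = {"host", "period"}

def hint_value(package, dataset_suffix, var_name, default):
    """Render a setting as a hint placeholder with the default as fallback (step 2a)."""
    placeholder = f"kubernetes.hints.{package}.{dataset_suffix}.{var_name}"
    return f"${{{placeholder}|'{default}'}}"

def build_stream(package, dataset, var_defaults):
    """Build one stream block: hint placeholders plus the enabling condition."""
    suffix = dataset.split(".", 1)[1]  # e.g. "redis.info" -> "info"
    stream = {"data_stream": {"dataset": dataset, "type": "metrics"}}
    for name, default in var_defaults.items():
        if name in KNOWN_HINTS:
            stream[name] = hint_value(package, suffix, name, default)
        else:
            stream[name] = default
    # Enabled only when the hint mechanism sets the flag (step 2b).
    stream["condition"] = f"${{kubernetes.hints.{package}.{suffix}.enabled}} == true"
    return stream

print(build_stream("redis", "redis.info",
                   {"host": "127.0.0.1:6379", "period": "10s", "maxconn": 10}))
# -> "host" rendered as "${kubernetes.hints.redis.info.host|'127.0.0.1:6379'}"
```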

The purpose of this ConfigMap is twofold: to be mounted manually at elastic-agent-standalone/elastic-agent-standalone-daemonset-configmap.yaml, and to be included in the full manifest that Fleet UI constructs (implemented by https://github.com/elastic/kibana/pull/114439).

This is related to https://github.com/elastic/elastic-agent/issues/662.

elasticmachine commented 2 years ago

Pinging @elastic/fleet (Team:Fleet)

joshdover commented 2 years ago
  • Fleet retrieves all the available packages/integrations from the Registry.

I hope we don't mean every package? This is not going to scale well as we get to 1k+ packages. Fleet can't be downloading all of the packages to produce this, I think we either need:

  • The user to select which integrations they may want to use; and/or
  • A default, constrained set of popular packages that we want to support out of the box, with the ability to add additional packages

Another question is where in the UI should this be shown? Only in the standalone agent configuration UI?

ChrsMark commented 2 years ago
  • Fleet retrieves all the available packages/integrations from the Registry.

I hope we don't mean every package? This is not going to scale well as we get to 1k+ packages. Fleet can't be downloading all of the packages to produce this, I think we either need:

How much would this information be in terms of bytes? Note that we only care about the package spec and not the assets at this point. Could a caching mechanism in Kibana help here?

  • The user to select which integrations they may want to use; and/or
  • A default, constrained set of popular packages that we want to support out of the box, with the ability to add additional packages

I think that would work, but it would not comply with the goal of the feature, which is to require minimal configuration steps. These templates could even be completely hidden from the user, since they only act as a low-level implementation detail. However, if including all of them proves not to be performant, we need to revisit and reconsider this.

Another question is where in the UI should this be shown? Only in the standalone agent configuration UI?

Yes, to my mind this should be shown only in the standalone agent configuration UI.

@mlunadia @gizas any thoughts on the above comments/concerns?

kpollich commented 2 years ago

How much this information would be in terms of bytes? Note that we only care about the package spec and not the assets at this point. Could a caching mechanism in Kibana help here?

We have the package registry API that can provide some info, e.g. https://epr.elastic.co/search?experimental=true, but this doesn't include detailed information about variables like default values, types, etc. To resolve that level of detail, we need to either query the list linked above and then query the API for each individual package (e.g. https://epr.elastic.co/package/1password/1.4.0/), or download every package.
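To make the two-step lookup concrete, a rough Python sketch (the endpoints are the real EPR ones linked above; the exact response-field names used here are assumptions):

```python
# Sketch of the two-step resolution: search index first, then one request
# per package for the detailed manifest (vars, defaults, types).
import requests

BASE = "https://epr.elastic.co"

index = requests.get(f"{BASE}/search", params={"experimental": "true"}).json()
for entry in index:
    # Assumed fields: the search results carry at least name and version.
    name, version = entry["name"], entry["version"]
    detail = requests.get(f"{BASE}/package/{name}/{version}/").json()
    # Only the per-package response has per-variable detail such as defaults.
    ...
```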

Fleet does some in-memory caching of downloaded packages to save on repeated downloads, but that doesn't help us in the "cold start" case, where we need to download every single package to determine if and how Fleet should generate Kubernetes templates for them. I agree with @joshdover's points above that we need some way to limit the list of packages we're querying here, either by user input or by a hardcoded allow list.

Packages can easily be several megabytes on average, and if a package ships with prebuilt assets like ML jobs it can be quite a bit larger.

ChrsMark commented 2 years ago

Out of curiosity, I drafted the following Python script to measure what we are discussing:

get_packages.py:

```python
import requests
import wget
import ssl

# Note: disables TLS certificate verification for this quick measurement.
ssl._create_default_https_context = ssl._create_unverified_context

# Fetch the package index, then download every package archive.
res = requests.get('https://epr.elastic.co/search?experimental=true')
base_uri = 'https://epr.elastic.co'
packages = res.json()
for pkg in packages:
    print(pkg)
    print("\n")
    path = pkg.get('download')
    uri = base_uri + path
    print(uri)
    print("\n")
    wget.download(uri)
```

Running this script from my local machine, I managed to download all the packages in less than 2 minutes, and the total storage used is 117M.

```console
$ time ./packagesTest/get_packages.py
....
3.10s user 1.89s system 4% cpu 1:43.66 total
$ ls -l packagesTest | wc -l
     154
$ du -sh packagesTest
117M    packagesTest
```
Downloaded packages:

```console
$ ls -lh packagesTest
-rw-r--r-- 1 chrismark staff 855K Jul 6 11:40 1password-1.4.0.zip
-rw-r--r-- 1 chrismark staff 2.5M Jul 6 11:40 activemq-0.3.0.zip
-rw-r--r-- 1 chrismark staff 27K Jul 6 11:40 akamai-1.0.1.zip
-rw-r--r-- 1 chrismark staff 1.4M Jul 6 11:40 apache-1.3.5.zip
-rw-r--r-- 1 chrismark staff 1.0M Jul 6 11:41 apm-8.3.0.zip
-rw-r--r-- 1 chrismark staff 18K Jul 6 11:40 atlassian_bitbucket-1.2.1.zip
-rw-r--r-- 1 chrismark staff 23K Jul 6 11:40 atlassian_confluence-1.3.0.zip
-rw-r--r-- 1 chrismark staff 22K Jul 6 11:40 atlassian_jira-1.3.0.zip
-rw-r--r-- 1 chrismark staff 365K Jul 6 11:40 auditd-3.1.0.zip
-rw-r--r-- 1 chrismark staff 823K Jul 6 11:40 auditd_manager-1.0.0.zip
-rw-r--r-- 1 chrismark staff 2.1M Jul 6 11:40 auth0-1.0.0.zip
-rw-r--r-- 1 chrismark staff 4.9M Jul 6 11:40 aws-1.17.1.zip
-rw-r--r-- 1 chrismark staff 15K Jul 6 11:41 aws_logs-0.2.3.zip
-rw-r--r-- 1 chrismark staff 348K Jul 6 11:40 awsfargate-0.1.1.zip
-rw-r--r-- 1 chrismark staff 818K Jul 6 11:40 azure-1.1.10.zip
-rw-r--r-- 1 chrismark staff 488K Jul 6 11:40 azure_application_insights-1.0.1.zip
-rw-r--r-- 1 chrismark staff 196K Jul 6 11:40 azure_billing-1.0.1.zip
-rw-r--r-- 1 chrismark staff 1.8M Jul 6 11:40 azure_metrics-1.0.5.zip
-rw-r--r-- 1 chrismark staff 249K Jul 6 11:40 barracuda-0.9.0.zip
-rw-r--r-- 1 chrismark staff 124K Jul 6 11:40 bluecoat-0.8.0.zip
-rw-r--r-- 1 chrismark staff 241K Jul 6 11:42 carbon_black_cloud-1.0.3.zip
-rw-r--r-- 1 chrismark staff 28K Jul 6 11:42 carbonblack_edr-1.3.0.zip
-rw-r--r-- 1 chrismark staff 1.0M Jul 6 11:40 cassandra-1.1.0.zip
-rw-r--r-- 1 chrismark staff 123K Jul 6 11:40 cef-2.0.3.zip
-rw-r--r-- 1 chrismark staff 91K Jul 6 11:40 checkpoint-1.5.1.zip
-rw-r--r-- 1 chrismark staff 1.1M Jul 6 11:40 cisco-0.12.5.zip
-rw-r--r-- 1 chrismark staff 825K Jul 6 11:40 cisco_asa-2.4.2.zip
-rw-r--r-- 1 chrismark staff 474K Jul 6 11:40 cisco_duo-1.2.4.zip
-rw-r--r-- 1 chrismark staff 43K Jul 6 11:40 cisco_ftd-2.2.2.zip
-rw-r--r-- 1 chrismark staff 23K Jul 6 11:40 cisco_ios-1.6.0.zip
-rw-r--r-- 1 chrismark staff 159K Jul 6 11:40 cisco_ise-0.1.0.zip
-rw-r--r-- 1 chrismark staff 1.8M Jul 6 11:40 cisco_meraki-0.5.1.zip
-rw-r--r-- 1 chrismark staff 180K Jul 6 11:40 cisco_nexus-0.5.1.zip
-rw-r--r-- 1 chrismark staff 228K Jul 6 11:40 cisco_secure_email_gateway-0.1.0.zip
-rw-r--r-- 1 chrismark staff 27K Jul 6 11:40 cisco_secure_endpoint-2.4.1.zip
-rw-r--r-- 1 chrismark staff 27K Jul 6 11:40 cisco_umbrella-1.0.1.zip
-rw-r--r-- 1 chrismark staff 1.6M Jul 6 11:40 cloud_security_posture-0.0.16.zip
-rw-r--r-- 1 chrismark staff 947K Jul 6 11:41 cloudflare-2.0.1.zip
-rw-r--r-- 1 chrismark staff 285K Jul 6 11:41 cockroachdb-0.2.0.zip
-rw-r--r-- 1 chrismark staff 977K Jul 6 11:41 crowdstrike-1.3.4.zip
-rw-r--r-- 1 chrismark staff 156K Jul 6 11:41 cyberark-0.4.4.zip
-rw-r--r-- 1 chrismark staff 547K Jul 6 11:41 cyberarkpas-2.4.2.zip
-rw-r--r-- 1 chrismark staff 123K Jul 6 11:41 cylance-0.8.1.zip
-rw-r--r-- 1 chrismark staff 34M Jul 6 11:41 dga-0.0.2.zip
-rw-r--r-- 1 chrismark staff 713K Jul 6 11:41 docker-1.2.0.zip
-rw-r--r-- 1 chrismark staff 860K Jul 6 11:41 elastic_agent-1.3.3.zip
-rw-r--r-- 1 chrismark staff 97K Jul 6 11:41 elasticsearch-0.2.0.zip
-rw-r--r-- 1 chrismark staff 222K Jul 6 11:41 endpoint-8.3.0.zip
-rw-r--r-- 1 chrismark staff 233K Jul 6 11:41 f5-0.9.0.zip
-rw-r--r-- 1 chrismark staff 17K Jul 6 11:41 fim-1.0.0.zip
-rw-r--r-- 1 chrismark staff 24K Jul 6 11:41 fireeye-1.4.0.zip
-rw-r--r-- 1 chrismark staff 3.7K Jul 6 11:41 fleet_server-1.2.0.zip
-rw-r--r-- 1 chrismark staff 402K Jul 6 11:41 fortinet-1.6.2.zip
-rw-r--r-- 1 chrismark staff 602K Jul 6 11:41 gcp-1.9.2.zip
-rw-r--r-- 1 chrismark staff 11K Jul 6 11:41 gcp_pubsub-1.0.1.zip
-rw-r--r-- 1 chrismark staff 402B Jul 6 11:35 get_packages.py
-rw-r--r-- 1 chrismark staff 761K Jul 6 11:41 github-1.0.2.zip
-rw-r--r-- 1 chrismark staff 95K Jul 6 11:41 google_workspace-1.5.1.zip
-rw-r--r-- 1 chrismark staff 236K Jul 6 11:41 haproxy-0.7.0.zip
-rw-r--r-- 1 chrismark staff 1.0M Jul 6 11:41 hashicorp_vault-1.4.0.zip
-rw-r--r-- 1 chrismark staff 1.1M Jul 6 11:41 hid_bravura_monitor-1.0.3.zip
-rw-r--r-- 1 chrismark staff 8.2K Jul 6 11:41 http_endpoint-1.1.0.zip
-rw-r--r-- 1 chrismark staff 9.1K Jul 6 11:41 httpjson-1.2.4.zip
-rw-r--r-- 1 chrismark staff 1.6M Jul 6 11:41 iis-0.8.0.zip
-rw-r--r-- 1 chrismark staff 112K Jul 6 11:41 imperva-0.8.0.zip
-rw-r--r-- 1 chrismark staff 159K Jul 6 11:41 infoblox-0.8.0.zip
-rw-r--r-- 1 chrismark staff 151K Jul 6 11:41 infoblox_nios-0.1.0.zip
-rw-r--r-- 1 chrismark staff 1.4M Jul 6 11:41 iptables-0.10.1.zip
-rw-r--r-- 1 chrismark staff 11K Jul 6 11:41 journald-0.0.2.zip
-rw-r--r-- 1 chrismark staff 715K Jul 6 11:41 juniper-1.1.0.zip
-rw-r--r-- 1 chrismark staff 279K Jul 6 11:41 juniper_junos-0.2.1.zip
-rw-r--r-- 1 chrismark staff 381K Jul 6 11:41 juniper_netscreen-0.2.0.zip
-rw-r--r-- 1 chrismark staff 68K Jul 6 11:41 juniper_srx-1.3.1.zip
-rw-r--r-- 1 chrismark staff 298K Jul 6 11:41 kafka-1.2.2.zip
-rw-r--r-- 1 chrismark staff 21K Jul 6 11:41 keycloak-1.3.1.zip
-rw-r--r-- 1 chrismark staff 23K Jul 6 11:41 kibana-1.0.2.zip
-rw-r--r-- 1 chrismark staff 1.5M Jul 6 11:41 kubernetes-1.21.1.zip
-rw-r--r-- 1 chrismark staff 591K Jul 6 11:41 linux-0.6.7.zip
-rw-r--r-- 1 chrismark staff 5.6K Jul 6 11:41 log-1.0.0.zip
-rw-r--r-- 1 chrismark staff 534K Jul 6 11:41 logstash-1.1.0.zip
-rw-r--r-- 1 chrismark staff 22K Jul 6 11:41 m365_defender-1.0.4.zip
-rw-r--r-- 1 chrismark staff 16K Jul 6 11:41 mattermost-1.2.0.zip
-rw-r--r-- 1 chrismark staff 854K Jul 6 11:41 microsoft-1.1.0.zip
-rw-r--r-- 1 chrismark staff 741K Jul 6 11:41 microsoft_defender_endpoint-2.2.1.zip
-rw-r--r-- 1 chrismark staff 17K Jul 6 11:41 microsoft_dhcp-1.4.2.zip
-rw-r--r-- 1 chrismark staff 1.0M Jul 6 11:41 microsoft_sqlserver-1.1.1.zip
-rw-r--r-- 1 chrismark staff 152K Jul 6 11:41 mimecast-1.0.0.zip
-rw-r--r-- 1 chrismark staff 48K Jul 6 11:41 modsecurity-1.0.0.zip
-rw-r--r-- 1 chrismark staff 176K Jul 6 11:41 mongodb-1.3.1.zip
-rw-r--r-- 1 chrismark staff 729K Jul 6 11:41 mysql-1.2.1.zip
-rw-r--r-- 1 chrismark staff 21K Jul 6 11:41 mysql_enterprise-1.0.1.zip
-rw-r--r-- 1 chrismark staff 1.4M Jul 6 11:41 nats-1.2.0.zip
-rw-r--r-- 1 chrismark staff 134K Jul 6 11:41 netflow-2.0.1.zip
-rw-r--r-- 1 chrismark staff 122K Jul 6 11:40 netscout-0.8.0.zip
-rw-r--r-- 1 chrismark staff 385K Jul 6 11:41 netskope-1.0.1.zip
-rw-r--r-- 1 chrismark staff 354K Jul 6 11:41 network_traffic-1.3.1.zip
-rw-r--r-- 1 chrismark staff 1.8M Jul 6 11:41 nginx-1.3.1.zip
-rw-r--r-- 1 chrismark staff 1.6M Jul 6 11:41 nginx_ingress_controller-1.2.0.zip
-rw-r--r-- 1 chrismark staff 707K Jul 6 11:41 o365-1.6.0.zip
-rw-r--r-- 1 chrismark staff 464K Jul 6 11:41 okta-1.8.0.zip
-rw-r--r-- 1 chrismark staff 18K Jul 6 11:41 oracle-1.0.2.zip
-rw-r--r-- 1 chrismark staff 628K Jul 6 11:41 osquery-1.3.0.zip
-rw-r--r-- 1 chrismark staff 109K Jul 6 11:41 osquery_manager-1.3.1.zip
-rw-r--r-- 1 chrismark staff 1.8M Jul 6 11:41 panw-2.2.2.zip
-rw-r--r-- 1 chrismark staff 26K Jul 6 11:41 panw_cortex_xdr-1.2.1.zip
-rw-r--r-- 1 chrismark staff 873K Jul 6 11:42 pfsense-1.0.3.zip
-rw-r--r-- 1 chrismark staff 676K Jul 6 11:41 postgresql-1.2.0.zip
-rw-r--r-- 1 chrismark staff 2.4M Jul 6 11:41 problemchild-0.0.2.zip
-rw-r--r-- 1 chrismark staff 713K Jul 6 11:42 prometheus-0.7.0.zip
-rw-r--r-- 1 chrismark staff 147K Jul 6 11:42 proofpoint-0.7.0.zip
-rw-r--r-- 1 chrismark staff 241K Jul 6 11:42 proofpoint_tap-0.1.0.zip
-rw-r--r-- 1 chrismark staff 26K Jul 6 11:42 pulse_connect_secure-1.0.1.zip
-rw-r--r-- 1 chrismark staff 24K Jul 6 11:42 qnap_nas-1.2.1.zip
-rw-r--r-- 1 chrismark staff 46K Jul 6 11:42 rabbitmq-1.2.0.zip
-rw-r--r-- 1 chrismark staff 119K Jul 6 11:42 radware-0.7.0.zip
-rw-r--r-- 1 chrismark staff 319K Jul 6 11:42 redis-1.2.0.zip
-rw-r--r-- 1 chrismark staff 548K Jul 6 11:41 santa-3.1.0.zip
-rw-r--r-- 1 chrismark staff 985K Jul 6 11:42 security_detection_engine-8.1.1.zip
-rw-r--r-- 1 chrismark staff 283K Jul 6 11:42 sentinel_one-0.1.0.zip
-rw-r--r-- 1 chrismark staff 29K Jul 6 11:42 snort-0.3.1.zip
-rw-r--r-- 1 chrismark staff 31K Jul 6 11:42 snyk-1.2.1.zip
-rw-r--r-- 1 chrismark staff 192K Jul 6 11:42 sonicwall-0.8.1.zip
-rw-r--r-- 1 chrismark staff 658K Jul 6 11:42 sonicwall_firewall-0.1.1.zip
-rw-r--r-- 1 chrismark staff 198K Jul 6 11:42 sophos-2.2.2.zip
-rw-r--r-- 1 chrismark staff 111K Jul 6 11:42 squid-0.8.0.zip
-rw-r--r-- 1 chrismark staff 312K Jul 6 11:42 stan-1.2.0.zip
-rw-r--r-- 1 chrismark staff 602K Jul 6 11:42 suricata-2.1.0.zip
-rw-r--r-- 1 chrismark staff 280K Jul 6 11:42 symantec-0.1.3.zip
-rw-r--r-- 1 chrismark staff 300K Jul 6 11:42 symantec_endpoint-1.0.1.zip
-rw-r--r-- 1 chrismark staff 113K Jul 6 11:41 synthetics-0.9.4.zip
-rw-r--r-- 1 chrismark staff 1.0M Jul 6 11:42 system-1.16.2.zip
-rw-r--r-- 1 chrismark staff 6.6K Jul 6 11:41 tcp-1.1.0.zip
-rw-r--r-- 1 chrismark staff 140K Jul 6 11:42 tenable_sc-1.2.2.zip
-rw-r--r-- 1 chrismark staff 57K Jul 6 11:40 ti_abusech-1.3.2.zip
-rw-r--r-- 1 chrismark staff 320K Jul 6 11:40 ti_anomali-1.3.3.zip
-rw-r--r-- 1 chrismark staff 74K Jul 6 11:41 ti_cybersixgill-1.4.1.zip
-rw-r--r-- 1 chrismark staff 42K Jul 6 11:41 ti_misp-1.4.1.zip
-rw-r--r-- 1 chrismark staff 29K Jul 6 11:40 ti_otx-1.3.2.zip
-rw-r--r-- 1 chrismark staff 22K Jul 6 11:42 ti_recordedfuture-1.0.1.zip
-rw-r--r-- 1 chrismark staff 31K Jul 6 11:42 ti_threatq-1.3.2.zip
-rw-r--r-- 1 chrismark staff 116K Jul 6 11:40 tomcat-1.4.1.zip
-rw-r--r-- 1 chrismark staff 560K Jul 6 11:42 traefik-1.2.0.zip
-rw-r--r-- 1 chrismark staff 6.5K Jul 6 11:41 udp-1.1.1.zip
-rw-r--r-- 1 chrismark staff 1.8M Jul 6 11:42 vsphere-0.1.0.zip
-rw-r--r-- 1 chrismark staff 325K Jul 6 11:42 windows-1.12.4.zip
-rw-r--r-- 1 chrismark staff 16K Jul 6 11:41 winlog-1.5.2.zip
-rw-r--r-- 1 chrismark staff 1.8M Jul 6 11:42 zeek-2.1.0.zip
-rw-r--r-- 1 chrismark staff 14K Jul 6 11:42 zerofox-1.3.1.zip
-rw-r--r-- 1 chrismark staff 481K Jul 6 11:42 zookeeper-1.2.0.zip
-rw-r--r-- 1 chrismark staff 28K Jul 6 11:42 zoom-1.3.1.zip
-rw-r--r-- 1 chrismark staff 112K Jul 6 11:42 zscaler-0.5.1.zip
-rw-r--r-- 1 chrismark staff 320K Jul 6 11:42 zscaler_zia-2.1.0.zip
-rw-r--r-- 1 chrismark staff 370K Jul 6 11:42 zscaler_zpa-1.0.0.zip
```

Are those numbers expected, @kpollich? I wonder whether they are actually risky in terms of performance, since this action should take place only once, on Kibana's "first" load, and the constructed ConfigMap can then be cached. Would a background job along with caching help here?

If these numbers are concerning, then I would take a step back and reconsider the approach/solution. On this specific comment:

I agree with @joshdover's points above that we need some way to limit the list of packages we're querying here, either by user input or by a hardcoded allow list.

how do you envision this selection? Would that mean that by default we only select some packages, but also give users the option to select and download all of them? Wouldn't that reintroduce the risk of downloading everything if users choose "select all"?

One thing that I would like to make clear here is the purpose of this feature. Hints-based autodiscovery serves the cases where users want full automation and no "restarts", with as little configuration as possible. So having users select/deselect the packages to be included diverges from that purpose. In addition, if for any reason users want to add something more, they will need to go back to Fleet UI, regenerate the templates, and finally restart the Agent. That is more of a hybrid approach, not one fully automated based on hints.

Having said this, I think that if Kibana and Fleet UI cannot solve this issue efficiently we need to reconsider.

Some quick alternatives here:

  1. Would it be possible, instead of downloading all the package artifacts, to only download the packages' spec directly from https://github.com/elastic/package-storage/tree/production/packages?
  2. I would even consider forgetting about supporting this in Fleet UI and implementing the template+ConfigMap construction in elastic-package. Then we would have something like elastic-package createk8sTemplates, which would provide us the wanted ConfigMap (see the sketch after this list). With something like this, we could even have a nightly job that uploads this ConfigMap to https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone if there is any diff. Then the only thing users need to do is download https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone and deploy, which is exactly what they do today. Keep in mind that even today the standalone policy we provide is quite static and not frequently updated at https://github.com/elastic/elastic-agent/blob/main/deploy/kubernetes/elastic-agent-standalone/elastic-agent-standalone-daemonset-configmap.yaml#L27, but this is somewhat expected when it comes to the standalone experience.
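A rough sketch of what such a nightly check could look like, assuming the proposed elastic-package createk8sTemplates command exists and accepts a hypothetical -o flag for its output file (neither exists today; both are illustrative):

```python
# Hypothetical nightly check: regenerate the ConfigMap and compare it with
# the copy in elastic/elastic-agent, opening a PR only when they differ.
import subprocess
import requests

UPSTREAM = ("https://raw.githubusercontent.com/elastic/elastic-agent/main/"
            "deploy/kubernetes/elastic-agent-standalone/"
            "elastic-agent-standalone-daemonset-configmap.yaml")

# `elastic-package createk8sTemplates` is the *proposed* command from this
# thread, not an existing one; `-o` is an assumed output flag.
subprocess.run(["elastic-package", "createk8sTemplates",
                "-o", "generated-configmap.yaml"], check=True)

with open("generated-configmap.yaml") as f:
    generated = f.read()
upstream = requests.get(UPSTREAM).text

if generated != upstream:
    # In CI, this is where the job would open a PR against elastic-agent.
    print("Templates changed; a PR should be opened.")
```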

cc: @gizas

MichaelKatsoulis commented 2 years ago

I would even consider forgetting about supporting this in Fleet UI and implementing the template+ConfigMap construction in elastic-package. Then we would have something like elastic-package createk8sTemplates, which would provide us the wanted ConfigMap.

This is a good idea. Each time there is an update in one of the packages (in the vars of the data streams?), a new ConfigMap will be constructed, and a PR can also be opened against the Kibana project to update https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/elastic_agent_manifest.ts#L8, which is currently used for the standalone agent. This is not expected to happen very often.

ChrsMark commented 2 years ago

To my mind, updating https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/elastic_agent_manifest.ts#L8 is another story, unrelated to the template construction, and should be handled on top. For example, even today, if we change https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone then Kibana's copy becomes outdated. Based on this, maybe Kibana's copy should be updated whenever changes are detected at https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone.

gizas commented 2 years ago

Trying to follow all of the above; I synced with Christos on the small details. Just some clarifications:

mtojek commented 2 years ago

Ok, I see that this thread expanded quickly, so let me clarify a few things, as Ecosystem owns elastic-package and package-registry.

Would that be possible instead of downloading all the artifacts for the packages to only download the packages' spec from https://github.com/elastic/package-storage/tree/production/packages directly?

package-storage as a repository will be deprecated soon; by soon, I mean the end of July/August. We will switch to https://package-storage.elastic.co/, which is based on buckets. I strongly recommend not considering package-storage v1 (Git) as a component.

We have the package registry API that can provide some info, e.g. https://epr.elastic.co/search?experimental=true, but this doesn't include detailed information about variables like default values, types, etc. To resolve that level of detail, we need to either query the list linked above and then query the API for each individual package (e.g. https://epr.elastic.co/package/1password/1.4.0/), or download every package.

We don't plan to extend EPR to perform any extra logic apart from serving package indices and redirecting to package-storage to download .zip or static artifacts.

I would add the idea of creating the k8s template on the EPR side. Since EPR downloads packages anyway, why can't we create the templates there and serve them via another API request?

EPR is intended to be a static component with a simple search facility. We don't aim to put extra processing logic there.

gizas commented 2 years ago

So you don't leave us many possibilities there :)

I guess the only two final candidates are:

Also, @mtojek, how about this part:

2. and implement the template+ConfigMap construction in elastic-package. Then we would have something like elastic-package createk8sTemplates, which would provide us the wanted ConfigMap

Can we plan for it? Do you see any issues?

mtojek commented 2 years ago
  1. and implement the template+ConfigMap construction in elastic-package. Then we would have something like elastic-package createk8sTemplates, which would provide us the wanted ConfigMap

This is something I'd like to understand better, as elastic-package's actions refer to the development lifecycle (build, lint, format, test, stack, etc.). I don't see how the createk8sTemplates action fits there, but maybe we can evaluate/rephrase it.

ChrsMark commented 2 years ago
  1. and implement the template+ConfigMap construction in elastic-package. Then we would have something like elastic-package createk8sTemplates, which would provide us the wanted ConfigMap

This is something I'd like to understand better, as elastic-package's actions refer to the development lifecycle (build, lint, format, test, stack, etc.). I don't see how the createk8sTemplates action fits there, but maybe we can evaluate/rephrase it.

The goal here is simple: we want to produce a static kubernetes ConfigMap with the templates from the latest versions of the packages (for now). To make it available to our users, we can store it in the upstream repository at https://github.com/elastic/elastic-agent/blob/main/deploy/kubernetes/elastic-agent-standalone/elastic-agent-standalone-daemonset-configmap.yaml. Our official docs currently direct users to download our proposed manifests from there (see docs).

So a developer from the cloudnative team, which mainly maintains these manifests, would need a tool to automate the construction of this ConfigMap. This is where elastic-package comes into play.

To automate this process even more, once we have the tooling implemented we can add it to a nightly automation run that re-runs the elastic-package createk8sTemplates command, checks for diffs against the upstream at https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone, and opens a PR if there is one.

This way, users following our docs will only have to curl -L -O https://raw.githubusercontent.com/elastic/elastic-agent/8.3/deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml the same way they do today, and the hints feature will be available to them transparently.

To make the proposal more complete, Kibana's side should be synced with https://raw.githubusercontent.com/elastic/elastic-agent/8.3/deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml from time to time in order to keep the "hardcoded" manifest (https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/elastic_agent_manifest.ts#L8) up to date. But this need exists even today, since the hardcoded manifest is not updated when something changes at https://raw.githubusercontent.com/elastic/elastic-agent/8.3/deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml. Again, the creation and existence of the new templates ConfigMap is an implementation detail, hidden from the end user.

kpollich commented 2 years ago

I wonder whether they are actually risky in terms of performance, since this action should take place only once, on Kibana's "first" load, and the constructed ConfigMap can then be cached. Would a background job along with caching help here?

Fleet's setup process on boot blocks Kibana's healthy status, so adding 2+ minutes of degraded time to Kibana on boot is a nonstarter here. A background job seems like a better fit if we place responsibility for the generation of these ConfigMap objects on Kibana.

To create them on a daily basis and keep them somewhere where they can be picked up by Kibana on start

This is a better solution in my mind. If the ConfigMap object is truly a static list of every single package that supports Kubernetes autodiscovery hints, it doesn't seem necessary for Kibana to generate that list "on-demand". I expect the rate of change for these hints to be fairly slow, so handling them through a CI job seems a lot better to me.

how do you envision this selection? Would that mean that by default we only select some packages, but also give users the option to select and download all of them? Wouldn't that reintroduce the risk of downloading everything if users choose "select all"?

I guess I just don't fully understand the use case here. To me, it seems like there'd be an overwhelming amount of config I might not need in this ConfigMap object. For example, if we include k8s hints in 10-15 packages, the ConfigMap is going to include definition blocks for each of them. To me this seems like a lot of noise and room for confusion, but then again I am a total novice with Kubernetes, so my understanding of this use case is limited.

You are correct though. If we allow selection here we still run the risk of downloading all packages in order to resolve the default for each variable.

The solution of a static ConfigMap maintained by CI feels the safest to me.

mtojek commented 2 years ago

Folks, I'm afraid that you're forgetting about the scaling factor. We need to think about the situation where we have 1kk packages. Do we want to keep updating config maps at that scale?

Also, how do you plan to support those config maps if the format depends on the Elastic stack version?

I suggest going back to square one and rethinking the procedure. Generating templates on a nightly basis and introducing coupling between packages and Fleet doesn't sound like a safe choice. What if we start accepting community packages? We won't be able to store information about community packages in Kibana.

ChrsMark commented 2 years ago

@kpollich @gizas FYI, we had a chat with @mtojek to make things clearer. What we will be evaluating is implementing the template construction compatible with kubernetes hints in a CI component, similar to what we have for buckets' indexing etc.

In that case, the ConfigMap with the templates will be constructed asynchronously by a job and will be available to our users through https://github.com/elastic/elastic-agent/tree/main/deploy/kubernetes/elastic-agent-standalone. With that, we keep the same UX that we have today, as described at https://www.elastic.co/guide/en/fleet/current/running-on-kubernetes-standalone.html, and Kibana/Fleet are not affected in any way.

We will only support "templates" based on the latest packages, since the logic in standalone is decoupled from package updates/versions etc., and we just need input policies that work with the defined Agent. This means that for 8.3 we will ship a manifest available at https://raw.githubusercontent.com/elastic/elastic-agent/8.3/deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml which includes templates compatible with the 8.3 version. This is a convention that will be handled at construction time.

Regarding scaling, since we will only be including the "latest" compatible packages, at the moment we are talking about ~150 packages and hence ~150 input templates. As we scale, we can consider selection options, but we don't foresee any crucial blocker here.

Having said this, since we agree on taking the safest approach, we can consider this issue "stalled" for now and close it soon once the CI approach is moving forward :).

gizas commented 2 years ago

Thank you! As all teams are unblocked and there are no performance issues, we can go with the above. [For all to be synced] The proposal is: construct the templates in CI and place them under https://raw.githubusercontent.com/elastic/elastic-agent/8.3/deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml

We will need this issue to track any work that might be needed in Fleet UI to update manifests etc. after the templates are done.

ChrsMark commented 2 years ago

@gizas https://github.com/elastic/elastic-agent/issues/613 seems completed. Do we still need this one for any reason?