elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[KB] Add documentation packaging workflow #193473

Open pgayvallet opened 1 week ago

pgayvallet commented 1 week ago

For https://github.com/elastic/kibana/issues/192031, we need to have a CI task or workflow that would

Embedding generation could be done by indexing the documents in some cluster, with the fields we want embeddings for mapped as semantic_text, waiting for the embedding generation to complete, and then re-exporting the documents for the next steps.
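To make the semantic_text idea a bit more concrete, here is a minimal sketch of how the index mappings for that temporary cluster might be built. The helper name `buildDocMappings`, the field names (`title`, `url`, `content`) and the inference endpoint id are all assumptions for illustration and are not taken from any existing code. The only thing relied on from Elasticsearch is the `semantic_text` field type with its `inference_id` parameter.

```typescript
// Hypothetical sketch: field names and the inference endpoint id are assumptions.
type FieldMapping =
  | { type: 'text' }
  | { type: 'semantic_text'; inference_id: string };

interface IndexMappings {
  properties: Record<string, FieldMapping>;
}

// Builds the `mappings` section of a create-index request:
// plain fields are mapped as `text`, and the fields we want embeddings for
// are mapped as `semantic_text` pointing at an inference endpoint.
function buildDocMappings(
  plainFields: string[],
  semanticFields: string[],
  inferenceId: string
): IndexMappings {
  const properties: Record<string, FieldMapping> = {};
  for (const field of plainFields) {
    properties[field] = { type: 'text' };
  }
  for (const field of semanticFields) {
    properties[field] = { type: 'semantic_text', inference_id: inferenceId };
  }
  return { properties };
}

// Example: embeddings are only generated for the `content` field.
const exampleMappings = buildDocMappings(['title', 'url'], ['content'], 'my-elser-endpoint');
console.log(JSON.stringify(exampleMappings, null, 2));
```

With mappings like these, the embeddings would be produced at index time, so the "re-export" step would presumably amount to reading the documents back once indexing has finished.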

The last step is the one that is unclear to me - I'm not sure atm how exactly fleet packages are being built and added to the package registry / images.

pgayvallet commented 3 days ago

I created a POC (https://github.com/elastic/kibana/pull/193847) to show what the documentation extraction script would be in charge of doing.

What the script does:

I tried with the Kibana 8.15 documentation, which is ~600 files, and the zipped output is around 12 MB. I'd say that most of it comes from the embeddings.

I also tested the semantic-search-based documentation retrieval, which seems to be doing okay, e.g.:

search term: 'How to enable TLS for Kibana?'

top 3 results:
- Encrypt TLS communications in Kibana | Kibana Guide [8.15] | Elastic
- Security production considerations | Kibana Guide [8.15] | Elastic
- Mutual TLS authentication between Kibana and Elasticsearch | Kibana Guide [8.15] | Elastic

See the performSemanticSearch function of the PR for details.
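For readers without the PR open, a query of this kind could look roughly like the sketch below. This is not the PR's performSemanticSearch. The helper name `buildSemanticSearch`, the index name and the `content` field are assumptions for illustration. It only builds the request body, using the Elasticsearch `semantic` query against a semantic_text field.

```typescript
// Hypothetical sketch: index and field names are assumptions, not the POC's.
interface SemanticSearchRequest {
  index: string;
  size: number;
  query: {
    semantic: {
      field: string; // must be a field mapped as semantic_text
      query: string; // the natural-language search term
    };
  };
}

// Builds a search request that runs a `semantic` query on one field
// and returns the top `size` hits (3 by default, as in the example above).
function buildSemanticSearch(
  index: string,
  field: string,
  searchTerm: string,
  size: number = 3
): SemanticSearchRequest {
  return {
    index,
    size,
    query: {
      semantic: { field, query: searchTerm },
    },
  };
}

const exampleRequest = buildSemanticSearch(
  'kibana-docs-8.15',
  'content',
  'How to enable TLS for Kibana?'
);
console.log(JSON.stringify(exampleRequest, null, 2));
```

The resulting object could then be passed to whatever Elasticsearch client the script uses.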

pgayvallet commented 3 days ago

I think we will need to progress on https://github.com/elastic/kibana/issues/193849 before progressing further on the current issue, as we need more clarity on what the exact format will be for our "KB packages" and their documents.