Open pgayvallet opened 1 week ago
I created a POC (https://github.com/elastic/kibana/pull/193847) to show what the documentation extraction script would be in charge of doing.
What the script does:
I tried it with the Kibana 8.15 documentation, which is ~600 files, and the zipped output is around 12 MB. I'd say most of that comes from the embeddings.
I also tested documentation retrieval based on semantic search, which seems to work reasonably well. For example:
search term: 'How to enable TLS for Kibana?'
top 3 results:
- Encrypt TLS communications in Kibana | Kibana Guide [8.15] | Elastic
- Security production considerations | Kibana Guide [8.15] | Elastic
- Mutual TLS authentication between Kibana and Elasticsearch | Kibana Guide [8.15] | Elastic
See the `performSemanticSearch` function of the PR for details.
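To illustrate the kind of request such a retrieval could issue, here is a minimal sketch that builds an Elasticsearch `semantic` query body. It is not the PR's `performSemanticSearch` implementation; the helper name `buildSemanticSearchBody` and the field name `ai_content` are assumptions made for the example, and the real field names depend on the KB package format.

```typescript
// Shape of the search body built below (Elasticsearch `semantic` query).
interface SemanticSearchBody {
  size: number;
  query: { semantic: { field: string; query: string } };
}

// Hypothetical helper: builds a search body targeting a `semantic_text` field.
// `field` defaults to an assumed name, not one taken from the PR.
function buildSemanticSearchBody(
  searchTerm: string,
  field: string = "ai_content",
  size: number = 3
): SemanticSearchBody {
  return {
    size, // number of top results to return
    query: { semantic: { field, query: searchTerm } },
  };
}

const exampleBody = buildSemanticSearchBody("How to enable TLS for Kibana?");
console.log(JSON.stringify(exampleBody, null, 2));
```

The resulting object could be passed as the body of a search request against the index holding the extracted documents.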
I think we will need to progress on https://github.com/elastic/kibana/issues/193849 before progressing further on the current issue, as we need more clarity on what the exact format will be for our "KB packages" and their documents.
For https://github.com/elastic/kibana/issues/192031, we need to have a CI task or workflow that would
Embedding generation could be done by indexing the documents in some cluster, with the fields we want embeddings for mapped as `semantic_text`, waiting for the embedding generation to complete, and then re-exporting the documents for the next steps.
The last step is the one that is unclear to me: I'm not sure at the moment how exactly Fleet packages are built and added to the package registry / images.
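As a rough sketch of the embedding-generation step described above, the index used by the CI task could be created with mappings along these lines. The helper name `buildIndexMappings`, the field names, and the inference endpoint id `kb-inference` are all assumptions for illustration; nothing here is confirmed by the PR.

```typescript
// Field mapping variants used in this sketch.
type FieldMapping =
  | { type: "semantic_text"; inference_id: string }
  | { type: "keyword" };

// Hypothetical helper: maps the fields we want embeddings for as
// `semantic_text`, and the remaining metadata fields as `keyword`.
function buildIndexMappings(
  semanticFields: string[],
  inferenceId: string,
  keywordFields: string[] = []
): { mappings: { properties: Record<string, FieldMapping> } } {
  const properties: Record<string, FieldMapping> = {};
  for (const field of keywordFields) {
    properties[field] = { type: "keyword" };
  }
  for (const field of semanticFields) {
    // `inference_id` names the inference endpoint that generates embeddings.
    properties[field] = { type: "semantic_text", inference_id: inferenceId };
  }
  return { mappings: { properties } };
}

// Assumed field names and endpoint id, for illustration only.
const exampleMappings = buildIndexMappings(["content"], "kb-inference", ["title"]);
console.log(JSON.stringify(exampleMappings, null, 2));
```

Once the documents are indexed against such a mapping and inference has run, they could be re-exported with their embeddings for packaging.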