See https://github.com/OpenFn/apollo/issues/71 for a spec on adding a vector database to Apollo.

Once we have a vector database waiting to go, we need to work out how to encode the docs site into it. The embedded docs will then be used by services like chat and the job generator to add tightly focused context to prompts.
Building the docs is quite a computationally expensive step, but we do need it to get a nice clean markdown representation of all our docs. Would it be easier to scrape the HTML site at docs.openfn.org instead? I don't think so - parsing rendered HTML back into clean text would be messier than using the markdown source we already have.
I think the process is something like this:

- Pull all the parsed .md files into strings.
- Break each .md file up into chunks by section. I think a section is bounded by one `##` heading and the next `##`, or the end of the document.
- Work out whether we need to encode any context into the section, like a path? I don't know yet - see the sketch after this list for one cheap option.
- Embed each section into the database.
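A minimal sketch of the extraction step, assuming the parsed docs sit in a local `docs/` directory and that `##` headings delimit sections (the directory name and function names here are illustrative, not decided):

```python
from pathlib import Path

def split_sections(markdown: str) -> list[str]:
    """Split a markdown document into chunks bounded by ## headings.

    Anything before the first ## heading becomes its own chunk, and the
    last section runs to the end of the document.
    """
    sections, current = [], []
    for line in markdown.splitlines():
        if line.startswith("## ") and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]

def extract_chunks(docs_dir: str = "docs") -> list[dict]:
    """Pull every parsed .md file into a string, chunk it by section,
    and keep the file path alongside each chunk."""
    chunks = []
    for path in Path(docs_dir).rglob("*.md"):
        for section in split_sections(path.read_text(encoding="utf-8")):
            chunks.append({"path": str(path), "text": section})
    return chunks
```

Carrying the file path with each chunk is one cheap answer to the "do we need to encode context?" question: it costs nothing at extraction time, and we can later store it as metadata or prepend it to the text before embedding.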
It is likely to be several distinct commands: build the doc site, extract the content chunks, and embed the content chunks.
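Those could hang together as a small CLI with subcommands - a sketch only, with hypothetical command names (building the doc site itself is presumably a separate Docusaurus build step and isn't covered here):

```python
import argparse
import json
from pathlib import Path

def main() -> None:
    parser = argparse.ArgumentParser(prog="seed-docs")
    sub = parser.add_subparsers(dest="command", required=True)

    extract = sub.add_parser("extract", help="chunk the built docs into JSON")
    extract.add_argument("--docs-dir", default="docs")
    extract.add_argument("--out", default="chunks.json")

    embed = sub.add_parser("embed", help="embed chunks into the vector DB")
    embed.add_argument("--chunks", default="chunks.json")

    args = parser.parse_args()
    if args.command == "extract":
        chunks = extract_chunks(args.docs_dir)  # from the sketch above
        Path(args.out).write_text(json.dumps(chunks))
    elif args.command == "embed":
        chunks = json.loads(Path(args.chunks).read_text())
        embed_chunks(chunks)  # see the embedding sketch below

if __name__ == "__main__":
    main()
```

Writing the chunks to an intermediate JSON file keeps the two expensive steps independent, so we can re-run the embedding without rebuilding or re-parsing the docs.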
All of this needs to run at build-time, when the Docker image is assembled, so that the database is nicely pre-seeded when it gets deployed.
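And a sketch of the embedding step, using Chroma purely as a stand-in for whatever database issue #71 settles on; the persisted directory would then be baked into the image by a `RUN` step in the Dockerfile:

```python
import chromadb

def embed_chunks(chunks: list[dict], db_path: str = "vectordb") -> None:
    """Embed each section and persist it so the DB ships pre-seeded.

    Chroma applies a default embedding model when none is supplied;
    a real setup would pin an explicit one.
    """
    client = chromadb.PersistentClient(path=db_path)
    collection = client.get_or_create_collection("openfn-docs")
    collection.add(
        ids=[f"{c['path']}#{i}" for i, c in enumerate(chunks)],
        documents=[c["text"] for c in chunks],
        metadatas=[{"path": c["path"]} for c in chunks],
    )
```

Seeding inside `docker build` means a fresh deployment never races against an empty database, and rebuilding the image becomes the natural trigger for re-embedding whenever the docs change.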