cozydev-pink / protosearch

prototype search library in pure scala
https://cozydev-pink.github.io/protosearch/
Apache License 2.0

Render "sub-documents" to plaintext with Laika #46

Closed by valencik 1 year ago

valencik commented 1 year ago

We want to extend the IngestMarkdown work to emit "sub-documents" in plaintext.

Each document should be broken up into sub-documents based on its h1 and h2 headers. A sub-document could perhaps be modeled something like:

case class SubDocument(anchor: String, title: String, content: String)

Here, content should be a plaintext rendering of the content within that sub-document, anchor should be the anchor to that section of the doc, and title should be just the text of the h2 header.
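
To make the intended shape concrete, here is a rough sketch of the splitting step on raw Markdown-like text. It does not use Laika and is not the implementation from this project: the `SubDocumentSketch` object, the `split` and `slug` helpers, and the anchor scheme (lowercase, with runs of non-alphanumeric characters replaced by `-`) are all assumptions made for illustration. It also leaves inline markup and deeper headers (h3 and below) in the content as-is, so it only shows the grouping by h1/h2, not the plaintext rendering.

```scala
// Hypothetical sketch: splits Markdown-like text on h1/h2 header lines.
// This is NOT Laika's API; a real implementation would walk Laika's document AST.
case class SubDocument(anchor: String, title: String, content: String)

object SubDocumentSketch {
  // Matches "# Title" or "## Title" only (h1 or h2); "### ..." stays in the content.
  private val Header = """^(#{1,2})\s+(.+?)\s*$""".r

  // Assumed anchor scheme: lowercase, non-alphanumeric runs collapsed to "-".
  def slug(title: String): String =
    title.toLowerCase
      .replaceAll("[^a-z0-9]+", "-")
      .stripPrefix("-")
      .stripSuffix("-")

  def split(markdown: String): List[SubDocument] = {
    // Build (title, lines) pairs in reverse order; text before the first header is dropped.
    val sections =
      markdown.linesIterator.foldLeft(List.empty[(String, List[String])]) {
        case (acc, Header(_, title))        => (title, Nil) :: acc
        case ((title, lines) :: rest, line) => (title, line :: lines) :: rest
        case (Nil, _)                       => Nil
      }
    sections.reverse.map { case (title, lines) =>
      SubDocument(slug(title), title, lines.reverse.mkString("\n").trim)
    }
  }
}
```

For example, `SubDocumentSketch.split("# Intro\nHello world\n\n## Getting Started\nInstall it.\n")` yields two sub-documents, one titled `Intro` and one titled `Getting Started`, each carrying the text under its header as `content`.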

Related Laika docs for custom rendering: https://planet42.github.io/Laika/latest/05-extending-laika/07-new-markup-output-formats.html#implementing-a-render-format

valencik commented 1 year ago

This work was completed in https://github.com/cozydev-pink/protosearch/pull/50. It is perhaps still prototype-ish, but that's fitting with the project :)