DS4SD / docling

Get your documents ready for gen AI
https://ds4sd.github.io/docling
MIT License

export_to_markdown page separator #359

Closed GermeauSimon closed 3 days ago

GermeauSimon commented 4 days ago

Markdown has no concept of a "page", but for further processing I need to split the output of export_to_markdown per page. Would it be possible to add a parameter to this function that specifies a page delimiter? I know there is a delim parameter, but it is inserted between every item in the document, not only between pages.
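To illustrate why a dedicated page delimiter would help: once the export contains a page marker, splitting per page is a plain string split. This is a minimal sketch. PAGE_MARKER and split_markdown_pages are hypothetical names, and the marker string is an assumption, not something export_to_markdown currently emits.

```python
# Hypothetical marker: not defined by docling. It stands in for whatever
# page separator an export option might insert between pages.
PAGE_MARKER = "<!-- page-break -->"


def split_markdown_pages(markdown: str, marker: str = PAGE_MARKER) -> list[str]:
    """Split a Markdown export into per-page chunks on a page marker.

    Each chunk is stripped of surrounding whitespace. Empty chunks are kept,
    so the list index still matches the page order even for blank pages.
    """
    return [chunk.strip() for chunk in markdown.split(marker)]


if __name__ == "__main__":
    md = "# Title\n\nIntro text.\n\n<!-- page-break -->\n\nSecond page text.\n"
    for i, page in enumerate(split_markdown_pages(md), start=1):
        print(f"page {i}: {page!r}")
```

Keeping empty chunks is a deliberate choice: dropping them would shift every later page index whenever a page has no text.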

dolfim-ibm commented 3 days ago

We usually suggest working with the document as a DoclingDocument first, since it gives access to all details. Any export (Markdown, etc.) will always be lossy.
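A per-page split can be sketched on the document model instead of the Markdown string. The sketch assumes the docling v2 shape, where DoclingDocument.iterate_items() yields (item, level) pairs and text items carry a prov list whose entries expose page_no. To stay runnable without docling, it uses small stand-in dataclasses (Prov, TextItem) that model only those fields, and it joins plain text rather than rendering full Markdown (no heading levels or tables).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Prov:
    """Stand-in for a provenance entry; only the page number is modelled."""
    page_no: int


@dataclass
class TextItem:
    """Stand-in for a document text item with its provenance list."""
    text: str
    prov: list[Prov] = field(default_factory=list)


def markdown_by_page(items: list[TextItem], delim: str = "\n\n") -> dict[int, str]:
    """Group item texts by the page of their first provenance entry.

    Items without provenance are skipped because their page is unknown.
    Items spanning several pages are assigned to their first page.
    The returned dict is ordered by page number.
    """
    pages: dict[int, list[str]] = defaultdict(list)
    for item in items:
        if not item.prov:
            continue
        pages[item.prov[0].page_no].append(item.text)
    return {page: delim.join(texts) for page, texts in sorted(pages.items())}


if __name__ == "__main__":
    items = [
        TextItem("Title", [Prov(1)]),
        TextItem("Intro", [Prov(1)]),
        TextItem("Body", [Prov(2)]),
    ]
    for page, text in markdown_by_page(items).items():
        print(page, repr(text))
```

With a real document, the input would be collected from doc.iterate_items() (keeping only items that have a text attribute), again assuming the v2 attribute names above.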

We are adding page markers to the Markdown export anyway. This is tracked in #309, so we will close this issue and move further discussion there.