Open pepijnolivier opened 1 month ago
I think this would work better as two separate steps: first extract the page numbers from the TOC, then split the document using "Split PDF". For extracting the page numbers, maybe we could have a feature that runs a regex on the text of one or more pages and outputs the matches. It could include some common expressions as well to make it easier.
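To make the regex idea concrete, here is a rough Python sketch of what such an extraction step could look like. It assumes the page text has already been extracted as a plain string and that TOC lines look like "1.2 Title ..... 9". The pattern `TOC_LINE` and the function `extract_toc_entries` are hypothetical names for illustration only, not part of Stirling-PDF.

```python
import re

# Hypothetical "common expression" for TOC lines such as
# "1.1. Introduction - Installation ..... 5":
# dotted numbering, a title, optional dot leaders, then the page number.
TOC_LINE = re.compile(
    r"^\s*(?P<number>\d+(?:\.\d+)*)\.?\s+(?P<title>.+?)[\s.]+(?P<page>\d+)\s*$"
)


def extract_toc_entries(page_text):
    """Return (number, title, page) tuples for lines that look like TOC entries."""
    entries = []
    for line in page_text.splitlines():
        match = TOC_LINE.match(line)
        if match:
            entries.append(
                (match["number"], match["title"], int(match["page"]))
            )
    return entries


if __name__ == "__main__":
    sample = (
        "0. Table of contents ..... 1\n"
        "1. Introduction ..... 3\n"
        "1.1. Introduction - Installation ..... 5\n"
        "1.2 Introduction - Getting started ..... 9\n"
    )
    for entry in extract_toc_entries(sample):
        print(entry)
```

The extracted page numbers could then be passed on to the existing split step. Real TOCs vary a lot (roman numerals, unnumbered headings, multi-line titles), so a single pattern like this would only cover the simplest layouts.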
For PDFs with predefined outlines, check this draft: https://github.com/Stirling-Tools/Stirling-PDF/pull/1786
Feature Description
"0. Table of contents", "1. Introduction"
"levels: 1" would only split top-level chapters; "levels: 2" would split subchapters as well (e.g. "1.1. Introduction - Installation", "1.2 Introduction - Getting started").
Why is this feature valuable?
This could be useful for many purposes:
Suggested Implementation
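One possible reading of the "levels" option from the feature description, sketched in Python. It assumes the TOC entries are already available as (number, title, page) tuples with 1-based page numbers, and that the depth of an entry is the number of dot-separated parts in its numbering. The function name `split_ranges` and its parameters are hypothetical.

```python
def split_ranges(entries, levels, last_page):
    """Turn TOC entries into (title, first_page, last_page) split ranges.

    entries:   (number, title, page) tuples in document order (assumed shape).
    levels:    maximum numbering depth to split on, e.g. 1 for "1", 2 for "1.1".
    last_page: final page of the PDF, used to close the last range.
    """
    # The depth of "1.2" is 2: count the dot-separated parts of the numbering.
    kept = [e for e in entries if len(e[0].split(".")) <= levels]

    ranges = []
    for index, (_, title, start) in enumerate(kept):
        if index + 1 < len(kept):
            # A section ends on the page before the next kept section starts.
            end = max(start, kept[index + 1][2] - 1)
        else:
            end = last_page
        ranges.append((title, start, end))
    return ranges


if __name__ == "__main__":
    toc = [
        ("0", "Table of contents", 1),
        ("1", "Introduction", 3),
        ("1.1", "Introduction - Installation", 5),
        ("1.2", "Introduction - Getting started", 9),
    ]
    print(split_ranges(toc, levels=1, last_page=20))
    print(split_ranges(toc, levels=2, last_page=20))
```

With "levels: 1" the subchapters stay inside the "Introduction" range. With "levels: 2" each subchapter becomes its own range, and the parent chapter only keeps the pages before its first subchapter.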
Additional Information
This should be tested on very large and official documents.
No Duplicate of the Feature