Closed (strk closed this 2 months ago)
The Real Book 6th edition splits in 0m0.930s with the Perl script.
I guess I could also update the README to mention this simple script but let's see what @aspiers thinks about it.
Compared to the PDFs produced by the pdftk tool, these contain some extra information that makes them slightly bigger:
1 0 obj << /Type /Catalog /PageLayout /SinglePage /PageMode /UseNone /Pages 2 0 R /ViewerPreferences << /NonFullScreenPageMode /UseNone >> >> endobj
2 0 obj << /Type /Pages /Count 1 /Kids [ 5 0 R ] /Resources 3 0 R >> endobj
3 0 obj << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> endobj
4 0 obj << /Producer (PDF::API2 2.044 \(linux\)) >> endobj
5 0 obj << /Type /Page /Contents [ 8 0 R ] /MediaBox [ 0 0 612 792 ] /Parent 2 0 R /Resources << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << /CGB 6 0 R >> >> >> endobj
6 0 obj << /Type /XObject /Subtype /Form /BBox [ 0 0 612 792 ] /Filter [ /FlateDecode ] /FormType 1 /Length 9 0 R /Name /CGB /Resources << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << /R702 7 0 R >> >> >> stream
Maybe we could streamline them further and inject some information about the index they were produced from (a reference to the original PDF and the page numbers would be nice).
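To make the idea of injecting provenance a bit more concrete, here is a minimal sketch of the kind of Info dictionary (like object 4 in the dump above) that each split file could carry. It is not part of the Perl script: the function names, the choice of the standard `/Title` and `/Subject` keys, and the sample tune and file names are all assumptions for illustration. It only builds the dictionary text, escaping parentheses and backslashes the same way the `/Producer` string above does, and it handles ASCII text only.

```python
def pdf_literal(text: str) -> str:
    """Encode ASCII text as a PDF literal string, escaping \\, ( and )."""
    if not text.isascii():
        # Non-ASCII text needs a different string encoding; out of scope here.
        raise ValueError("this sketch only handles ASCII text")
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def info_dict(title: str, source_pdf: str, first_page: int, last_page: int) -> str:
    """Build Info dictionary text recording where a split PDF came from."""
    if first_page == last_page:
        pages = f"page {first_page}"
    else:
        pages = f"pages {first_page}-{last_page}"
    # Assumption: Title holds the index entry, Subject holds the provenance.
    fields = {
        "Title": title,
        "Subject": f"{pages} of {source_pdf}",
    }
    body = " ".join(f"/{key} {pdf_literal(value)}" for key, value in fields.items())
    return f"<< {body} >>"


if __name__ == "__main__":
    # Hypothetical index entry and source file name.
    print(info_dict("Example Tune (take 2)", "source.pdf", 12, 13))
```

In the actual splitter the same two fields would presumably be set through the PDF library's metadata call rather than written by hand, so this is only meant to show what would end up in the file.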
Thanks for sharing! Cool that you found a way to make splitting really fast.
I would prefer to keep this repository for data only, so that it's agnostic of tools. Personally I already use https://github.com/aspiers/PDFexploder which is linked from this repo's README, but I'd encourage you to create a fresh GH repo with your tool and then submit a PR to this README which links to that. That way you won't need me to review and merge PRs each time you improve your own tool, and it also gives clean separation between data and code.
I'm thus closing this PR in favor of GH-48.
I couldn't stand having to download dozens of npm packages just to split books. Also this splitter runs in under a second rather than over a minute.
Requires PDF::API2 (the libpdf-api2-perl package on Debian).