aspiers / book-indices

Indices for music books

Perl book splitter #47

Closed: strk closed this pull request 2 months ago

strk commented 2 months ago

I couldn't stand having to download dozens of npm packages just to split books. Also, this splitter runs in under a second rather than over a minute.

Requires PDF::API2 (the libpdf-api2-perl package on Debian).
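Besides the page extraction itself (which the Perl script does with PDF::API2), a splitter has to turn an index into page ranges. Below is a minimal Python sketch of that bookkeeping. It is not strk's script, and the details are assumptions. It assumes a hypothetical index format with one `title,page` row per tune, using printed page numbers. It also assumes a hypothetical `page_offset` between printed and PDF page numbers. Finally, it assumes each tune runs until the page before the next tune starts.

```python
import csv
import io


def split_ranges(index_csv, page_offset=0, last_page=None):
    """Turn hypothetical `title,page` index rows into (title, first, last)
    PDF page ranges, 1-based and inclusive.

    page_offset: assumed difference between printed and PDF page numbers.
    last_page:   assumed last printed page of the final tune (defaults to
                 its start page, since the index alone doesn't say).
    """
    rows = []
    for row in csv.reader(io.StringIO(index_csv)):
        if not row:
            continue  # skip blank lines
        rows.append((row[0].strip(), int(row[1])))
    rows.sort(key=lambda r: r[1])  # stable: tunes sharing a page keep their order

    ranges = []
    for i, (title, start) in enumerate(rows):
        if i + 1 < len(rows):
            # Ends on the page before the next tune; if the next tune starts
            # on the same page, this tune occupies just that one page.
            end = max(start, rows[i + 1][1] - 1)
        else:
            end = max(start, last_page) if last_page is not None else start
        ranges.append((title, start + page_offset, end + page_offset))
    return ranges
```

For example, `split_ranges("Autumn Leaves,10\nBlue Bossa,12\n", page_offset=2)` maps the first tune to PDF pages 12 to 13. Each resulting range would then be copied into its own output file.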

strk commented 2 months ago

The Real Book, 6th edition, splits in 0m0.930s with the Perl script.

I guess I could also update the README to mention this simple script, but let's see what @aspiers thinks about it.

strk commented 2 months ago

Compared to the PDFs produced by pdftk, these ones contain some extra information that makes them slightly bigger:

1 0 obj << /Type /Catalog /PageLayout /SinglePage /PageMode /UseNone /Pages 2 0 R /ViewerPreferences << /NonFullScreenPageMode /UseNone >> >> endobj
2 0 obj << /Type /Pages /Count 1 /Kids [ 5 0 R ] /Resources 3 0 R >> endobj
3 0 obj << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] >> endobj
4 0 obj << /Producer (PDF::API2 2.044 \(linux\)) >> endobj
5 0 obj << /Type /Page /Contents [ 8 0 R ] /MediaBox [ 0 0 612 792 ] /Parent 2 0 R /Resources << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << /CGB 6 0 R >> >> >> endobj
6 0 obj << /Type /XObject /Subtype /Form /BBox [ 0 0 612 792 ] /Filter [ /FlateDecode ] /FormType 1 /Length 9 0 R /Name /CGB /Resources << /ProcSet [ /PDF /Text /ImageB /ImageC /ImageI ] /XObject << /R702 7 0 R >> >> >> stream

Maybe we could streamline them further and inject some information about the index they were produced from (a reference to the original PDF and its page numbers would be nice).
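One way to carry that provenance is the PDF document information dictionary. An example of that dictionary is the `/Producer (PDF::API2 2.044 \(linux\))` object in the dump above. Values there are PDF literal strings, in which backslashes and parentheses must be escaped. The sketch below only builds the dictionary text and does not write it into a file. It assumes ASCII-only values. The `/SourcePDF` and `/SourcePages` keys are hypothetical names chosen for illustration. The PDF specification permits extra keys in the information dictionary, but no tool assigns meaning to these particular names.

```python
def pdf_literal(text):
    """Encode ASCII text as a PDF literal string, escaping \\, ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def info_dict(source_pdf, first_page, last_page, title):
    """Build an information-dictionary body recording where a split came from.

    /SourcePDF and /SourcePages are hypothetical custom keys.
    """
    entries = {
        "Title": title,
        "SourcePDF": source_pdf,
        "SourcePages": f"{first_page}-{last_page}",
    }
    body = " ".join(f"/{key} {pdf_literal(value)}" for key, value in entries.items())
    return f"<< {body} >>"
```

Non-ASCII titles would need PDFDocEncoding or UTF-16BE with a byte-order mark instead of the plain escaping shown here.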

aspiers commented 2 months ago

Thanks for sharing! Cool that you found a way to make splitting really fast.

I would prefer to keep this repository for data only, so that it stays agnostic of tools. Personally, I already use https://github.com/aspiers/PDFexploder, which is linked from this repo's README. I'd encourage you to create a fresh GH repo for your tool and then submit a PR to this README that links to it. That way you won't need me to review and merge PRs each time you improve your tool, and it also gives a clean separation between data and code.

strk commented 2 months ago

I'm therefore closing this PR in favor of GH-48.