Stirling-Tools / Stirling-PDF

#1 Locally hosted web application that allows you to perform various operations on PDF files
https://stirlingpdf.com
MIT License
46.22k stars, 3.76k forks

[Feature Request] - PDF File Separator #235

Status: Closed (nodecentral closed 1 year ago)

nodecentral commented 1 year ago

Hi @Frooodle

Sorry if this has been requested before; I did a quick search but couldn’t see anything.

Situation: I often get a build-up of various documents that I need to scan in and sort out, but I have to do them in batches in order to create separate PDF files, which can take quite a bit of time.

Ask: I was wondering if it would be possible to have something that lets me scan everything in one go, creating a single large PDF (pdf123.pdf), and then have a way in Stirling PDF to read it and split it into the required batches (resulting in multiple files: pdf123-a.pdf, pdf123-b.pdf, etc.).

References: https://www.scantastik.com/hardware/kodak-scanners/bar-codes.html explains how barcode pages are used to do this, but maybe Stirling PDF could do something similar, perhaps simpler or more intelligent :-)

Frooodle commented 1 year ago

You might have to give more info on your use case.

Stirling PDF can split documents. Are you wanting a folder watch for this, or to do it via the website? And how do you mean to pass which page to split at in your example use case?

nodecentral commented 1 year ago

> You might have to give more info on your use case

Let’s say I have a PDF file that is made up of various bills, statements, letters, etc. Could Stirling PDF be set up to detect where a bank statement page then becomes the letter I received from the doctor?

I expect this is not easy to do, even if you had AI, but I was wondering: what if I was to add a code or stamp somewhere on the page where I wanted a split to occur? As that’s likely not easy to do either, then when I scan all my mail in, what if I scanned in a special divider page that Stirling PDF would look for, to know that’s the end of a batch of pages I want extracted as a separate file?

Does that help?

Frooodle commented 1 year ago

A divider page would help. I know people have thought about splitting on a blank page before, as an example.

Detecting the difference between documents would be hard. I can't think of any nice way to do this, even with computer vision, unless it was very obvious, say going from A4 to A5, or a change in paper colour.

Keep in mind that since you're scanning, all the logic we would use is realistically image detection, so it's a bit more advanced.

Frooodle commented 1 year ago

I can look into doing divider page detection. Maybe even have a custom divider page option you can print out (some QR code), so that it's super easy to detect and the process of detection and splitting is super fast.
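The divider-page idea above can be sketched roughly as follows. This is only an illustration, not Stirling-PDF's actual implementation: it assumes each scanned page has already been classified as divider or not (for example by decoding a QR code, which is left out here), and the name `split_at_dividers` and the `is_divider` flags are hypothetical.

```python
from typing import List


def split_at_dividers(is_divider: List[bool]) -> List[List[int]]:
    """Group page indices into documents, dropping the divider pages.

    is_divider[i] is True when page i was recognised as a divider page.
    How that recognition happens (e.g. QR decoding) is assumed, not shown.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    for page_index, divider in enumerate(is_divider):
        if divider:
            # A divider closes the current document. Empty runs are skipped
            # so a leading or repeated divider does not create an empty file.
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page_index)
    if current:
        documents.append(current)
    return documents


# Made-up scan of 7 pages in which pages 2 and 5 are dividers.
flags = [False, False, True, False, False, True, False]
print(split_at_dividers(flags))  # [[0, 1], [3, 4], [6]]
```

Each inner list holds the page indices that would go into one output file (pdf123-a.pdf, pdf123-b.pdf, and so on).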

nodecentral commented 1 year ago

> I can look into doing a divider page detection Maybe even have a custom divider page option you can print out (some qr code) So that it's super easy to detect so process of detection and split is super fast

That would be amazing. Admittedly, being able to detect something on the page itself would be ideal, just to avoid having to print out loads of divider pages, but I can see how the divider page route would be much, much easier to do, and potentially a chance for a bit of Stirling PDF marketing :-) I defer to you, but can’t wait to test it out … 👍

Frooodle commented 1 year ago

I don't see how I could detect different content otherwise. It would be far too complex, with a huge number of false positives.

I like the idea of the marketing though :')

Frooodle commented 1 year ago

Done in V0.11.0

[3 images attached]

nodecentral commented 1 year ago

Hi @Frooodle, thanks so much for releasing this capability. I’ve just had a quick go and noticed something interesting (certainly for me, with a scanner that scans both sides).

While it seems to find and remove the barcoded pages (great), it does not always remove the following page (the back of that sheet) when it’s scanned. I’m assuming this is because it expects the back of the barcoded page to be completely blank, whereas on a few of my printouts you can see a faint outline of the barcode on the back of the page. (See image below.)

[Image: IMG_3502]

Frooodle commented 1 year ago

Ahh, it's expecting single-sided scanning, not double! I will add a setting to choose between single and double; if double, it will skip the barcode page and the page after it as well.

Will that work for you?

nodecentral commented 1 year ago

Many thanks @Frooodle

I would expect most people who do batch scanning will have a duplex scanner (as you can just let it run); otherwise you have to feed each page in one at a time and then flip and repeat ;-) So if you are looking for a default, I’d propose using duplex.

Hold on - maybe another (better) option is for me to print the barcode on both the front and back of the page, so that Stirling-PDF can be more decisive about which pages are separators? Would it cause any issues if you see two barcode pages back to back?

nodecentral commented 1 year ago

@Frooodle - quick question: does it care what size the separator QR code is? Can I resize the image?

Frooodle commented 1 year ago

Not sure if it would cause an issue, but I should probably code for that use case to be safe anyway... I'll make sure it supports that. I think adding a duplex mode makes sense anyway for people that don't want to print double-sided.
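A minimal sketch of how a duplex mode could handle both cases discussed here (a divider whose back is blank or shows a faint outline, and a divider with the barcode printed on both sides). It is an assumption about one workable approach, not the code that shipped; `split_scan`, `is_divider` and `duplex` are hypothetical names, and barcode detection itself is assumed to have happened already.

```python
from typing import List


def split_scan(is_divider: List[bool], duplex: bool = False) -> List[List[int]]:
    """Group page indices into documents, dropping divider sheets.

    In duplex mode every divider sheet occupies two scanned pages
    (front and back), so both are dropped regardless of what the
    back looks like: blank, faint show-through, or a second barcode.
    """
    documents: List[List[int]] = []
    current: List[int] = []
    i = 0
    while i < len(is_divider):
        if is_divider[i]:
            if current:
                documents.append(current)
                current = []
            # Jump past the whole divider sheet. Advancing by two (rather
            # than flagging "skip the next page" per detected barcode) means
            # a barcode on the back never causes a real page to be dropped.
            i += 2 if duplex else 1
        else:
            current.append(i)
            i += 1
    if current:
        documents.append(current)
    return documents


# Back of the divider (page 3) has faint show-through and is not detected.
faint_back = [False, False, True, False, False, False]
# Barcode printed on both sides of the divider sheet (pages 2 and 3).
both_sides = [False, False, True, True, False, False]

print(split_scan(faint_back, duplex=True))   # [[0, 1], [4, 5]]
print(split_scan(both_sides, duplex=True))   # [[0, 1], [4, 5]]
print(split_scan(faint_back, duplex=False))  # [[0, 1], [3, 4, 5]]
```

The last call shows the behaviour reported earlier in the thread: in single-sided mode the back of the divider sheet leaks into the next document.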

As for QR code size, it shouldn't matter much as long as it's visible. You are welcome to change it; I attached the .odt file: Copy of Stirling PDF Auto Splitter page divider.odt

Frooodle commented 1 year ago

I'll have a fix for both of the above tomorrow :) The fix is already in the alpha tag if you want to test.

Just need to add translations and some other changes before release.

Frooodle commented 1 year ago

Try the latest version and let me know how it goes. It should support both double-sided and single-sided scanning via the duplex mode tickbox.

nodecentral commented 1 year ago

Works great! Thanks @Frooodle