ciur / papermerge

Open Source Document Management System for Digital Archives (Scanned Documents)
https://papermerge.com
Apache License 2.0

Feature request: Handle bulk scans separated by QR/bar codes #130

Open patrk opened 4 years ago

patrk commented 4 years ago

Hello, I am currently giving this project a try to finally manage all my personal paper stack. Since I have to deal with a huge amount of accumulated documents and correspondence I would like to make use of bulk scans with separating pages.

It would be nice if the importer could handle those bulk scans and split the documents accordingly.

I might also contribute to the project once I have made some progress importing the most important files into my Papermerge instance :)

ciur commented 4 years ago

Hi @patrk, thank you for considering Papermerge.

Can you please be more specific about the role QR/bar codes play in separating pages? A more detailed description of your use case and context will help me understand your request.

At this point you can bulk scan documents and later move pages from one document to another (cut/paste). This is exactly how I use Papermerge (most of the time I bulk/batch scan). The page management feature is described in the documentation here. There is a screencast demo of this feature, which also shows a batch scan example.

patrk commented 4 years ago

Hello,

thanks for the quick response. I am glad that Papermerge already offers the functionality to edit and split documents by pages.

However, I have hundreds of letters and documents to scan and my batch sizes are around 50-100 pages per scan.

Currently, I print separator pages, each consisting of a QR code, and place one before each document. I have a Python script that post-processes such a batch scan by removing each separator page and starting a new PDF file at that point. Doing this procedure manually is tedious and time-consuming.

Therefore having that integrated in Papermerge would save me some time doing the processing myself. Perhaps you could even allow some processing API, where one would simply call their own scripts.
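To make the idea concrete, here is a minimal sketch of the splitting step only, not the actual script. It assumes the separator check is supplied as a function: `is_separator` is a hypothetical callback that would wrap whatever QR/bar code detection is used, and `split_on_separators` is an illustrative name. Pages can be any objects (page numbers, images, PDF page handles).

```python
from typing import Callable, Iterable, List, TypeVar

Page = TypeVar("Page")


def split_on_separators(
    pages: Iterable[Page],
    is_separator: Callable[[Page], bool],
) -> List[List[Page]]:
    """Group a batch scan into documents, dropping the separator pages.

    `is_separator` is a hypothetical callback; in practice it would
    decode the page image and check for the separator QR/bar code.
    """
    documents: List[List[Page]] = []
    current: List[Page] = []
    for page in pages:
        if is_separator(page):
            # A separator closes the document collected so far.
            # Consecutive or leading separators produce no empty documents.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    # Keep the last document, which has no separator after it.
    if current:
        documents.append(current)
    return documents


# Toy example: strings stand in for scanned pages, "SEP" for a separator.
batch = ["SEP", "a", "b", "SEP", "c", "SEP", "SEP", "d"]
result = split_on_separators(batch, lambda page: page == "SEP")
print(result)  # [['a', 'b'], ['c'], ['d']]
```

Each inner list would then be written out as its own PDF. Passing the detection in as a function keeps the grouping logic independent of any particular barcode library.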

jpguyon52 commented 4 years ago

I have exactly the same requirements as patrk. I need to import more than 500 different documents (7 years of record keeping). A separator page with automation would let me feed those documents through a fast scanner, and the automation would split the documents whenever it sees a separator.

patrk commented 4 years ago

@jpguyon52 Glad to hear that someone is having the same struggle in the transition to a paperless document archive. I might make a pull request in case the contributors are not prioritizing this feature.

ciur commented 4 years ago

Ah, now I understand the whole picture :thumbsup: and I fully agree that this feature makes perfect sense.

The Paperless project has a scripts feature, though, which allows you to write scripts that are executed at various stages of the consumption process. Papermerge, on the other hand, does not have that.