JakubMelka / PDF4QT

Open source PDF editor.
https://jakubmelka.github.io/
GNU Lesser General Public License v3.0
688 stars, 70 forks

Can this program have redaction and sanitization capabilities? #13

Closed: ghost closed this issue 2 years ago

ghost commented 2 years ago

I feel like something that's missing from this program is the ability to redact (permanently remove and cover with black) images and/or text, along with the ability to sanitize the document of anything personal. When sanitizing a PDF, Adobe Acrobat removes ALL annotations along with any potential personal info and strips out any JavaScript, making it safe to share with someone else. Could you consider implementing that? Thanks. Also, I'm very pleased to see that your PDF reader actually removes annotations rather than keeping them in like poppler and mupdf do. I'll definitely recommend this program if anyone talks about PDF readers.

JakubMelka commented 2 years ago

Hello, I am happy you are interested in my software. Redaction is already implemented in the "Redact" plugin. The Redact plugin can remove text and image content (the redacted content is covered in black). To turn on the Redact plugin, go to Options, section Plugins, and check the Redact plugin. See image: [image]

Redact also sanitizes the document (it should remove all annotations and JavaScript). However, I would like to implement sanitization of the document without redacting it.

ghost commented 2 years ago

@JakubMelka Awesome man. If I may ask another question, since I'd rather not make a new thread: are you going to implement extracting pages, moving pages around, deleting pages, and adding new pages?

JakubMelka commented 2 years ago

This is implemented in a separate application, PDF4QT DocPage Organizer. It allows you (among other things) to extract/move/delete/add pages of a PDF document. Several PDF documents can also be merged into a single one, or, inversely, a single PDF document can be split into several PDF documents.

Small introduction: DocPage Organizer works with a workspace containing page groups. When you add a document to the workspace, a single page group containing all of the document's pages is created. You can split this group into single pages, extract/delete/add pages, and move pages via drag and drop.

Then, you can perform three different actions to produce new PDF documents via the "Make" menu:

See the sample screenshot with two documents, a single page, and an image: [image]
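
As a rough illustration of the workspace model described above (an ordered list of page groups that can be split, moved, and deleted), here is a minimal Python sketch. It is not PDF4QT code. The names `PageRef`, `Workspace`, and their methods are hypothetical, and it only tracks page references, not actual PDF content.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PageRef:
    """One page of a source document (hypothetical type, 1-based page number)."""
    document: str
    page: int


@dataclass
class Workspace:
    """Ordered list of page groups; each group is an ordered list of pages."""
    groups: list[list[PageRef]] = field(default_factory=list)

    def add_document(self, name: str, page_count: int) -> None:
        # Adding a document creates one group holding all of its pages.
        self.groups.append([PageRef(name, n) for n in range(1, page_count + 1)])

    def split_group(self, index: int) -> None:
        # Replace one group with single-page groups, preserving page order.
        singles = [[page] for page in self.groups[index]]
        self.groups[index:index + 1] = singles

    def move_group(self, src: int, dst: int) -> None:
        # Model of drag and drop: take a group out and reinsert it elsewhere.
        self.groups.insert(dst, self.groups.pop(src))

    def delete_group(self, index: int) -> None:
        # Remove a whole group from the workspace.
        del self.groups[index]


ws = Workspace()
ws.add_document("a.pdf", 3)
ws.add_document("b.pdf", 2)
ws.split_group(0)      # a.pdf becomes three single-page groups
ws.move_group(3, 0)    # move the b.pdf group to the front
ws.delete_group(2)     # drop page 2 of a.pdf
```

In this sketch the workspace ends up with the two pages of b.pdf as one group, followed by pages 1 and 3 of a.pdf as single-page groups. How the real application stores groups internally is not stated in this thread.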

ghost commented 2 years ago

Can I extract text and images from a page using DocPage Organizer, @JakubMelka?

JakubMelka commented 2 years ago

DocPage Organizer is basically a multi-document page manipulator; it works only with pages. To extract text or images, run Viewer Profi, where you will find two tools. One is a text selection tool (you can copy the selected text via Ctrl+C); the other copies an image to the clipboard when you click on it. Both tools respect the DRM settings of the document, so you must have the appropriate rights to perform this extraction. You can also take a screenshot of the page.

You can use these icons: [image]

Alternatively, you can use pdftool, which has two commands to extract content:

JakubMelka commented 2 years ago

I think I can close this issue, as the requested functionality is implemented.