Stirling-Tools / Stirling-PDF

#1 Locally hosted web application that allows you to perform various operations on PDF files
https://stirlingpdf.com
MIT License
46.08k stars 3.75k forks

[Feature request] Improved redaction feature #499

Open ChrLau opened 11 months ago

ChrLau commented 11 months ago

Hi,

I just installed Stirling-PDF (App Version: 0.15.1) via Docker and played around with its features a bit, and... I absolutely love it! But while using the Auto Redact feature I had some ideas for how it could be improved.

The "Auto Redact" feature currently replaces the entered words/RegEx with a solid box and converts the whole PDF to an image, to prevent selection of the text behind the box. While this is a rock-solid approach in terms of security, it makes working with redacted PDFs harder, as you can't select or copy any text anymore. (Yes, I'm aware the OCR feature exists. 😏 But not everyone uses Stirling or has access to such tools.)

Instead, the following alternative redaction approaches came to mind:

  1. Remove the redacted text and only place the solid box in its place (keeping spacing, layout, etc.)
  2. Replace the redacted text with the word REDACTED (like in those redacted official documents shown on TV 😄), without any box at all.
  3. A combination of 1 and 2: replace the redacted text with the word REDACTED and additionally place the box in its place.
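To make the difference between the three options concrete, here is a minimal sketch that applies them to a plain line of text rather than to an actual PDF. It only models the visual outcome; removing text from a PDF's content stream is a separate and much harder problem. The function `redact_line`, the mode names `"box"`, `"label"` and `"both"`, and the use of the full block character to stand in for a drawn box are all hypothetical illustrations, not part of Stirling-PDF.

```python
import re
from typing import Iterable

BOX = "\u2588"      # full block character, standing in for a drawn black box
LABEL = "REDACTED"
MODES = ("box", "label", "both")


def redact_line(text: str, terms: Iterable[str], mode: str, use_regex: bool = False) -> str:
    """Apply one of the three proposed redaction styles to a plain-text line.

    "box":   replace each match with box characters of the same length,
             so the surrounding spacing is preserved (option 1).
    "label": replace each match with the word REDACTED, no box (option 2).
    "both":  a box as wide as the match with REDACTED centered on it,
             truncated if the match is shorter than the label (option 3).
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    terms = list(terms)
    if not terms:
        return text  # an empty pattern would match everywhere

    # Plain words are escaped so that e.g. "a.b" only matches a literal dot;
    # with use_regex=True the entries are used as regular expressions.
    patterns = [t if use_regex else re.escape(t) for t in terms]
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))

    def replace(match: re.Match) -> str:
        width = len(match.group(0))
        if mode == "box":
            return BOX * width
        if mode == "label":
            return LABEL
        return LABEL[:width].center(width, BOX)  # mode == "both"

    return combined.sub(replace, text)


if __name__ == "__main__":
    line = "Contact: jane.doe@example.com, phone 555-0100"
    for m in MODES:
        print(m, "->", redact_line(line, ["jane.doe@example.com"], m))
    # Regex entry, as in the RegEx option of Auto Redact:
    print(redact_line(line, [r"\d{3}-\d{4}"], "label", use_regex=True))
    # -> Contact: jane.doe@example.com, phone REDACTED
```

Only the "box" mode keeps the line length (and therefore the layout) unchanged; "label" shortens or lengthens the line, which in a real PDF would mean reflowing or overlapping the following text.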

Sadly I don't know enough about the PDF file format to know if this is achievable or to estimate how much work it would be. But I nevertheless wanted to share my idea.

Thanks!

Frooodle commented 11 months ago

I wish I could do this, and it is certainly a good idea!

But text editing in PDFs is hard. It has proven quite difficult to edit, remove, or replace text within PDFs, which is why, despite all of Stirling's features, you still can't simply edit text.
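One reason this is hard: a PDF page does not store its words as plain text. Text is drawn by operators in the page's content stream, and a single word may be split into several string pieces, for example inside a `TJ` array with kerning numbers between the pieces. The sketch below uses a hand-written, uncompressed stream fragment (hypothetical, with a simple font encoding) to show that a naive search of the raw stream misses the word, while reassembling the `TJ` strings finds it. The helper `shown_text` is an illustration only; a real parser must also handle escapes, hex strings, font encodings, and positioning operators.

```python
import re

# Hypothetical, hand-written content-stream fragment: "Invoice" is drawn as two
# string pieces with a kerning adjustment (-20) between them, inside a TJ array.
STREAM = "BT /F1 12 Tf 72 712 Td [(Inv) -20 (oice) 30 ( total)] TJ ET"


def shown_text(content: str) -> str:
    """Join the literal strings of every TJ array, in order.

    Deliberately simplified: ignores Tj/'/" operators, escaped parentheses,
    hex strings, font encodings and text positioning.
    """
    pieces = []
    for array in re.findall(r"\[(.*?)\]\s*TJ", content):
        pieces.extend(re.findall(r"\((.*?)\)", array))
    return "".join(pieces)


print("Invoice" in STREAM)              # False: the raw stream never spells it out
print("Invoice" in shown_text(STREAM))  # True: visible only after reassembly
```

Replacing the word would therefore mean rewriting both pieces and the spacing numbers between them, and any replacement with different glyph widths shifts the text that follows.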

Frooodle commented 11 months ago

I will keep this issue open, however, as I do want to give it another try in the future.

ChrLau commented 11 months ago

Yay! That's all I asked for. 😄

HugoFollic commented 7 months ago

Hi, any update on this feature?

youcefs21 commented 5 months ago

Hey, I know this isn't a Python project, but PyMuPDF can do this. Here is a code snippet:

import fitz  # PyMuPDF

doc = fitz.open("invoice-7-2024-06-01.pdf")

for page in doc:
    print("Processing page %i" % page.number)

    # Mark every occurrence of the search string with a redaction annotation
    for rect in page.search_for("a word to redact"):
        page.add_redact_annot(rect)

    # Apply all marks on this page once: the text under them is removed,
    # while images are left untouched
    page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)

# then save the doc to a new PDF:
doc.save("new.pdf", garbage=3, deflate=True)

It redacts the text while keeping the rest of the PDF as is.