Open ChrLau opened 11 months ago
I wish I could do this and it is certainly a good idea!
But editing text in PDFs is hard. It has proven quite difficult to edit, remove, or replace text within PDFs, which is why, despite all of Stirling's features, you still can't simply edit text.
I will keep this issue open, however, as I do want to give it some more tries in the future.
Yay! That's all I asked for.
Hi, any update on this feature?
Hey, I know this isn't a Python project, but I know PyMuPDF can do this. Here is a code snippet:
import fitz  # PyMuPDF is imported under the name "fitz"

doc = fitz.open("invoice-7-2024-06-01.pdf")
for i in range(doc.page_count):
    print("Processing page %i" % i)
    page = doc.load_page(i)
    # find every occurrence of the text on this page
    draft = page.search_for("a word to redact")
    for rect in draft:
        page.add_redact_annot(rect)
    # apply the redactions once per page, leaving images untouched
    page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
# then save the doc to a new PDF:
doc.save("new.pdf", garbage=3, deflate=True)
It redacts the text while leaving the rest of the PDF as is.
Hi,
I just installed Stirling-PDF (App Version: 0.15.1) via Docker and played around with its features a bit, and I absolutely love it! But when using the Auto Redact feature, I had some ideas for how it could be improved.
The "Auto Redact" feature currently replaces the entered words/RegEx matches with a solid box and then converts the whole PDF to an image, so that the text behind the box can't be selected. While this is a rock-solid approach in terms of security, it can make working with redacted PDFs harder, as you can't select or copy any text anymore. (Yes, I'm aware the OCR feature exists. But not everyone uses Stirling or has access to such tools.)
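To make the word/RegEx matching step concrete, here is a minimal sketch of how a list of search terms could be turned into the character ranges that need covering. This is not how Stirling-PDF is implemented; the helper names `find_redaction_spans` and `redact_text` are hypothetical, and the sketch works on plain text, not on real PDF content.

```python
import re


def find_redaction_spans(text, terms, use_regex=False):
    """Return merged (start, end) spans in `text` matched by any term.

    Hypothetical helper for illustration only; it operates on plain text,
    not on PDF content streams.
    """
    spans = []
    for term in terms:
        # Treat the term as a literal word unless regex mode is requested.
        pattern = term if use_regex else re.escape(term)
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            if match.end() > match.start():  # skip empty matches
                spans.append((match.start(), match.end()))
    # Merge overlapping or touching spans so each region is covered once.
    spans.sort()
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact_text(text, spans, fill="#"):
    """Replace every character inside the spans with a fill character."""
    chars = list(text)
    for start, end in spans:
        chars[start:end] = fill * (end - start)
    return "".join(chars)


sample = "Invoice for John Doe, IBAN DE89 3704 0044 0532 0130 00"
spans = find_redaction_spans(
    sample, [r"John Doe", r"DE\d{2}(?: \d{4}){4} \d{2}"], use_regex=True
)
print(redact_text(sample, spans))
```

In a real PDF, each match would have to be mapped to a rectangle on the page, and the underlying text removed rather than just covered, which is the hard part.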
Instead, the following alternative redaction solutions came to mind:
Sadly, I don't know enough about the PDF file format to know whether this is achievable or to estimate how work-intensive it would be. Nevertheless, I wanted to share my idea.
Thanks!