galkahana / HummusJS

Node.js module for high performance creation, modification and parsing of PDF files and streams
http://www.pdfhummus.com

Is it possible to hide sensitive text on PDF? #328

Open ugur93 opened 5 years ago

ugur93 commented 5 years ago

I want to use HummusJS to hide sensitive content in parts of a PDF file. Is it possible to put a black box over some text and also make sure that the text itself is gone from the file?

galkahana commented 5 years ago

Yes.

The text extraction example provides code that gives you both the text and its physical position (x, y, width, height). That lets you cover the text with rectangles of the same measurements by creating a new version of the PDF with the rectangles added, using the modification feature. In fact, the example does something very similar to this.
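A minimal sketch of the covering step, assuming the boxes have already been collected as `{x, y, width, height}` objects in page coordinates. That shape, the `coverBoxes` name, and the `padding` parameter are hypothetical. The text extraction example reports positions in its own format, which would need converting first. The `hummus` module is passed in as an argument rather than loaded at the top, so the function can be tried against a stub.

```javascript
// Sketch only: draws filled black rectangles over the given boxes on one page.
// `boxes` is a hypothetical array of {x, y, width, height} in PDF user space.
// `hummus` is the module object, e.g. require('hummus') in a real project.
function coverBoxes(hummus, sourcePath, targetPath, pageIndex, boxes, padding = 1) {
  // Open the source for modification and write the result to a separate file.
  const writer = hummus.createWriterToModify(sourcePath, {
    modifiedFilePath: targetPath,
  });
  const modifier = new hummus.PDFPageModifier(writer, pageIndex);
  const ctx = modifier.startContext().getContext();

  for (const box of boxes) {
    // Grow each box slightly so glyph edges do not peek out (assumed margin).
    ctx.drawRectangle(
      box.x - padding,
      box.y - padding,
      box.width + 2 * padding,
      box.height + 2 * padding,
      { type: 'fill', color: 0x000000 }
    );
  }

  modifier.endContext().writePage();
  writer.end();
}

// Usage (paths are placeholders):
// coverBoxes(require('hummus'), 'in.pdf', 'covered.pdf', 0,
//            [{ x: 72, y: 700, width: 120, height: 12 }]);
```

This only paints over the text. The text operators stay in the page content.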

The text, however, is still there, and one could remove the added rectangles, or just use the text parsing code to get it.

To remove the text completely, you have to recreate a version of the PDF in which the text commands for the text you want to hide are removed. You could do that with Hummus, but it takes more expert knowledge: you need to be able to probe into the PDF and know what you are doing. The text extraction code already interprets text commands, so you can use it as an example of how to get to them. Recreating the content streams without those commands should be possible by dumping everything except them to a target stream that serves as a replacement. Then you want to use hummus.recrypt to completely remove any stream that is no longer used. Lost me here? If so... yeah, that's when it requires an expert.
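The "dump everything except the text commands" idea can be sketched on an already decoded content stream held as a string. This is plain JavaScript, not Hummus API. `tokenize`, `stripTextShowing`, and the `shouldDrop` callback are hypothetical names. The sketch assumes the stream contains no inline images (`BI ... ID ... EI`), and it does not decode strings through the font encoding, so choosing which text to drop is left to the caller. Reading the stream out of the PDF and writing the replacement back are not shown.

```javascript
// Split a decoded content stream into raw tokens (strings, names, numbers,
// brackets, operators). Comments and whitespace are discarded.
function tokenize(src) {
  const tokens = [];
  const isSpace = (c) => ' \n\r\t\f\0'.includes(c);
  const isDelim = (c) => '()<>[]{}/%'.includes(c);
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    if (isSpace(c)) { i++; continue; }
    if (c === '%') { // comment runs to end of line
      while (i < src.length && src[i] !== '\n' && src[i] !== '\r') i++;
      continue;
    }
    const start = i;
    if (c === '(') { // literal string: balanced parens, backslash escapes
      let depth = 0;
      do {
        if (src[i] === '\\') i++;
        else if (src[i] === '(') depth++;
        else if (src[i] === ')') depth--;
        i++;
      } while (depth > 0 && i < src.length);
    } else if (c === '<' && src[i + 1] !== '<') { // hex string
      while (i < src.length && src[i] !== '>') i++;
      i++;
    } else if (c === '<' || c === '>') { // dictionary markers << and >>
      i += 2;
    } else if (c === '[' || c === ']') {
      i++;
    } else { // name, number, keyword, or operator
      i++;
      while (i < src.length && !isSpace(src[i]) && !isDelim(src[i])) i++;
    }
    tokens.push(src.slice(start, i));
  }
  return tokens;
}

const TEXT_SHOWING = new Set(['Tj', 'TJ', "'", '"']);
const isOperand = (t) =>
  /^[(<\[\]\/\d+\-.>]/.test(t) || t === 'true' || t === 'false' || t === 'null';

// Re-emit every operation except the text-showing ones selected by shouldDrop.
// shouldDrop(index, operator, operands) sees the running index of text-showing
// operations; the default drops all of them.
function stripTextShowing(content, shouldDrop = () => true) {
  const out = [];
  let operands = [];
  let textOpIndex = 0;
  for (const tok of tokenize(content)) {
    if (isOperand(tok)) { operands.push(tok); continue; }
    const drop = TEXT_SHOWING.has(tok) && shouldDrop(textOpIndex++, tok, operands);
    if (!drop) out.push([...operands, tok].join(' '));
    operands = [];
  }
  return out.join('\n') + '\n';
}
```

Dropping a text-showing operator also drops the position advance it would have caused, so text that follows in the same text object may shift. The `'` and `"` operators additionally move to the next line. A fuller version would have to compensate for both. Once the filtered stream has replaced the original in the file, the hummus.recrypt step mentioned above is what gets rid of the old, now unreferenced stream.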

mohammedabualsoud commented 4 years ago

Any updates please?