ArtifexSoftware / Ghostscript.NET

Ghostscript.NET - managed wrapper around the Ghostscript library (32-bit & 64-bit)
https://ghostscript.com
GNU Affero General Public License v3.0
391 stars, 152 forks

Ghostscript - completely REMOVE METADATA from pdf files #114

Closed: Geo-Van closed this issue 11 months ago

Geo-Van commented 11 months ago

Hello, I have some questions about using Ghostscript to completely remove metadata from PDF files.

(1) With the help of a pdfmark.txt file (a text file with the content shown below, saved in the same directory as the Ghostscript executable, gs.exe), we can use the following command to completely remove any metadata from a PDF file:

gs.exe -o output.pdf -sDEVICE=pdfwrite input.pdf pdfmark.txt

The content of pdfmark.txt:

[ /Title () /Author () /Subject () /Creator () /ModDate () /Producer () /Keywords () /CreationDate () /DOCINFO pdfmark
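For anyone scripting this step, here is a minimal Python sketch that generates the same pdfmark text and assembles the same command line as quoted above. The function names `build_pdfmark` and `build_command` are hypothetical helpers made up for this illustration; only the switches and file names already shown in the post are used, and the sketch does not run Ghostscript itself.

```python
# Info dictionary keys blanked by the DOCINFO pdfmark quoted in the post.
DOCINFO_KEYS = ["Title", "Author", "Subject", "Creator", "ModDate",
                "Producer", "Keywords", "CreationDate"]


def build_pdfmark(keys=DOCINFO_KEYS):
    """Return pdfmark text that sets each listed Info key to an empty string."""
    pairs = " ".join(f"/{key} ()" for key in keys)
    return f"[ {pairs} /DOCINFO pdfmark\n"


def build_command(gs_exe, input_pdf, output_pdf, pdfmark_file):
    """Return the argument list for the command quoted in the post."""
    # The pdfmark file is passed after the input PDF, as in the original command.
    return [gs_exe, "-o", output_pdf, "-sDEVICE=pdfwrite", input_pdf, pdfmark_file]


if __name__ == "__main__":
    print(build_pdfmark(), end="")
    print(" ".join(build_command("gs.exe", "input.pdf", "output.pdf", "pdfmark.txt")))
```

To actually run it, write the returned text to pdfmark.txt and pass the argument list to `subprocess.run(cmd, check=True)` on a machine where Ghostscript is installed.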

My questions:

(a) Is the above method of removing metadata completely permanent and irreversible, or is there a way to get the original metadata back?

(b) The above method used to work 100% in previous Ghostscript releases, including for the Producer name. Unfortunately, in the latest versions the Producer name cannot be changed: Ghostscript always writes itself as the producer and seems to ignore the /Producer () entry in pdfmark.txt. Is there anything we can do to change the Producer name?

(2) XMP metadata: How can we also completely remove any XMP metadata from a PDF file? I use the following single pdfmark.txt, but it seems to create "new" XMP metadata:

[ /Title () /Author () /Subject () /Creator () /ModDate () /Producer () /Keywords () /CreationDate () /DOCINFO pdfmark

[ /XML () /Ext_Metadata pdfmark

My questions:

(a) Is the above pdfmark.txt correct?

(b) If so, why does the newly created PDF file seem to contain "new" XMP metadata?

I look forward to your reply. Many thanks!

jhabjan commented 11 months ago

Try to use -dOmitXMP=true
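As an illustration of this suggestion, the sketch below adds the switch to the command from the original post. It assumes the installed Ghostscript version supports `-dOmitXMP`, and it places the switch ahead of the input file because Ghostscript switches apply to the files that follow them. The helper name `add_switch` is hypothetical.

```python
def add_switch(cmd, switch="-dOmitXMP=true"):
    """Return a copy of cmd with the switch placed right after the executable."""
    return [cmd[0], switch, *cmd[1:]]


if __name__ == "__main__":
    # Command from the original post, as an argument list.
    base = ["gs.exe", "-o", "output.pdf", "-sDEVICE=pdfwrite",
            "input.pdf", "pdfmark.txt"]
    print(" ".join(add_switch(base)))
```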

Geo-Van commented 11 months ago

Thanks for your reply. Please note, however, that according to the documentation for -dOmitXMP=true, it is required when producing PDF/A output.

My question is not about PDF/A output (sorry, I did not mention that before).

So, any more ideas, please?

By the way, is the syntax of the following pdfmark.txt correct (it is meant to leave zero XMP metadata)?

[ /Title () /Author () /Subject () /Creator () /ModDate () /Producer () /Keywords () /CreationDate () /DOCINFO pdfmark

[ /XML () /Ext_Metadata pdfmark

jhabjan commented 11 months ago

You cannot remove the XMP metadata from a conforming PDF/A file; it is a requirement of the specification. Other PDF types may be similar, and Ghostscript will, of course, always insert XMP if the specification requires it.

You can't touch the Producer key/value pair in the Info dictionary. Ghostscript always writes the Producer, and it is always set to "Ghostscript". This is to prevent people from changing it and passing off Ghostscript-produced PDF files as their own work.

NOTE: This is the Ghostscript.NET repository, which serves as a .NET wrapper for the core Ghostscript library. If you have questions about the core Ghostscript functionality, I suggest seeking assistance in the #ghostscript Discord channel.

Geo-Van commented 11 months ago

Thank you very much for your reply.

So, if I understand correctly:

If I use Ghostscript to remove all metadata from a PDF file using the pdfmark.txt described at the end of this comment, XMP metadata will still be created inside the new PDF file, but it will be "empty" XMP metadata: it contains no metadata values, only the XMP structure, which is the same for every PDF file. So the command

gs.exe -o output.pdf -sDEVICE=pdfwrite input.pdf pdfmark.txt

will in effect remove all metadata, including the XMP metadata. Is my understanding correct?

NOTE: The following pdfmark.txt will remove all metadata but will create new, "empty" XMP metadata (the [ /XML () /Ext_Metadata pdfmark line is not needed):

[ /Title () /Author () /Subject () /Creator () /ModDate () /Producer () /Keywords () /CreationDate () /DOCINFO pdfmark
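One way to check the "empty XMP" assumption on a real output file is to scan its bytes for metadata traces. The sketch below is a crude heuristic, not a PDF parser: it only sees uncompressed parts of the file, it does not handle hex-encoded or nested-parenthesis strings, and the function name `scan_pdf_bytes` is hypothetical. A proper PDF library would give a more reliable answer.

```python
import re

# Byte markers that typically open an XMP packet.
XMP_MARKERS = (b"<x:xmpmeta", b"<?xpacket")
# Info dictionary keys from the DOCINFO pdfmark in this thread.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Creator",
             b"Producer", b"Keywords", b"CreationDate", b"ModDate")


def scan_pdf_bytes(data: bytes) -> dict:
    """Report metadata traces visible in the uncompressed bytes of a PDF."""
    info_strings = {}
    for key in INFO_KEYS:
        # Matches "/Key (value)" literal strings only.
        match = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", data)
        if match:
            info_strings[key.decode()] = match.group(1).decode("latin-1")
    return {
        "has_metadata_ref": b"/Metadata" in data,
        "has_xmp_packet": any(marker in data for marker in XMP_MARKERS),
        "info_strings": info_strings,
    }


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        print(scan_pdf_bytes(handle.read()))
```

Run it as `python scan.py output.pdf` (the script name is arbitrary) and compare the result against the same scan of input.pdf.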

Geo-Van commented 11 months ago

Please see https://github.com/ArtifexSoftware/Ghostscript.NET/issues/117