JoshData / pdf-redactor

A general purpose PDF text-layer redaction tool for Python 2/3.
Creative Commons Zero v1.0 Universal

cannot replace to korean? #29

Open kdrkdrkdr opened 3 years ago

kdrkdrkdr commented 3 years ago

I want to translate the pdf from Japanese to Korean. I'd like to replace the text with a translated one. It's gone as soon as I fix it's gone.. Attached is the pdf used.

My Code

import re
from datetime import datetime

import pdf_redactor

options = pdf_redactor.RedactorOptions()

options.input_stream = r'.\filetest\test.pdf'
options.output_stream = r'.\filetest\test_transed.pdf'

options.content_filters = [
    (
        re.compile(u"論文の書き方ガイド"),
        lambda m: u'테스트'
    ),
]

pdf_redactor.redactor(options)

test.pdf

JoshData commented 3 years ago

It's gone as soon as I fix it's gone..

I'm sorry but I have no idea what the problem is that you're reporting.

kdrkdrkdr commented 3 years ago

In short, the text disappears when I try to change it.

JoshData commented 3 years ago

Is your problem described by this section in the README?

Since redaction in the text layer works by performing simple text substitution in the text stream, you may create replacement text that contains characters that were not previously in the PDF. Those characters simply won't show up when the PDF is viewed because the PDF didn't contain any information about how to display them.
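
For illustration, here is a rough way to spot replacement characters that the original text never used. This is only a sketch: `missing_chars` is a hypothetical helper, not part of pdf_redactor, and it compares plain strings rather than inspecting the fonts actually embedded in the PDF, so it can only hint at which characters might fail to render.

```python
def missing_chars(existing_text, replacement):
    """Return the replacement characters that never occur in existing_text.

    Characters are reported once each, in the order they first appear.
    This compares strings only; it does not look at the PDF's fonts.
    """
    known = set(existing_text)
    missing = []
    for ch in replacement:
        if ch not in known and ch not in missing:
            missing.append(ch)
    return missing


# The matched Japanese text and the Korean replacement from the report above.
existing_text = u"論文の書き方ガイド"

print(missing_chars(existing_text, u"테스트"))  # ['테', '스', '트']
print(missing_chars(existing_text, u"書き方"))  # []
```

In this example every Korean character in the replacement is new to the text, which is consistent with the behavior the README describes; a replacement that reuses characters already present would come back with an empty list.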