I have not run the code; I only looked at the sample input and output files provided in the README on the front page of this repository.
Why does the file go from 82k to 643k? Adding an OCR layer should not make the file almost eight times larger!
Taking a closer look, the file itself is being altered, which I feel is unacceptable: per the listings below, the page image grows from 1267x793 to 5279x3304, is re-encoded as JPEG, and loses its soft mask. All that should happen when creating a searchable PDF is adding a transparent text layer to the original PDF.
root@debian-test:~# pdfimages -list SampleInput.pdf
page   num  type   width height color comp bpc  enc   interp  object ID x-ppi y-ppi  size ratio
-----------------------------------------------------------------------------------------------
   1     0 image    1267    793 rgb      3   8  image no           6  0    72    72 80.3K  2.7%
   1     1 smask    1267    793 gray     1   8  image no           6  0    72    72  996B  0.1%
root@debian-test:~# pdfimages -list SampleOutput.pdf
page   num  type   width height color comp bpc  enc   interp  object ID x-ppi y-ppi  size ratio
-----------------------------------------------------------------------------------------------
   1     0 image    5279   3304 rgb      3   8  jpeg  no           6  0    72    72  642K  1.3%
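As a rough sanity check on the numbers above, here is a small Python sketch that works out how much the file and the embedded image actually grew. The values are copied from the file sizes and pdfimages listings quoted in this post; the variable and function names are only illustrative, and the 300 ppi reading is a guess from the arithmetic, not something confirmed from the code.

```python
# Values copied from the file sizes and pdfimages listings quoted above.
in_w, in_h = 1267, 793      # input image, pixels
out_w, out_h = 5279, 3304   # output image, pixels
in_kb, out_kb = 82, 643     # approximate file sizes


def growth(before, after):
    """Return (factor, percent_increase) when going from before to after."""
    factor = after / before
    return factor, (factor - 1) * 100


size_factor, size_pct = growth(in_kb, out_kb)
scale_x = out_w / in_w
scale_y = out_h / in_h
pixel_factor = (out_w * out_h) / (in_w * in_h)

# Assumption: if the input is treated as 72 ppi, this is the resolution
# the output image would correspond to after resampling.
implied_ppi = 72 * scale_x

print(f"file size:   {size_factor:.2f}x ({size_pct:.0f}% increase)")
print(f"width/height scale: {scale_x:.3f}x / {scale_y:.3f}x")
print(f"pixel count: {pixel_factor:.2f}x")
print(f"implied resolution: {implied_ppi:.0f} ppi")
```

The per-axis scale comes out at about 4.17x, which is what resampling a 72 ppi page to roughly 300 ppi would give. That suggests the page is being re-rendered and re-encoded rather than left untouched with a text layer added on top.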