clawsoftware / clawPDF

Open Source Virtual (Network) Printer for Windows that allows you to create PDFs, OCR text, and print images, with advanced features usually available only in enterprise solutions.
https://github.com/clawsoftware/clawPDF
GNU Affero General Public License v3.0
729 stars 156 forks

Degraded visual quality of text in PDF OCR Output Formats #96

Open bjelliot opened 1 year ago

bjelliot commented 1 year ago

When saving files in PDF/RGB-OCR or other OCR formats, the text quality is severely degraded. Am I doing something wrong? I would like a solution that retains searchable text and is also visually readable. I have tried each of the output formats and a variety of other settings but have been unable to find a combination that delivers both.

An example print with clawPDF v0.9.3 is shown below for PDF (on the left) and PDF/RGB-OCR (on the right), with otherwise mostly default settings. For smaller fonts, the visual degradation is even worse.

clawPDF comparison

Below are the actual files:

These were printed from Google Chrome to a clawPDF virtual printer (v0.9.3). I have tested with multiple webpages, tried each of the different output formats, and viewed the files in multiple viewers on multiple systems, all with the same results.

Thank you!

bjelliot commented 1 year ago

If I print the same webpage from Chrome with Save as PDF (on Windows 11), the resulting PDF is both searchable and visually clean: it has the searchability of PDF/RGB-OCR and the visual quality of the plain PDF from the clawPDF example.

Save as PDF file:

chmatse commented 10 months ago

I had the same issue, which is gone now. I'm not 100% sure what the cause was...

My plan was to find the version where the issue first occurred, so I uninstalled v0.9.3. While I was in the [Apps and Features] menu, I saw that an old 0.8.4 version was still listed, but I was unable to remove it. So I re-downloaded 0.8.4, started the installer, and chose [Repair].

Afterwards I removed it and then reinstalled it with the downloaded installer.

Tests with 0.8.4 produced correct PDF files with no degraded fonts.

After removing 0.8.4 again, I downloaded every version up to 0.9.3 and tested each one. Since 0.8.5 had updated to Ghostscript 10, I remembered that I had also installed GS10 separately, so I removed that as well before installing 0.8.5. I have now installed version 0.9.3 and everything looks okay (the massive delays when creating a PDF with 0.9.3 are also gone).

So my assumption is that some fragments of the old installation interfered with 0.9.3. If you have any older version of clawPDF, uninstall EVERY single version (if necessary, download that version's installer, select REPAIR, and then uninstall) and also uninstall any separately installed Ghostscript.
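To check for leftovers before reinstalling, something like the following rough Python sketch could list the relevant entries. It is not part of clawPDF. It assumes the entries show up under the standard per-machine uninstall registry keys and that their display names contain "clawPDF" or "Ghostscript"; both the keyword list and the helper names (`find_leftovers`, `installed_display_names`) are hypothetical and may need adjusting. Per-user installs (under HKEY_CURRENT_USER) are not covered.

```python
import sys

# Assumption: these substrings match the DisplayName of clawPDF and
# Ghostscript entries; adjust them if your entries are named differently.
KEYWORDS = ("clawpdf", "ghostscript")

# Standard per-machine uninstall registry locations (64-bit and 32-bit views).
UNINSTALL_PATHS = (
    r"SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall",
    r"SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall",
)


def find_leftovers(display_names, keywords=KEYWORDS):
    """Return the names that contain any keyword (case-insensitive)."""
    return [n for n in display_names if any(k in n.lower() for k in keywords)]


def installed_display_names():
    """Collect DisplayName values from the uninstall keys (Windows only)."""
    import winreg  # standard library, but only available on Windows

    names = []
    for path in UNINSTALL_PATHS:
        try:
            root = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path)
        except OSError:
            continue  # this registry view does not exist on the machine
        with root:
            subkey_count = winreg.QueryInfoKey(root)[0]
            for i in range(subkey_count):
                try:
                    with winreg.OpenKey(root, winreg.EnumKey(root, i)) as sub:
                        value, _ = winreg.QueryValueEx(sub, "DisplayName")
                        names.append(str(value))
                except OSError:
                    continue  # entry without a DisplayName value
    return names


if __name__ == "__main__" and sys.platform == "win32":
    for name in find_leftovers(installed_display_names()):
        print(name)
```

Any name it prints is a candidate to uninstall (or repair and then uninstall) before installing 0.9.3 again.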

Maybe this helps?

CyrilWaechter commented 10 months ago

@chmatse I have the same issue with a fresh install, and Ghostscript was not installed.

zachb3123 commented 5 months ago

Same issue with v0.9.3 on Windows 10/11. Would it be possible to integrate Chromium's print-to-PDF code into it?