RWAP / PrinterToPDF

Project for converting captured printer data files to PDF format
GNU General Public License v3.0

Think before you print ;-) #15

Open Mike-DE-RE opened 7 years ago

Mike-DE-RE commented 7 years ago

Thinking about fonts, I ended up with the fundamental question: "What do you want?"

I am using a small utility called epsonps that translates raw Epson printer output to PostScript.

DESCRIPTION The program epsonps converts epson printer codes from an input file to POSTSCRIPT on standard output. Unknown, ignored or invalid epson codes are printed on standard error output. The program epsonps is an excellent ASCII listing printer. The program epsonps can convert Epson LX-800 codes, Epson LQ-800 codes, and IBM text screen dumps.

Here is the original ZIP-Archive:

epsonps.zip

It is a little tricky to extract that!!!

1st, unzip the archive.

2nd, this extracts two self-extracting "shell archives" called epsonps.1 and epsonps.2.

3rd, make them executable (chmod u+x epsonps.*), create two separate subdirectories, and move each shell archive into its own directory. Extract each one by running it in its directory (./epsonps.1 or ./epsonps.2).

4th, pay attention to the fact that epsonps.2 contains a file called epsonps.1 (the man page), which is different from the shell archive epsonps.1.

5th, delete the shell archives (epsonps.1 and epsonps.2) from the subdirectories and merge the files of both subdirectories into one directory.

You are done!
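The five steps above can also be scripted. This is a minimal Python sketch, assuming both shell archives sit at the top level of epsonps.zip and that a POSIX `sh` is available; running each archive through `sh` has the same effect as `chmod u+x` followed by `./epsonps.N`. The function name `extract_epsonps` is hypothetical.

```python
import shutil
import subprocess
import tempfile
import zipfile
from pathlib import Path

SHARS = ("epsonps.1", "epsonps.2")  # the two shell archives inside epsonps.zip


def extract_epsonps(zip_path, dest):
    """Unpack epsonps.zip into dest, following the manual steps; return file names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        # 1st/2nd: unzip; assumes both shell archives are at the top level.
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(work)
        for shar_name in SHARS:
            # 3rd: give each shell archive its own subdirectory and run it there.
            sub = work / ("dir_" + shar_name)
            sub.mkdir()
            shar = sub / shar_name
            shutil.move(str(work / shar_name), str(shar))
            subprocess.run(["sh", shar_name], cwd=sub, check=True)
            # 4th/5th: delete the shell archive before merging, because
            # epsonps.2 unpacks a man page that is also named epsonps.1.
            shar.unlink()
        # 5th: merge both subdirectories into dest, refusing to overwrite.
        for shar_name in SHARS:
            for item in (work / ("dir_" + shar_name)).iterdir():
                target = dest / item.name
                if target.exists():
                    raise FileExistsError(target)
                shutil.move(str(item), str(target))
    return sorted(p.name for p in dest.iterdir())
```

For example, `extract_epsonps("epsonps.zip", "epsonps")` leaves the merged sources and the man page in `./epsonps`. Raising on any other name clash, rather than silently overwriting, keeps a surprise in the archive contents visible.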

I created a small DOS text file (ASCII), EPSONPS.TXT,

and translated it to PostScript using

epsonps -o outputfile.ps -ta4-12 inputfile

The PostScript output can then be converted to PDF using

ps2pdf outputfile.ps outputfile.pdf

The result is here: EPDF-20170512072918.pdf

The result is a PDF containing real text, from which you can extract information by copy & paste.

The parameter -ta4-12 converts 12" printing to an A4 page.

Why do I write this?

Because there are two different ways to convert Epson raw printing.

The first way is to meticulously reproduce what an Epson printer would have printed on paper. This would be 1:1 (assuming you have fonts identical to those in the printer's ROM), and the output would be a graphic.

The second way is to do it as well as possible but keep the text and reproduce it with fonts (scalable vector fonts that look like the printer's ROM fonts but are embedded in the PDF) and attributes. This way has the big advantage that you can extract information from the printout by copy & paste. Since you are crossing platforms (ASCII to PDF), you must take care to convert the ASCII characters into the PDF in a way that lets them be extracted again. I've seen legible PDF files from which you could not extract plain text because the translation tables were messy.

Ideally there would be a command-line switch to decide which output you want, which means both ways would have to be implemented.
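To illustrate the character-mapping point: a DOS text file is usually not pure ASCII but uses a code page such as 437, so bytes like 0x81 must become "ü" before they are written into the PDF text, otherwise copy & paste returns garbage. A minimal sketch, assuming the job was printed with the PC437 character table and that a real ESC/P parser has already consumed the escape sequences and their parameters; the function name is hypothetical.

```python
def printable_text(data: bytes, codepage: str = "cp437") -> str:
    """Decode the printable bytes of a captured print job into Unicode.

    Assumes ESC sequences (ESC plus parameter bytes) were already removed by
    a parser; here only bare control bytes are dropped.
    """
    text = data.decode(codepage)
    # Keep line feeds; drop every other C0 control character and DEL.
    return "".join(ch for ch in text if ch == "\n" or (ch >= " " and ch != "\x7f"))
```

The resulting Unicode string is what the text layer of the PDF should carry (together with a correct ToUnicode mapping for the embedded font), so that extracted text matches what is visible on the page.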
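A switch like this could be as small as one option with two choices. The option name `--output-mode` and its values are hypothetical; PrinterToPDF's actual command line is not described in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI offering both conversion strategies."""
    p = argparse.ArgumentParser(
        description="Convert captured Epson printer data to PDF (sketch)")
    p.add_argument("input", help="captured raw printer data file")
    p.add_argument(
        "--output-mode",
        choices=("graphic", "text"),
        default="graphic",
        help="graphic: 1:1 raster reproduction; "
             "text: extractable text rendered with vector fonts")
    return p
```

A converter would then dispatch on `args.output_mode` after `args = build_parser().parse_args()`.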

So, what do you want?

Mike-DE-RE commented 7 years ago

This is how it could look:

s2.pdf

I've created a test page with Word for DOS and converted it with PrinterConvert and epsonps.

The PNG from PrinterConvert was manually made transparent using GIMP. The epsonps output was converted to PDF using ps2pdf. The final file was created by stamping the transparent PrinterConvert PDF over the epsonps/ps2pdf PDF using pdftk.

As you can see, the two tools disagree about the printer margin, but the result gives you a perfect mix of copyable text and graphics.

Better font translation would improve the result. The font used could be embedded in the PDF.

Mike-DE-RE commented 7 years ago

Another effect would be a dramatic reduction in file size. The plain text file of the above sample is only 3.1 KB, while a graphic page is 1.6 MB. If the user printed 10 text pages in graphic mode, the result would be 16 MB instead of 31 KB.

RWAP commented 7 years ago

Using text within the PDF file should now be easier with the libHaru library.

However, the issue is how best to accomplish mixed text and graphics on the same page.

There is also the question of how to handle font sizes, which printers generally specify in characters per inch rather than in points.
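One way to bridge characters per inch and points: a pitch of N cpi means each character cell is 1/N inch, i.e. 72/N points wide. For a monospaced font whose glyphs advance 600/1000 em (true for the standard Courier font), the point size that fills exactly one cell follows directly. This sketch assumes Courier-like metrics; the function names are hypothetical.

```python
POINTS_PER_INCH = 72.0  # PDF/PostScript user-space units per inch


def pitch_in_points(cpi: float) -> float:
    """Width of one character cell, in points."""
    return POINTS_PER_INCH / cpi


def font_size_for_cpi(cpi: float, advance_units: float = 600.0) -> float:
    """Font size (pt) at which a monospaced glyph with an advance width of
    advance_units/1000 em exactly fills one character cell."""
    return pitch_in_points(cpi) * 1000.0 / advance_units


def horizontal_scale_for_cpi(cpi: float, font_size: float,
                             advance_units: float = 600.0) -> float:
    """Horizontal scaling in percent (the PDF Tz text operator) that squeezes
    glyphs set at font_size into the cell width for cpi, keeping their height."""
    natural_width = font_size * advance_units / 1000.0
    return 100.0 * pitch_in_points(cpi) / natural_width
```

With these assumptions, 10, 12 and 15 cpi correspond to 12, 10 and 8 pt Courier. The horizontal-scaling variant is arguably closer to a dot-matrix printer for condensed print, where characters get narrower but not shorter.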

Mike-DE-RE commented 7 years ago

a) Mixed text and graphics.

I understand the problem. So far I have no better suggestion than to work on two different layers, one for text and one for graphics. When processing a page, print all text to the first layer; on graphics commands, print nothing (or white) and just perform the horizontal/vertical movements. You should end up with a pure text page. Print all graphics to a second layer, transparent where the page is white; on text, again just perform the horizontal/vertical movements and leave the area transparent, so nothing covers the text underneath. At the end you should have a text page and a graphics page, which are two layers of the same document. Stamping the graphics layer over the text layer should produce the complete document. When a page has no graphics (you can use a flag to detect that), skip the stamping and use the pure text page.

Advantage: The document will look perfect, and you get text that can be copied out of the document.

Disadvantages: Memory usage will increase. Conversion time will increase. File size will not be small when having text and graphics on one page. When using a tool like pdfimages you cannot split multiple graphics from one page.
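The two-layer idea boils down to one shared print-head position feeding two separate output lists. A minimal sketch over a hypothetical, already-parsed command stream (not real ESC/P parsing and not PrinterToPDF internals); it assumes 60 dpi single-density bit images and the default 1/6-inch line spacing. All class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class TextRun:
    x: float  # inches from the left margin
    y: float  # inches from the top of the form
    chars: str


@dataclass
class BitImage:
    x: float
    y: float
    columns: List[int]  # one byte of pin data per column


@dataclass
class Page:
    text_layer: List[TextRun] = field(default_factory=list)
    graphics_layer: List[BitImage] = field(default_factory=list)

    @property
    def needs_graphics_layer(self) -> bool:
        # The "flag": only stamp a graphics layer if something was drawn.
        return bool(self.graphics_layer)


def split_layers(commands: Sequence[Tuple[str, object]], cpi: float = 10.0,
                 dpi: float = 60.0, line_spacing: float = 1 / 6) -> Page:
    """Route text and graphics to separate layers while both share one cursor."""
    page = Page()
    x = y = 0.0
    for op, arg in commands:
        if op == "text":
            page.text_layer.append(TextRun(x, y, arg))
            x += len(arg) / cpi  # the graphics layer only "moves" here
        elif op == "graphics":
            page.graphics_layer.append(BitImage(x, y, list(arg)))
            x += len(arg) / dpi  # the text layer only "moves" here
        elif op == "cr":
            x = 0.0
        elif op == "lf":
            y += line_spacing
        else:
            raise ValueError(f"unsupported command: {op!r}")
    return page
```

Because both layers are positioned from the same cursor, they line up when one is stamped over the other, which avoids the margin mismatch seen when combining output from two different tools.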

The next step would be to implement some sort of bounding box detection to separate multiple images printed on one page. But this only makes sense for decreasing file size. Who would want to re-use images printed on raster devices?
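A simple bounding-box heuristic could split a page bitmap wherever there is a band of completely blank rows and then crop each band to its inked columns. This is a sketch, not connected-component labeling: two images side by side in the same band would still end up in one box. The bitmap is assumed to be a list of rows of 0/1 pixels; the names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def _box(bitmap: List[List[int]], top: int, bottom: int) -> Box:
    """Bounding box of the inked pixels in rows top..bottom-1."""
    cols = [x for row in bitmap[top:bottom] for x, px in enumerate(row) if px]
    return (top, min(cols), bottom, max(cols) + 1)


def image_boxes(bitmap: List[List[int]], min_gap: int = 1) -> List[Box]:
    """Split a page bitmap into images separated by at least min_gap
    completely blank rows and return one bounding box per image."""
    boxes: List[Box] = []
    start = None   # first inked row of the image being collected
    last_ink = -1  # most recent inked row
    blank_run = 0  # blank rows seen since last_ink
    for y, row in enumerate(bitmap):
        if any(row):
            if start is None:
                start = y
            last_ink = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                boxes.append(_box(bitmap, start, last_ink + 1))
                start = None
    if start is not None:
        boxes.append(_box(bitmap, start, last_ink + 1))
    return boxes
```

Raising `min_gap` merges images separated by only a few blank rows, which helps when a single picture contains thin blank stripes.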

b) Font size

I am quite sure the original EPSON fonts could be re-created as vector fonts, since there is only a limited number of fonts/sizes; see any EPSON ESC/P(2) printer driver or

http://www.epsonservice.ru/products/manuals/100154/ref_g/apbar_1.htm#escp%202%20and%20fx%20modes%20b

This should be manageable.
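To show how small the set of fixed-pitch widths is, this sketch enumerates pitch, condensed and double-width combinations. The values (10/12/15 cpi, condensed 17.14 and 20 cpi, no condensed 15 cpi, double width halving the pitch) are assumptions based on commonly documented ESC/P behavior and should be checked against the Epson reference linked above; proportional spacing and typeface choice are left out.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Assumed ESC/P character pitches in characters per inch.
BASE_CPI = {"pica": 10.0, "elite": 12.0, "15cpi": 15.0}
CONDENSED_CPI = {"pica": 17.14, "elite": 20.0}  # assumption: no condensed 15 cpi


def effective_cpi(pitch: str, condensed: bool = False,
                  double_width: bool = False) -> Optional[float]:
    """Characters per inch for one mode combination, or None if unavailable."""
    if condensed:
        if pitch not in CONDENSED_CPI:
            return None
        cpi = CONDENSED_CPI[pitch]
    else:
        cpi = BASE_CPI[pitch]
    return cpi / 2 if double_width else cpi


def all_modes() -> Dict[Tuple[str, bool, bool], float]:
    """Map every available (pitch, condensed, double_width) mode to its cpi."""
    table = {}
    for pitch, condensed, double_width in product(BASE_CPI, (False, True), (False, True)):
        cpi = effective_cpi(pitch, condensed, double_width)
        if cpi is not None:
            table[(pitch, condensed, double_width)] = cpi
    return table
```

Under these assumptions `all_modes()` yields ten combinations but only nine distinct character widths, so a handful of embedded font variants (or one font plus horizontal scaling) would cover them.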