blindpandas / bookworm

The Universally Accessible document Reader
https://getbookworm.com

The ability to save the scanned OCR book as a PDF or Word document, not just a text file #111

Open DraganRatkovich opened 2 years ago

DraganRatkovich commented 2 years ago

Bookworm currently allows the user to save a scanned book only as a plain text file. This is inconvenient in some cases, since the Word document and PDF formats are widely used.

Describe alternatives you've considered

Allow the user to save the scanned book in either PDF or Microsoft Word document format, which in turn would give more options for editing the resulting file in word processing programs. This can be done in the following ways:

@mush42 Let me know whether you think this is possible.

mush42 commented 2 years ago

@DraganRatkovich It is possible, of course. But I can't see any benefit of those two formats over plain text. No structure information is extracted from the document except pages and lines: no headings, no paragraphs, and no formatting information. You can copy the text from the text file and paste it into Word, and Word will restore the paging and lines.

Best, Musharraf

DraganRatkovich commented 2 years ago

@mush42 Of course, but the main advantage of saving directly as PDF or DOCX is time. Processing the contents of the extracted text file in Microsoft Word can take a long time, especially if the scanned book contains more than 300 pages.

pauliyobo commented 9 months ago

Hello. One year later: is this feature still desired? If yes, @DraganRatkovich, would you mind explaining why? I did read the previous comment, but note that even if we saved the text as a PDF, you would not retain any structure from the original image. IIRC, the most you get in the scanned file's output right now is the page number. Is that correct?
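
For reference, here is a minimal sketch of what a direct DOCX export could look like, given that only page and line boundaries are available. It is not Bookworm code. The function name `ocr_text_to_docx` is hypothetical, and the sketch assumes the OCR text separates pages with a form feed character, which may not match what Bookworm actually writes. It uses only the Python standard library and emits one paragraph per recognized line with a page break between pages, so the result carries no more structure than the text file does.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Fixed package parts that every minimal .docx needs.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PAGE_BREAK = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'


def ocr_text_to_docx(text, page_separator="\f"):
    """Return the bytes of a .docx built from plain OCR text.

    Hypothetical helper: the page separator is an assumption, not
    Bookworm's documented output format.
    """
    body = []
    for index, page in enumerate(text.split(page_separator)):
        if index > 0:
            body.append(PAGE_BREAK)  # start each later page on a new page
        for line in page.splitlines():
            # One paragraph per OCR line; escape &, < and > for XML.
            body.append(
                '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>'
                % escape(line)
            )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>'
        % (W_NS, "".join(body))
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", document)
    return buffer.getvalue()
```

To try it, write the returned bytes to a file, for example `open("scan.docx", "wb").write(ocr_text_to_docx(text))`, where `scan.docx` is a placeholder name. The sketch does not filter control characters that are invalid in XML, so real OCR output may need cleaning first.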