dimi2 / DyAnnotationExtractor

DyAnnotationExtractor is software for extracting annotations (highlighted text and comments) from e-documents like PDF.
Apache License 2.0

Creates only empty md file on Ubuntu #5

Open draunitschke opened 3 years ago

draunitschke commented 3 years ago

Hi, when I run DyAnnotationExtractor.sh it produces an empty md-file.

OS: Ubuntu 18.04.05 LTS

Java: openjdk version "11.0.9.1" 2020-11-04

Input-File: Ubuntu 1804 english.pdf

The command I executed: ./DyAnnotationExtractor.sh -input ./documents/Ubuntu\ 1804\ english.pdf

To check whether something is wrong with the annotations themselves, I tried pdf-highlights-extractor (https://sourceforge.net/projects/pdfhex/), and it does extract the two highlighted texts.

dimi2 commented 3 years ago

Interesting case, thank you for reporting it. The problem was caused by the text content attached to the highlighted area. That content consisted of a BOM (byte order mark) only, so it looked like an empty string. The problem is now fixed in DyAnnotationExtractor version 1.3.
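For readers who hit a similar case, here is a minimal Java sketch of how BOM-only content could be detected. It is not the actual fix shipped in version 1.3; the class `AnnotationContent` and its methods `clean` and `resolve` are hypothetical names chosen for illustration, and falling back to the text under the highlight is an assumption. The sketch relies on the fact that Java's `String.trim()` does not remove U+FEFF, so the BOM has to be stripped explicitly.

```java
// Hypothetical helper, not part of DyAnnotationExtractor.
final class AnnotationContent {
    // U+FEFF is the byte order mark; Java does not treat it as whitespace.
    private static final String BOM = "\uFEFF";

    /** Removes BOM characters and surrounding whitespace; null becomes "". */
    public static String clean(String content) {
        if (content == null) {
            return "";
        }
        return content.replace(BOM, "").trim();
    }

    /**
     * Returns the annotation's attached content, unless it is effectively
     * empty (null, blank, or BOM only). In that case it falls back to the
     * text found under the highlighted area (assumed behaviour).
     */
    public static String resolve(String attachedContent, String textUnderHighlight) {
        String cleaned = clean(attachedContent);
        return cleaned.isEmpty() ? textUnderHighlight : cleaned;
    }
}
```

With this sketch, `AnnotationContent.resolve("\uFEFF", "highlighted text")` returns the highlighted text instead of an invisible one-character string.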

draunitschke commented 3 years ago

Hi, thanks for the quick response! DyAnnotationExtractor now produces a file with all the text I highlighted, but it concatenates all the extracted texts, resulting in one long block of text.

Output: Attention! This English-language guide includes a separate text file. The text of the guide is numbere and the same numbering is in the text file. The text file is easily translated by Google The programs that a home user needs are email, web browser, pdf file viewer, video an music playback software as well as, office program including spreadsheet, word processing and presentation graphics. Today, cloud services, web calls and other social How to open Ubuntu? To unlock your computer, raise the lock screen curtain by dragging it upward with the cursor, or by pressing Esc or Enter. This will reveal the login screen, where you can enter your password to unlock. Alternatively, just start typing your password and the curtain will be automatically raised as you type. When you lock your screen, or it locks automatically, the lock screen is displayed. In addition to protecting your desktop while you’re away from your computer, the lock screen displays th buntu does not always show the hourglass even though the computer is working. his is a bit embarrassing. Wait patiently and be cautious in such a situation. ometimes the hourglass (or rotating arrow) may be hidden behind the active window

As a user I would like to be able to distinguish between the different extracted highlights.

dimi2 commented 3 years ago

This could be a feature. Probably the way to do it is to combine the highlighted areas that belong to a single paragraph and to separate them if they come from different paragraphs. I will think about this.
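The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from DyAnnotationExtractor: the `Highlight` class, its `paragraph` index, and the `HighlightFormatter.format` method are all hypothetical, and the sketch assumes the extractor can already tell which paragraph each highlight belongs to and delivers highlights in reading order.

```java
import java.util.List;

// Hypothetical formatter, not part of DyAnnotationExtractor.
final class HighlightFormatter {

    /** Assumed model: the highlighted text plus the index of its source paragraph. */
    static final class Highlight {
        final int paragraph;
        final String text;

        Highlight(int paragraph, String text) {
            this.paragraph = paragraph;
            this.text = text;
        }
    }

    /**
     * Joins highlights from the same paragraph with a single space and
     * separates highlights from different paragraphs with a blank line.
     * Expects the list to be in reading order.
     */
    public static String format(List<Highlight> highlights) {
        StringBuilder out = new StringBuilder();
        Highlight previous = null;
        for (Highlight current : highlights) {
            if (previous != null) {
                boolean sameParagraph = current.paragraph == previous.paragraph;
                out.append(sameParagraph ? " " : "\n\n");
            }
            out.append(current.text.trim());
            previous = current;
        }
        return out.toString();
    }
}
```

A blank line is used as the separator because the output is a Markdown file, where a blank line starts a new paragraph.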