Ignorato / lapdftext

Automatically exported from code.google.com/p/lapdftext

blockifyClassify does not output XML on Mac, only PNG and text files #5

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I am on Mac OS X Mountain Lion and downloaded LA-PDFText_macos_1_7.dmg.

When I run blockifyClassify like so:

./LA-PDFText blockifyClassify "path to folder of pdf files" "absolute path to my drool file" "path to folder where I want the output"

It outputs many PNG images, a spatially filtered text file for each PDF, 
and an unclassified flow-aware text .dat file.

I expected open-access-style XML files like the following: 
ftp://ftp.biomedcentral.com/articles/0778-7367-67-1-1.xml

I expected this because the description says: "obtain XML for each block and 
classification based on a rule file."

My drool file may well be the problem. If so, I would like an error 
report of some sort so that I can improve it.

Original issue reported on code.google.com by rasmusse...@gmail.com on 7 Mar 2013 at 12:44

GoogleCodeExporter commented 8 years ago
I noticed some errors like the following in the output:

[Fatal Error] :1:134: The element type "root" must be terminated by the matching 
end-tag "</root>".
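
An error of this kind indicates that some XML being parsed is not well-formed. As a rough sketch of how such a failure can be detected and located, the snippet below uses Python's standard xml.etree.ElementTree, which is an assumption for illustration and not the parser LA-PDFText itself uses. The function name check_well_formed and both sample strings are hypothetical and are not taken from the attached PDF or its output.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) pair reported by the parser.
        line, column = err.position
        return (line, column, str(err))
    return None


# Hypothetical inputs: the second one never closes its <root> element.
complete = "<root><block>some text</block></root>"
malformed = "<root><block>some text</block>"

print(check_well_formed(complete))
print(check_well_formed(malformed))
```

Running a check like this over any intermediate XML would at least show which input is malformed and where, which is the kind of error report asked for above.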

I've attached my example PDF.

Original comment by rasmusse...@gmail.com on 8 Mar 2013 at 8:29

GoogleCodeExporter commented 8 years ago
Thank you. We've noticed this bug with a small proportion of papers and will 
try to investigate the underlying cause soon. Sorry that we have not addressed 
this yet.

G.

Original comment by GullyBu...@gmail.com on 8 May 2013 at 5:32