abbyy / ocrsdk.com

ABBYY Cloud OCR SDK
http://ocrsdk.com/github
Apache License 2.0

How to process a txtUnstructured image/document with paragraphAsOneLine? #94

Open felipedaraujo opened 4 years ago

felipedaraujo commented 4 years ago

I successfully ran the JavaScript example in this repo, and now I am trying to use the parameter txtUnstructured:paragraphAsOneLine, but so far I haven't had any luck.

After line 87, I tried each of the options below, and none of them worked for me. Could you show me the correct way to set this parameter?

settings.language = "English"; // Can be comma-separated list, e.g. "German,French".
settings.exportFormat = "txtUnstructured";

// Alternative 1 - Didn't work
// settings["txtUnstructured:paragraphAsOneLine"] = "true";

// Alternative 2 - Didn't work
// settings["txtUnstructured:paragraphAsOneLine"] = true;

// Alternative 3 - Didn't work
// settings.txtUnstructured = { paragraphAsOneLine: true };

// Alternative 4 - Didn't work
// settings.txtUnstructured = { paragraphAsOneLine: "true" };

// Alternative 5 - Didn't work
// settings.paragraphAsOneLine = "true";

// Alternative 6 - Didn't work
// settings.paragraphAsOneLine = true;

https://cloud-westus.ocrsdk.com is the service target I am using.
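
For reference, here is a minimal sketch of the request URL I would expect to be sent. It assumes the option has to reach the server as a literal query parameter named txtUnstructured:paragraphAsOneLine, and that the endpoint path is /processImage. buildProcessImageUrl is a hypothetical helper I wrote for illustration, not something from the sample code.

```javascript
// Hypothetical helper (not part of the repo's sample): builds the request URL
// from a plain settings object so it is easy to see whether an option
// actually ends up in the query string.
// Assumption: the endpoint path is "/processImage".
function buildProcessImageUrl(serverUrl, settings) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(settings)) {
    // Nested objects (as in Alternatives 3 and 4) have no flat query-string
    // form in this sketch, so they are skipped.
    if (value !== null && typeof value === "object") continue;
    params.append(name, String(value));
  }
  return `${serverUrl}/processImage?${params.toString()}`;
}

const url = buildProcessImageUrl("https://cloud-westus.ocrsdk.com", {
  language: "English",
  exportFormat: "txtUnstructured",
  "txtUnstructured:paragraphAsOneLine": true,
});

// URLSearchParams percent-encodes the colon in the parameter name as %3A.
console.log(url);
```

Comparing this URL with the one the sample actually sends (for example by logging the request path) should show whether any of the alternatives above make it into the query string at all.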

My ultimate goal is to convert a PDF to plain text the same way finereaderonline.com does: merging multiple columns into a single column and ignoring footers and page numbers.

Thanks in advance.