Delphi 7 and XP. :p
Original comment by rfwo...@gmail.com
on 12 Jan 2011 at 12:25
There is always a debate about C versus C++. For DLLs and static libraries, C is
usually preferred because it is faster and more portable. The ideal way to
develop software is to write it in C and then wrap it in C++. GTK has always
followed this direction, while Tesseract 3 is unfortunately heading in another.
So, to wrap it for another programming language, say Python, one will inevitably
have to go through some tedious steps to wrap the C++ classes in Tesseract 3
back into a C library. The most troublesome class is "STRING", which is
derived from Apache.
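A rough sketch of what that wrapping step looks like. DemoEngine is a made-up class standing in for the C++ API, and every demo_* name is hypothetical, not taken from Tesseract. The C side sees only an opaque pointer and plain C types, and string objects are copied into a caller-supplied buffer because they cannot cross the C boundary:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a C++ class that should be exposed to C (hypothetical).
class DemoEngine {
 public:
  void SetText(const std::string& text) { text_ = text; }
  const std::string& Text() const { return text_; }

 private:
  std::string text_;
};

extern "C" {

// Opaque handle: declared but never defined, used only as a pointer type.
typedef struct DemoHandle DemoHandle;

DemoHandle* demo_create() {
  return reinterpret_cast<DemoHandle*>(new DemoEngine());
}

void demo_delete(DemoHandle* handle) {
  delete reinterpret_cast<DemoEngine*>(handle);
}

void demo_set_text(DemoHandle* handle, const char* text) {
  assert(handle != nullptr && text != nullptr);
  reinterpret_cast<DemoEngine*>(handle)->SetText(text);
}

// Copies the text into buffer, truncating if needed, and always terminates it.
// Returns the number of bytes written, not counting the terminating NUL.
int demo_get_text(const DemoHandle* handle, char* buffer, int buffer_size) {
  assert(handle != nullptr && buffer != nullptr && buffer_size > 0);
  const std::string& text = reinterpret_cast<const DemoEngine*>(handle)->Text();
  int count = static_cast<int>(text.size());
  if (count > buffer_size - 1) count = buffer_size - 1;
  std::memcpy(buffer, text.data(), static_cast<std::size_t>(count));
  buffer[count] = '\0';
  return count;
}

}  // extern "C"
```

Any language that can load a C library can then call the four demo_* functions without knowing anything about the class behind the handle.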
rtwoolf> As I have used neither Delphi nor XP, it may take me some time to
work out what is going on. I will keep you informed of the progress.
Original comment by FreeT...@gmail.com
on 12 Jan 2011 at 12:48
"So, to wrap it to other programming language, say python, one may inevitably
be required to go through some tedious steps to wrap the c++ class in tesseract
3 back to c library."
I'm a little confused. By the looks of it, you got everything working in Python,
but you didn't recompile the DLL in C.
Original comment by rfwo...@gmail.com
on 12 Jan 2011 at 1:00
I'm wondering whether it would be best to include SWIG wrappers for certain
languages (Java, Python, C) with Tesseract itself or whether it would be better
to maintain these separately. I suspect also having to support Java and Python
would complicate the build process. Also, the C version requires a forked
version of SWIG. Any thoughts, Jimmy?
Original comment by JerseyChewi@gmail.com
on 12 Jan 2011 at 2:01
Android already includes JNI bindings for tesseract; using SWIG for it would be
an unnecessary duplication of effort.
I really couldn't care less about Python, so I'm not going to register an
opinion for or against, but I will note that the current trend seems to be to
prefer using ctypes.
The main reason why having a C binding is so appealing is that most other
languages have facilities for using C libraries, which would reduce the effort
in making language bindings all round, so it seemed like having a C wrapper was
the best way in general.
Original comment by joregan
on 12 Jan 2011 at 2:25
True, I'd forgotten about the Android port but I don't think this has been
updated for Tesseract 3?
Original comment by JerseyChewi@gmail.com
on 12 Jan 2011 at 3:04
"The main reason why having a C binding is so appealing is that most other
languages have facilities for using C libraries" - exactly!
Original comment by rfwo...@gmail.com
on 12 Jan 2011 at 3:10
I wasn't questioning whether we should have C bindings at all. It's just that
SWIG supports quite a few languages and we'll get better results for less work
if we target those directly instead of going via what it generates for C.
Original comment by JerseyChewi@gmail.com
on 12 Jan 2011 at 3:17
The Android port was updated to Tesseract 3 quite some time ago. Before
Tesseract 3 was released, in fact.
Original comment by joregan
on 12 Jan 2011 at 3:17
[deleted comment]
Hello, I tried to install the SWIG package and I am not able to build it. There
were a lot of errors during the build. The last error was:
publictypes.h:96: error: 'tesseract::PSM_COUNT' has a previous declaration as
'tesseract::PageSegMode tesseract::PSM_COUNT'
error: command 'gcc' failed with exit status 1
I was using the command "python setup.py build". Has any of you had a similar
issue? Please advise. Thank you for your time.
Original comment by vijay111...@gmail.com
on 19 Feb 2011 at 7:43
svn changed.
Try http://code.google.com/p/python-tesseract/downloads/list
Will look into it when I am free.
Original comment by FreeT...@gmail.com
on 20 Feb 2011 at 4:55
I have tried the svn version of tesseract-ocr today vs the swig_svn.7z in
http://code.google.com/p/python-tesseract/downloads/list.
"python setup.py build" doesn't yield any problems. For your information, I am
using Ubuntu Maverick.
Original comment by FreeT...@gmail.com
on 20 Feb 2011 at 5:18
Hello everybody,
I have written a small C Wrapper (not complete but covers the most important
part).
I would like to share it, and ideally it would be included in the project.
It is based on tesseract 3.01, so if there are any major changes in the C++
API, probably it would need some changes.
Comments are welcome!
Original comment by trop...@gmail.com
on 2 Apr 2012 at 8:11
Attachments:
BTW, those files just have to be added to the project alongside baseapi.h/.cpp
Original comment by trop...@gmail.com
on 2 Apr 2012 at 8:25
I put the files in api folder and included them in tesseract project (r639).
However, the compiler generated more than 100 errors, most of which look like
the following:
Error 1 error C2143: syntax error : missing ';' before 'const' c:\projects\tesseract-3.0.1\api\capi.h 96 tesseract
Error 2 error C4430: missing type specifier - int assumed. Note: C++ does not support default-int c:\projects\tesseract-3.0.1\api\capi.h 96 tesseract
Error 3 error C2144: syntax error : 'void' should be preceded by ';' c:\projects\tesseract-3.0.1\api\capi.h 98 tesseract
Error 5 error C2086: 'int TESSDLL_API' : redefinition c:\projects\tesseract-3.0.1\api\capi.h 98 tesseract
Original comment by nguyen...@gmail.com
on 3 Apr 2012 at 2:46
Oh that's unfortunate. I made a last minute change without testing it.
Here's the new one.
Anyway, are you compiling the 3.01 project? In 3.01 (Windows) there is not yet
a project for a dll, only the executable.
What I did was simply change the "tesseract" project from "Executable" to
"DLL" in the preferences and define TESSDLL_EXPORTS in the project settings as
well.
Additionally you probably want to change the output name from tesseract.exe to
tesseract.dll or similar.
This is obviously a hack; in the long term you would want a separate project
for the DLL. But I believe this is already done in SVN.
Original comment by trop...@gmail.com
on 3 Apr 2012 at 7:12
Attachments:
I've been able to build the DLL with both 3.01 and, with little change, 3.02
alpha. However, I'm not sure if it was built correctly as my Java program
cannot look up the exposed C methods. I'll come back to it when I have more
time.
Meanwhile, can you attach a copy of your C DLL so we can try it out? Thanks.
Original comment by nguyen...@gmail.com
on 4 Apr 2012 at 2:22
You need to define TESSERACT_EXPORTS in the project properties, otherwise the C
functions are not exported.
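For anyone unfamiliar with the mechanism, this is the usual shape of such an export macro. It is a sketch with made-up DEMO_* names, not the actual contents of capi.h: the DLL project defines the *_EXPORTS symbol so the functions are marked dllexport, and client code leaves it undefined so the same header marks them dllimport.

```cpp
#include <cassert>
#include <cstring>

// Normally defined in the DLL project's settings, never by code that merely
// uses the DLL. Defined here only to keep the sketch self-contained.
#define DEMO_EXPORTS

#if defined(_WIN32)
#  ifdef DEMO_EXPORTS
#    define DEMO_API __declspec(dllexport)  // building the DLL: export symbols
#  else
#    define DEMO_API __declspec(dllimport)  // using the DLL: import symbols
#  endif
#else
#  define DEMO_API  // no decoration in this sketch on other platforms
#endif

// Hypothetical exported function with C linkage, so its name is not mangled.
extern "C" DEMO_API int demo_text_length(const char* text) {
  assert(text != nullptr);
  return static_cast<int>(std::strlen(text));
}
```

If the symbol is missing while building the DLL, the functions compile but never show up in the export table, which matches the "cannot look up the exposed C methods" symptom.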
I've attached a copy of my DLL. (One Debug and one Release)
Note that I've built the DLLs with VS 2005, so there is a bit of a dependency
hell regarding MSVCRxx.dll. You need both MSVCR80.dll (for the VS 2005
compiled objects) and MSVCR90.dll (for the VS 2008 objects that came with the
source).
For the release version, you probably already have those, for the debug version
it's a bit more difficult.
These files are the Release.Dynamic build.
Original comment by trop...@gmail.com
on 5 Apr 2012 at 8:28
Attachments:
And now Debug:
(omitted the PDB, it seems to be too large)
Original comment by trop...@gmail.com
on 5 Apr 2012 at 8:30
Attachments:
[deleted comment]
Thanks, Troplin. I tried all your files and suggestions but still nothing
worked. After spending some time digging into the old tessdll source code of
Tess 2.04, I made a single change to capi.h from:
#define TESSDLL_CALL __stdcall
to:
#define TESSDLL_CALL __cdecl
then I began to be able to call the exported C functions from my Java wrapper.
I don't understand the significance of this change since I'm not a C/C++
developer.
In preliminary tests with Tess 3.02, the OCR output text appeared to be accurate
with the test images.
Original comment by nguyen...@gmail.com
on 6 Apr 2012 at 9:15
It is just a different calling convention.
If you want to call the function from Java, you need to use the same calling
convention as declared in the C-API.
What technique are you using for your Java wrapper? JNI, JNA, or something
else?
Usually you can declare the calling convention where you declare the function
prototype.
__stdcall is the standard for all Microsoft Win32 APIs.
__cdecl is the default for C programs.
Both have their advantages and disadvantages; I think it's a matter of taste
which one to use.
__cdecl causes fewer problems in combination with products from non-Microsoft
vendors (e.g. MinGW, Java, etc.).
__stdcall is better suited if you are calling from Microsoft products (like
.NET, VB6).
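A small sketch of declaring the convention in one place, using made-up names (DEMO_CALL, demo_scale). The convention is part of the function's type, so both the library and the caller's function pointer have to use the same macro. The distinction only matters on 32-bit x86 Windows, so the macro expands to nothing everywhere else:

```cpp
#include <cassert>

#if defined(_WIN32) && !defined(_WIN64)
#  define DEMO_CALL __cdecl  // swap for __stdcall to change the convention
#else
#  define DEMO_CALL          // no such distinction on other targets
#endif

extern "C" {

// Library side: the convention is written between return type and name.
int DEMO_CALL demo_scale(int value, int factor) { return value * factor; }

// Caller side: the function pointer type carries the same convention.
typedef int (DEMO_CALL *demo_scale_fn)(int value, int factor);

}  // extern "C"

// Calls the library function through a pointer, as a foreign-function
// binding (JNA, ctypes and so on) effectively does.
int call_through_pointer(demo_scale_fn fn, int value, int factor) {
  assert(fn != nullptr);
  return fn(value, factor);
}
```

When the two sides disagree on 32-bit Windows, the stack is cleaned up twice or not at all, which typically shows up as a crash or as a lookup failure because the exported name is decorated differently.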
Original comment by trop...@gmail.com
on 10 Apr 2012 at 9:34
Which function in capi.h is called to do the OCR? TessBaseAPIProcessPages? Have
you defined TESSDLL_INCLUDE_BASEAPI?
Could you be kind enough to explain briefly how you made your Java wrapper?
Original comment by FreeT...@gmail.com
on 10 Apr 2012 at 1:25
The functions in the C-API are the same as those in the C++ API.
Documentation is in baseapi.h.
I usually do the following sequence:
1. TessBaseAPICreate
2. TessBaseAPIInit3
3. TessBaseAPISetPageSegMode
4. TessBaseAPISetImage
5. TessBaseAPIRecognize
6. TessBaseAPIGetIterator
... (extract text from iterators)
X. TessBaseAPIDelete
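The important part of that sequence is its shape: everything between Create and Delete works on the handle, and Delete has to run even if a step in the middle fails. Below is a sketch of that shape from C++, using stand-in functions only. The Fake* names and their signatures are invented for illustration; the real functions are the ones listed above, and their actual signatures are in capi.h.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for the C API (hypothetical; they only track state).
struct FakeApi {
  bool initialized;
};

static int g_live_handles = 0;  // counts handles that were created but not deleted

FakeApi* FakeCreate() {
  ++g_live_handles;
  return new FakeApi{false};
}

void FakeDelete(FakeApi* api) {
  --g_live_handles;
  delete api;
}

// Returns 0 on success and -1 on failure in this sketch.
int FakeInit(FakeApi* api, bool succeed) {
  api->initialized = succeed;
  return succeed ? 0 : -1;
}

int FakeRecognize(FakeApi* api) {
  assert(api->initialized);  // recognition only makes sense after a good init
  return 0;
}

// Runs create -> init -> recognize -> delete. The unique_ptr's custom deleter
// guarantees the delete step on every return path.
bool RunOnce(bool init_succeeds) {
  std::unique_ptr<FakeApi, void (*)(FakeApi*)> api(FakeCreate(), &FakeDelete);
  if (FakeInit(api.get(), init_succeeds) != 0) return false;
  if (FakeRecognize(api.get()) != 0) return false;
  return true;
}
```

In plain C, or from a binding in another language, the same guarantee has to be written by hand, for example with a single cleanup path at the end of the function.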
Original comment by trop...@gmail.com
on 10 Apr 2012 at 2:55
python-tesseract for windows
http://python-tesseract.googlecode.com/files/python-tesseract-0.7.win32-py2.7.exe
Original comment by FreeT...@gmail.com
on 11 Apr 2012 at 6:56
A comment on TESSDLL_INCLUDE_BASEAPI and TESSDLL_INCLUDE_LEPTONICA:
TESSDLL_INCLUDE_BASEAPI:
Only define this if you are using the C-API from C++.
If defined, all datatypes from the BaseAPI can be used. C and C++ API can be
mixed freely.
TESSDLL_INCLUDE_LEPTONICA:
Enables the use of the Leptonica datatypes.
Original comment by trop...@gmail.com
on 12 Apr 2012 at 9:14
Here is a new version of the C Wrapper.
Changes:
- Use __cdecl instead of __stdcall, this seems to be more convenient.
- Includes all functions using Leptonica datatypes by default.
- Forward declaration of Leptonica datatypes instead of header file inclusion.
- Added missing "SetVariable" function
- Use array-delete (delete []) instead of scalar delete for strings and int
arrays.
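To illustrate the last point with a sketch (made-up demo_* names, not the functions in the attachment): a buffer allocated with new[] has to be released with delete[], and the release function should live in the same library as the allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

extern "C" {

// Returns a heap copy of text. The caller must release it with
// demo_delete_text, never with free() or a plain delete.
char* demo_copy_text(const char* text) {
  assert(text != nullptr);
  std::size_t length = std::strlen(text);
  char* copy = new char[length + 1];    // array form of new
  std::memcpy(copy, text, length + 1);  // copies the terminating NUL as well
  return copy;
}

// Memory obtained from new[] must be released with delete[]; using scalar
// delete on it is undefined behavior. Passing a null pointer is harmless.
void demo_delete_text(char* text) {
  delete [] text;
}

}  // extern "C"
```

Keeping the delete inside the DLL also means the memory is freed by the same C runtime that allocated it, which matters when the caller was built with a different compiler version.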
Original comment by trop...@gmail.com
on 12 Apr 2012 at 1:13
Attachments:
I forgot to thank trop for the good work. Better late than never.
Thanks a lot.
Original comment by FreeT...@gmail.com
on 12 Apr 2012 at 4:41
Feedback anyone?
Is there any chance of integrating it into the main repo?
Any objections regarding the names or the coding style?
Original comment by trop...@gmail.com
on 16 Apr 2012 at 7:55
Troplin, will this C wrapper also work on Linux?
I'm developing a JNA wrapper based on this C API (http://tess4j.sf.net). I'm
close to releasing a beta once I figure out why recognizing a rectangle cuts
off some words at the right edge of the image.
Other than that, the capi.cpp/.h looks fine. IMHO, for it to be included in the
current baseline, it needs to be updated to Tesseract 3.02 API, which changed a
little bit from 3.01. Additionally, a short demo program (similar to dlltest in
2.04) to test the C API would be nice to have.
Original comment by nguyen...@gmail.com
on 16 Apr 2012 at 1:14
Original issue reported on code.google.com by
nguyen...@gmail.com
on 26 Sep 2010 at 4:20