AmitGorvadiya / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

tesseract ported to use libtool, shared libraries on many platforms, solution to paths problem #174

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. running ./configure 

What is the expected output? What do you see instead?
tesseract 

What version of the product are you using? On what operating system?
tesseract subversion revision 201 on an Intel Macbook running Mac OS X
10.5.5 using Macports, a Pentium 4 PC running Ubuntu 8.10, and the same PC
running Windows XP using Cygwin.

Please provide any additional information below.

As I have been explaining in the ocropus issue reports and discussion area,
I have ported all of leptonlib-1.58, OpenFst (20080422), tesseract
subversion revision 201, iulib subversion revision 117, and ocropus
subversion revision 1307 to use libtool to generate shared as well as
static libraries, including on Cygwin where Cygwin .dlls are generated.

(I'm seeing below a message "Issue attachment storage quota exceeded"--does
that apply to me or to this issues page?)

The patches can be found in the ocropus files area:

http://groups.google.com/group/ocropus/files?hl=en

They begin with "toautotools".  As of the current date (20081127), I have
tested installing both the external packages and the packages through ocropus
on Mac OS X and Ubuntu 8.10.

To apply a patch, suppose that tesseract has been unpacked in some parent
directory, say bbuild.  Put the patches in bbuild.  Suppose one wants to
install in a subdirectory busr of one's home directory.

cd tesseract
patch -p1 -E < ../toautotools_tesseract-ocr_201_20081127.diff
autoreconf --install --force
./configure --prefix=$HOME/busr CPPFLAGS="-I$HOME/busr/include" \
    LDFLAGS="-L$HOME/busr/lib"
make
make install
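
The procedure above can be wrapped in a small shell function so that leptonlib-1.58 and then tesseract are built into the same prefix with identical steps. This is a minimal sketch and is not shipped with the patches: the function name build_into_prefix is hypothetical, and it only runs the commands listed in this post, in the same order.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patches): apply a toautotools patch to
# an unpacked source tree, regenerate the build system and install into PREFIX.
# Usage: build_into_prefix SRCDIR PATCHFILE PREFIX
# PATCHFILE is opened after the cd, so a relative path such as
# ../toautotools_tesseract-ocr_201_20081127.diff is relative to SRCDIR.
build_into_prefix() {
    srcdir=$1
    patchfile=$2
    prefix=$3
    # Run in a subshell so the cd does not affect the caller; the && chain
    # stops at the first step that fails and returns its exit status.
    (
        cd "$srcdir" &&
        patch -p1 -E < "$patchfile" &&
        autoreconf --install --force &&
        ./configure --prefix="$prefix" \
            CPPFLAGS="-I$prefix/include" \
            LDFLAGS="-L$prefix/lib" &&
        make &&
        make install
    )
}
```

For example, from bbuild one would first call it for leptonlib-1.58 with its own toautotools patch, then run build_into_prefix tesseract ../toautotools_tesseract-ocr_201_20081127.diff "$HOME/busr". Building leptonlib first lets tesseract's configure find its headers and library under the same prefix.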

One might want to install leptonlib-1.58 first, using the same procedure.
Some features/bugs of the patches:

1) Libtool libraries, shared libraries including on Cygwin
2) Fixed paths so that one can install anywhere, including in one's home
directory (solves Issue 132).  No need to ever use administrator
privileges.  Now all a user has to do is specify the include directories
in CPPFLAGS and the lib directories in LDFLAGS.
3) All libraries are now built as libtool convenience libraries except for
libtesseract_full.la and libtesseract_training.la.
4) ocroscript from ocropus can run its test-tess.lua script successfully,
but for the linerec bigtest there are failures for tesseract (and bpnet).
5) I have done absolutely no testing with libtesseract_training.la or the
training programs, so I have no idea if they work or not.
6) An earlier version of the patch compiles and runs tesseract on FreeBSD
7.0 stable.  The -E flag to patch ensures that files the patch deletes are
actually removed after patching instead of being left behind empty.
7) I removed Makefile.in from the top-level directory because it appears to
drastically affect compilation on Cygwin: it seems to downgrade the automake
version used to 1.4, which then breaks automake conditionals.
8) I know others have submitted patches for ports, but I came up with mine
independently, porting to all of Mac OS X 10.5.5, Ubuntu 8.10, Cygwin, and
FreeBSD.
9) Leptonica changed its include header path to <liblept/allheaders.h>; I
think it used to be leptonica rather than liblept.  I also changed other
places that used uint32 to leptonica's new l_uint32, etc.
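
Item 6 relies on patch's -E option. Depending on the diff format and how patch is configured, a file that the diff deletes can otherwise be left behind as an empty file. If a patch was already applied without -E, a quick check like the following lists the candidates. It is a sketch with a hypothetical function name, and some empty files in a source tree may be legitimate, so review the list before removing anything.

```shell
#!/bin/sh
# Hypothetical check: list regular files that are completely empty, e.g. files
# a patch emptied but did not remove because it was applied without -E.
# "-size 0" counts 512-byte blocks rounded up, so it matches only 0-byte files.
list_empty_files() {
    find "${1:-.}" -type f -size 0 -print
}
```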

Original issue reported on code.google.com by davs...@gmail.com on 28 Nov 2008 at 8:17

GoogleCodeExporter commented 9 years ago
It's a shame this wasn't merged in at the time. I'm trying to get Tesjeract
(Java JNI bindings) working for Linux, and it's going well, but having to link
statically is proving problematic. That aside, shared libraries and autotools
are just a good idea in general. I've attempted to apply and fix up the patch,
but it's not much fun and I'm not an autotools expert. I'd appreciate it if
someone with better knowledge could have a go and, more importantly, if any
such work could be merged before the 3.00 release.

Original comment by JerseyChewi@gmail.com on 8 Nov 2009 at 10:48

GoogleCodeExporter commented 9 years ago
I couldn't quite get my head around the patch, so I decided to just take the
stuff from SVN and work the rest out myself. It builds fine under Linux, but I
wanted to do better than targeting only one OS, so I gave it a try in Cygwin.
Although the static libraries built on the first attempt, the shared ones
required more work because Windows cannot have any unresolved symbols in its
DLLs. I had to use the -no-undefined flag to get it to create any shared
libraries at all, but eventually I hit a wall.

There is a cyclic dependency between ccmain and textord, so these cannot be
built as shared libraries on Windows. I think someone with a better
understanding of tesseract should resolve this. This wasn't a problem for the
previous patch because the cyclic dependency didn't exist then.

I didn't include any stuff for libtesseract_full because it appears that this
had already been removed. I'm not sure what the plan is here.

I didn't cater for leptonlib either, since I don't know anything about it. I
gather this will be the last release of tesseract that has it as an optional
dependency, so I'd rather tackle it when it's mandatory.

I doubt this stuff will work for VC++, but that's what the vcproj files are
for, right? I have no experience with VC++, so I can't test it. I'm not
particularly bothered about Windows in general, but it would be nice for it to
work in Cygwin, so some help with the cyclic dependency issue would be
appreciated.

I kept the EXTRA_DIST lines as they were, since it makes no difference to the
build. Do we really want those vcproj files in there, though? Maybe if VC++
works, I guess.

Rather than create one big confusing patch, I thought it would be easier to
send this up as a tarball with some instructions.

1) Delete the old autotools files.

# find . \( -name "Makefile*" -o -name "configure*" \) -delete
# rm -r *.m4 config
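
Because -delete cannot be undone, it may be worth previewing step 1 first. The sketch below (the function name is hypothetical) prints the files and directories that the two commands above would remove, without removing anything. It passes "." to find explicitly: GNU find assumes the current directory when none is given, but BSD find, as on Mac OS X, requires one.

```shell
#!/bin/sh
# Hypothetical dry run of step 1: print what would be deleted, delete nothing.
# Run it from the top of the tesseract tree.
preview_old_autotools_files() {
    # Every Makefile* and configure* file anywhere in the tree.
    find . \( -name "Makefile*" -o -name "configure*" \) -print
    # The top-level *.m4 files and config directory removed by "rm -r".
    for f in *.m4 config; do
        if [ -e "$f" ]; then
            printf '%s\n' "./$f"
        fi
    done
}
```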

2) For some reason, the callcpp.cpp and callcpp.h files were in different
libraries. libtool wasn't too happy about this. Surely it makes more sense to
keep them together?

# mv ccstruct/callcpp.cpp cutil/callcpp.cpp

3) The standard name for the autotools config header is config.h, so it's
probably best to stick to this.

# sed -i "s/config_auto\.h/config.h/g" */*.cpp */*.h
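
The sed command in step 3 only touches .cpp and .h files exactly one directory below the top level, and sed -i without a suffix is a GNU extension (BSD sed on Mac OS X needs -i '' or a backup suffix such as -i.bak). A recursive grep is a cheap way to confirm nothing was missed. This is a sketch with a hypothetical function name; it reports any file, of any type and at any depth, that still mentions config_auto.h.

```shell
#!/bin/sh
# Hypothetical check after step 3: search the whole tree for leftovers.
# Prints the offending files and returns 1 if any remain, 0 otherwise.
check_no_config_auto() {
    if grep -rl 'config_auto\.h' "${1:-.}"; then
        return 1
    fi
    return 0
}
```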

4) Unpack the tarball.

# tar zxf tesseract-autotools.tar.gz

5) Do the autotools magic.

# libtoolize -c
# aclocal
# automake -a -c
# autoconf
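
Step 5 can be run as a single command that stops at the first failing tool. This is a sketch with a hypothetical function name. It assumes GNU libtool's libtoolize is on the PATH, or that the LIBTOOLIZE variable names it (Macports, for example, installs it as glibtoolize). Alternatively, autoreconf --install --force, as used in the original report, runs these tools itself, including autoheader when configure.ac declares a config header.

```shell
#!/bin/sh
# Hypothetical wrapper for step 5: run the tools in the order given above and
# stop at the first one that fails. LIBTOOLIZE defaults to plain libtoolize.
bootstrap_autotools() {
    "${LIBTOOLIZE:-libtoolize}" -c &&
    aclocal &&
    automake -a -c &&
    autoconf
}
```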

I realise that the repository hasn't been touched since August, so it looks as
though development may have stalled. Still, I've put a lot of time into this,
and you'd be doing both the distribution packagers and the application
developers a big favour if you could take a look. Thanks. :)

Original comment by JerseyChewi@gmail.com on 20 Nov 2009 at 9:45

GoogleCodeExporter commented 9 years ago
Missed some files by mistake.

Original comment by JerseyChewi@gmail.com on 29 Nov 2009 at 4:46

GoogleCodeExporter commented 9 years ago
I just couldn't get the latest SVN to work with Tesjeract, and after much
digging and banging my head against the wall, I threw in the towel and dropped
back to 2.04. I've given it the same treatment. The original patch used
libtool convenience libraries and thus installed only the "full" and
"training" libraries. I didn't feel that this was the way to go, since all of
the libraries were installed before. I couldn't get it to properly link "full"
any other way, though, so I just dropped it. This still doesn't work in
Cygwin, again because of the cyclic dependency; I guess using convenience
libraries sidesteps the problem. A patch was needed to undo another cyclic
dependency regarding CertaintyScale. That is included.

Original comment by JerseyChewi@gmail.com on 19 Dec 2009 at 5:37

GoogleCodeExporter commented 9 years ago

Original comment by joregan on 30 Sep 2010 at 12:47