gali8 / Tesseract-OCR-iOS

Tesseract OCR iOS is a Framework for iOS7+, compiled also for armv7s and arm64.
http://www.nexor.it
MIT License
4.22k stars 949 forks source link

PDF output support #186

Closed ws233 closed 9 years ago

ws233 commented 9 years ago

The following things should be mentioned.

  1. Leptonica has been built with libz, libpng, libjpeg and liftiff.
  2. Building script has been changed for that.
  3. Podfile.lock has been updated to support CocoaPods 0.36.4
ws233 commented 9 years ago

@kevincon, I've fixed all those errors I had in my previous PR. Furthermore libtiff repo has been updated, so I've rebuilded it and include new binaries in a new PR. Pls, review.

kevincon commented 9 years ago

It's looking pretty good! Can you merge steps 2 and 3 from "README_howto_compile_libaries.md"? In other words, can you make the libtiff submodule init, update, and building part of build_dependencies.sh so the user only has to run one script? I'm not opposed to keeping your build-png.sh, build-jpg.sh, and build-tiff.sh scripts, but they should be called from build_dependencies.sh so the user only has to run one script.

ws233 commented 9 years ago

It's looking pretty good! Can you merge steps 2 and 3 from "README_howto_compile_libaries.md"? In other words, can you make the libtiff submodule init, update, and building part of build_dependencies.sh so the user only has to run one script? I'm not opposed to keeping your build-png.sh, build-jpg.sh, and build-tiff.sh scripts, but they should be called from build_dependencies.sh so the user only has to run one script.

I thought about it. But the script becomes to heavy. It contains to many build phases. So it becomes difficult to find a real error in the output of such heavy script. In such case we should make a much intelligent script, which will stop the further steps, if one of the previous steps fail. I'm not so good at bash scripts, so it will take a lot of time for me to make such clever script. So far, I'll just commit the build_dependencies.sh script with only those conserns from you.

ws233 commented 9 years ago

By the way, we have our lib mentioned on the upstream tesseract page now: https://github.com/tesseract-ocr/tesseract/wiki/3rdParty :)

kevincon commented 9 years ago

I just tried running the script and it seems to work great, nice job!

I have one last question for you before we merge: do we need to keep the libtiff-ios related .a files under version control? In other words, are they necessary for the iOS framework to be built? This is what I see if I run git status after running ./build_dependencies.sh:

➜  TesseractOCR git:(pr/186) ✗ git status
On branch pr/186
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)
  (commit or discard the untracked or modified content in submodules)

    modified:   lib/libjpeg.a
    modified:   lib/liblept.a
    modified:   lib/libpng.a
    modified:   lib/libpng16.a
    modified:   lib/libtesseract_all.a
    modified:   lib/libtiff.a
    modified:   lib/libtiffxx.a
    modified:   libtiff-ios (modified content, untracked content)
    modified:   tesseract-ocr (untracked content)

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    ../.DS_Store
    .DS_Store
    lib/libtesseract.a
    lib/libtesseract_api.a
    lib/libtesseract_ccstruct.a
    lib/libtesseract_ccutil.a
    lib/libtesseract_classify.a
    lib/libtesseract_cube.a
    lib/libtesseract_cutil.a
    lib/libtesseract_dict.a
    lib/libtesseract_main.a
    lib/libtesseract_neural.a
    lib/libtesseract_opencl.a
    lib/libtesseract_textord.a
    lib/libtesseract_viewer.a
    lib/libtesseract_wordrec.a

Can you:

1) Remove any of the files in the "Changes not staged for commit" section that we don't need under version control for the library to work properly 2) After (1), re-run git status and add the files under the "Untracked files" section to the .gitignore file in the root folder of the repo 3) Re-run ./build_dependencies.sh and if any of the libs that are currently under version control have changed, include their updates in a new commit (I'm mentioning this one because the libs we do care about like libtesseract_all.a and liblept.a showed that they had changed after I ran ./build_dependencies.sh myself)

ws233 commented 9 years ago

I will, but I think those libraries are being updated after every run of build_dependecies.sh. I don't know the reason. I have not compared those .s files.

ws233 commented 9 years ago

do we need to keep the libtiff-ios related .a files under version control? In other words, are they necessary for the iOS framework to be built?

Yes, we do. Yes, they are. They are the leptonica dependecies. Without them there are liner errors. You may try it yourself. Just remove those files, clean iOS framework and try to build it again.

kevincon commented 9 years ago

Yes, we do. Yes, they are. They are the leptonica dependecies. Without them there are liner errors. You may try it yourself. Just remove those files, clean iOS framework and try to build it again.

I thought that liblept.a represented leptonica and all of its dependencies, including all of those libtiff-ios libraries, sort of like how libtesseract_all.a represents all of the tesseract .a library files. But if that's not true then I agree that we need to keep them under version control.

So I think all that's left is to:

1) Add these files to the .gitignore file:

.DS_Store
libtesseract.a
libtesseract_api.a
libtesseract_ccstruct.a
libtesseract_ccutil.a
libtesseract_classify.a
libtesseract_cube.a
libtesseract_cutil.a
libtesseract_dict.a
libtesseract_main.a
libtesseract_neural.a
libtesseract_opencl.a
libtesseract_textord.a
libtesseract_viewer.a
libtesseract_wordrec.a

2) Rebase this pull request and squash all of its commits into one commit. We should get in the habit of doing this so we don't have commits like "Typo fixing" polluting the repo's commit history anymore. See this guide for more info if you're not sure how to do this: https://github.com/edx/edx-platform/wiki/How-to-Rebase-a-Pull-Request

ws233 commented 9 years ago

I agree. I will.

ws233 commented 9 years ago

We should get in the habit of doing this so we don't have commits like "Typo fixing" polluting the repo's commit history anymore.

Do you mean we should squash the commits after every changes we do according the others complains or only at the end before final merge? I think it's more difficult to watch such small current changes made during the developement.

kevincon commented 9 years ago

I meant squashing the commits just at the end before the final merge. I agree that it's valuable to see the evolution of the commits during the review of a PR.

Thanks again for this work! Merging and closing.