gitanat / simple-ocr-opencv

A simple python OCR engine using opencv
GNU Affero General Public License v3.0
525 stars 175 forks source link

PEP-8 formatting, unused imports #16

Closed RedFantom closed 7 years ago

RedFantom commented 7 years ago

This single commit contains all the changes I made to make the code in the master branch of my own fork PEP-8 compliant. However, this code also contains a merge of the opencv3 branch. The changes are small and subtle, but they break compatibility with OpenCV 2 nonetheless. It must be noted that the commands pip install opencv-python and sudo apt-get install opencv-python now both provide OpenCV version 3.2.0, which means OpenCV 2 is deprecated. Should the OpenCV 3 changes be separated from the style changes, for as far as that is even possible? Or should the opencv branch on @goncalopp original repository be merged into master first and then this one?

This PR is related to issue #14 . Squash merging this branch with a new commit message is recommended.

gitanat commented 7 years ago

Yeah, sorry, opencv version support will have to remain on a separate branch, as CV2 is actually the default both in Debian Stable and Ubuntu LTS.

sudo apt-get install opencv-python doesn't imply much without knowing your distribution version - I'm guessing a recent version of Ubuntu?

Ideally we need a commit with only the formatting changes - you can get these from just running the code formatter over the original files. If you don't mind using autopep8 (or any standalone formatter, really) I'm happy to do it as well

Introducing formatting changes to a project is always a pain, unfortunately, since it makes your version unmergeable with everyone else's. But we're actually lucky no one else is working on the code at the moment :)

RedFantom commented 7 years ago

Unfortunately supporting OpenCV 2 would also make the code not work on Travis-CI or AppVeyor, and it would be counter-intuitive in my opinion for Windows users, especially if we want to keep this OCR simple. I'm running Ubuntu 16.04 LTS on one of the computers I work on.

Rerunning the code formatter won't apply all the changes I made. I ran all the docstrings by hand to fully follow the PEP-8 standards. I used PyCharm's built-in formatting utility and the PEP-8 formatting checks.

gitanat commented 7 years ago

Unfortunately supporting OpenCV 2 would also make the code not work on Travis-CI or AppVeyor

Taking a look at your travis setup, it looks like the source you're building opencv from maintains the 2.4 branch, so it should be straightforward to support, no?

, and it would be counter-intuitive in my opinion for Windows users, especially if we want to keep this OCR simple

I don't really understand the build process for windows/AppVeyor at this point, sorry. What's the source for opencv installation in windows? I don't see any pull/build step in AppVeyor - is all of it done automagically by pip?

I'm running Ubuntu 16.04 LTS on one of the computers I work on

Oops, my bad, I checked the "latest" version from the image at https://wiki.ubuntu.com/LTS,which says 14.04, but rechecking it it looks like they haven't updated it in a while

Anyway, I definitely need the code to at least remain working on current Debian stable, or it will be impossible for me to test your changes. And we shouldn't mix formatting with a merge - if anyone has clones of this repository (which seems likely), it would effectively prevent them from choosing which patches to apply

RedFantom commented 7 years ago

Supporting the code on Windows is very important to me. If you need OCR on Linux, you have alternatives such as Tesseract, but on Windows they are very difficult to install correctly for end-users. AppVeyor automagically pulls your latest commit and builds from that, yes. OpenCV is installed using the pip install opencv-python command, which installs version 3.2.0. If you want to take a look at the AppVeyor logs, you can find the project here. Please note that the history is full of builds of the readme, tests and other branches for a few pages. Travis-CI is building the OpenCV installation from source because sudo apt-get install opencv-python installs OpenCV for the whole system and not for the virtualenv used by the Travis-CI builder. The version built is OpenCV 3, as I understand it. If it's not, it's at least compatible with OpenCV 3 somehow, as the master branch with the opencv3 branch merged into it works fine on Travis-CI building. I could not use the commit that I originally forked the repository of off, because it simply couldn't run on my systems without OpenCV 3 support.

The conclusion, as far as I understand here, is that the changes I made to the formatting should be completely recreated from scratch and applied to the commit that I originally cloned and had not yet applied the merge of opencv3 to.

gitanat commented 7 years ago

Thanks! Yes, I do remember the builds for OpenCV2 being a pain a few years ago, even on linux. I think it's likely that we can find a way to have CI only for OpenCV 3 in windows, but both 2 and 3 on Linux, so it should work out in the end :) Let's merge the formatting changes on this one, keep the existing branhes, and discuss cv2/3 once we get to CI

RedFantom commented 7 years ago

OK, so only one of the small changes made for OpenCV 3 support was inherited by this branch. I have now reverted that change, so only the formatting changes are included. It's ready for squash-merging now, right?

gitanat commented 7 years ago

I'm still seeing changes related to opencv?

-        self.knn= cv2.KNearest()
+        self.knn = cv2.ml.KNearest_create()

-        self.knn.train( features, classes )
+        self.knn.train(features, cv2.ml.ROW_SAMPLE, classes)

Didn't check further. I think it would be safest just run the code formatter from scratch at this point? The parameters are empty, so I would prefer to exclude them until they're filled, and I don't thing the double quotes vs single quotes justify all the trouble. I can add them manually afterwards with a replace command, if you care a lot about it

RedFantom commented 7 years ago

You're right. The revert I ran didn't revert it all, apparently. I went over 7bcd3f1 manually and updated the code accordingly. If the code succeeds in the unittests on an OpenCV 2 environment, I think we could just merge them in. If you look over the changes and review them, there aren't any further changes except for spacing and the docstrings and the two unused imports I removed. I reviewed all the files line by line, and there aren't any other changes.

gitanat commented 7 years ago

Nice, thanks!

I think all that's left is to fix the docstrings. If we're following PEP257, note one-line docstrings should have the opening and closing quotes on the same line https://www.python.org/dev/peps/pep-0257/#one-line-docstrings

Also, please removed any undocumented parameters and returns. You might want to do this in a separate commit, so you can easiliy revert it when you decide to write them

gitanat commented 7 years ago

Nice, it's looking good, thanks! There were just a few minor things that needed fixing - please check https://github.com/goncalopp/simple-ocr-opencv/commit/8702d00ada05a2b5363c89f5bdcf5880d8653644 And cherry-pick it here if you agree with the changes

RedFantom commented 7 years ago

Cherry-picked and done, thanks for fixing that! Sorry I had missed those... >.< Just a heads-up: I won't have much time the upcoming two weeks, so I don't know how long it will take to get the other branches ready for merging. I'll do my best though!

gitanat commented 7 years ago

Merged - thank you for this, and for making all the changes! No rush on the PRs. You'll probably have to rebase them on top of this commit in any case