Closed RedFantom closed 7 years ago
It appears Travis-CI has problems with the Python 3 environments. Should we remove those from the file? I can't find any issue reports about it in the travis-ci repo.
About Travis: found this issue https://github.com/travis-ci/travis-ci/issues/4260
Can you try with python 3.2 only? It's not ideal, but definitely better than not having CI for python3
OK, so the Travis-CI Linux distribution doesn't use the manylinux
python-opencv
wheel, unfortunately. I'll try compiling the stuff again.
OK, the branch is quite a bit of mess. I will clean up and do a push --force
. Just so you know!
So, the tests now fail on both my own machine, and my Kubuntu VM, and Travis-CI, all with this error:
File "/home/travis/build/RedFantom/simple-ocr-opencv/tests/test_ocr.py", line 32, in test_ocr_unicode
self._test_ocr(ImageFile('unicode1'), ImageFile('unicode1'))
File "/home/travis/build/RedFantom/simple-ocr-opencv/files.py", line 61, in __init__
self.ground.read()
File "/home/travis/build/RedFantom/simple-ocr-opencv/files.py", line 38, in read
self.classes, self.segments = read_boxfile(self.path)
File "/home/travis/build/RedFantom/simple-ocr-opencv/tesseract_utils.py", line 15, in read_boxfile
return classes_to_numpy(classes), segments_to_numpy(segments)
File "/home/travis/build/RedFantom/simple-ocr-opencv/classification.py", line 20, in classes_to_numpy
int_classes = array.array("I", "".join(classes).encode('utf-32')[4:])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 0: ordinal not in range(128)
I will try to find the cause, but it probably has something to do with the recent changes to master
, because they weren't there before.
Update: Oh, of course, it's actually quite obvious: the new tests with the unicode characters are throwing the errors.
OK, so the changes from chr
to unichr
did not affect the raised errors, unfortunately.
Also, I have a question, @goncalopp: In Python 3, all strings are unicode. Doesn't chr
do exactly the same as the old unichr
and isn't str
the same call as unicode
was in Python 2?
Oh, of course, it's actually quite obvious: the new tests with the unicode characters are throwing the errors.
Yes - I added the tests because I knew your code would brake them ;) I think I annotated all the places that need to be changed in the review
Also, I have a question, @goncalopp: In Python 3, all strings are unicode. Doesn't chr do exactly the same as the old unichr and isn't str the same call as unicode was in Python 2?
Yes, yes and yes.
It's just that the code still needs to work in python2 :) At least if you want to merge it into master
In python2, chr
and str
need to be in the ascii range. You can use six
to have python2/3 cross-compatibility
@goncalopp Ah, of course.
Passing an encoding
argument fixed the UnicodeDecodeError
, but now there another error thrown with the unicode test on Windows:
File "C:\Users\RedFantom\Source Code\simple-ocr-opencv\tests\test_ocr.py", line 32, in test_ocr_unicode
self._test_ocr(ImageFile('unicode1'), ImageFile('unicode1'))
File "C:\Users\RedFantom\Source Code\simple-ocr-opencv\tests\test_ocr.py", line 25, in _test_ocr
self.assertEqual(chars, reconstruct_chars(ground_truth))
AssertionError: '\u16a0\u16c7ðþ\u03b7\u03b3\u03bb\u03c3\u03b1\u043b\u043b\u044c\u0433\u044f\u10d5\u10d4\u10de\u10d8\u10e1\u0baf
çæ\u0250\u025c\u053f\u0580\u0576\u0561\u0574\u0631\u0649\u0d91\u0dba\u03b3\u0db1\u0e94\u0e8d\u6211\u4e0b\u800c\u8eab\u1546\u152
d\u0259\u0153\u20ac' != '\u16a0\u16c7ðþ\u03b7\u03b3\u03bb\u03c3\u03b1\u0434\u043b\u044c\u0433\u044f\u10d5\u10d4\u10de\u10d8\u10
e1\u0bafçæ\u0250\u025c\u053f\u0580\u0576\u0561\u0574\u0631\u0649\u0d91\u0dba\u0db1\u0e94\u0e8d\u6211\u4e0b\u800c\u8eab\u1546\u1
52d\u0259\u0153\u20ac'
- \u16a0\u16c7ðþ\u03b7\u03b3\u03bb\u03c3\u03b1\u043b\u043b\u044c\u0433\u044f\u10d5\u10d4\u10de\u10d8\u10e1\u0bafçæ\u0250\u025c\
u053f\u0580\u0576\u0561\u0574\u0631\u0649\u0d91\u0dba\u03b3\u0db1\u0e94\u0e8d\u6211\u4e0b\u800c\u8eab\u1546\u152d\u0259\u0153\u
20ac
The same code on py3+cv3 on windows and linux has different results? That's a bummer... Here's the text I got from your message:
ᚠᛇðþηγλσαлльгяვეპისயçæɐɜԿրնամرىඑයγනດຍ我下而身ᕆᔭəœ€
ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€
It looks like it's misclassifying л
for д
, and adding a y
? I can kind of understand the first, but the second is a bit weird
Try running example.py
with unicode1
as both the train and test files - it might give you some insight.
Tweaking parameters there might be helpful.
Perhaps it has to do with this part of the box file?
ն 71 77 19 25 0
ա 124 78 19 23 0
մ 8 83 16 24 0
ر 44 83 25 19 0
ى 89 87 20 18 0
එ 107 91 16 17 0
ය 143 83 20 18 0
It swapped the characters and 0's for two characters, it seems. At least, this is what it looks like in PyCharm. In Notepad++, it is completely messed up, and in Atom too it shows like in PyCharm. In Atom I can't edit the lines with those characters on it, and with PyCharm I can only delete them. Would this be a problem with the fact that I have a Dutch Windows installation that doesn't support these characters at all or something?
It swapped the characters and 0's for two characters, it seems. At least, this is what it looks like in PyCharm.
As far as I can tell (with a hex editor and python), the box file is correct. The characters appear swapped because some editors (including github's) detect those two letters (correctly) as being arabic, and reverse the (visualization of the) entire line from right to left.
Here's how those 2 lines look in python:
u'\u0631 44 83 25 19 0',
u'\u0649 89 87 20 18 0',
So I don't think this is related (unless your python is giving you different results)
@goncalopp So, the problem is occurring on AppVeyor too. It is probably a decode error, not a recognition error. The error does not occur when running the master
branch tests on my machine, so it does have something to do with the six.text_type
against the Python 2 str
or six.chr
compared to unichr
in Python 2.
Update: Using past.types.str
and past.builtins.unichr
also don't make the code work on Windows.
Update: Or perhaps the errors are completely unrelated to the branch changes after all. In that case, in my opinion, this branch should be merged regardlessly if you're happy with the result, and we should take a look at the errors in another branch.
@RedFantom
Or perhaps the errors are completely unrelated to the branch changes after all. In that case, in my opinion, this branch should be merged regardlessly if you're happy with the result, and we should take a look at the errors in another branch.
Yes, since this passes on py2/3 cv2/3 on linux, I'm happy merging it in :) I left you a couple of comments on minor things
@goncalopp As I am away from my computer at the moment ( :( ) , I can't implement the changes myself at the moment, but I do agree with all of them. If you want, you can make them, or I will later this week.
I am sorry there have to be so many changes, I'm learning a lot about developing a library from you!
@goncalopp The branch fails to complete the tests because of the from six.moves import input
import. Without it, it does just fine.
This is looking good, thank you!
Actually, I'm sorry for being so picky! But I'm glad that you're learning things in the process :) The code is improving a lot thanks to you
The reason input
mocking isn't working is that you're importing it directly into the module (grounding
). mock.path
works by overriding attribute access, so it only works when the attribute (moves.input
) happens after the patching.
I've added small changes on 0608d72b67b215ceec95c21085e462b57da61228. Can you please take a look and, if you agree, cherry-pick it?
@goncalopp That sounds logical! I have cherry-picked your commit, it looks good and it doesn't make any changes to the test results. Thank you for checking that out!
Being picky is a good thing when it comes to libraries: the code must be readable and as consistent as possible to allow other users to modify the code without having to work through a lot of pad practices!
Using the
2to3
tool, manual changes and the fix for theint
division that @goncalopp found and mentioned in #14, the changes in this branch make the code compatible with Python 3.The tests pass both on Linux (Kubuntu 17.04 64-bit) and Windows (Windows 8.1 64-bit) with both Python 2.7 and Python 3.5 for me.