Open bknowles opened 7 years ago
Okay, I managed to install all the pre-requisites, and get the program to do an "initialize" run by hovering the mouse over the various areas and pressing return, so that it could record the mouse pointer position. That information was saved.
However, when trying to do a normal run, I get the following error:
$ python BGNRRG.py
Traceback (most recent call last):
File "BGNRRG.py", line 250, in <module>
main()
File "BGNRRG.py", line 239, in main
setting_dict = read_setting()
File "BGNRRG.py", line 39, in read_setting
setting_dict[line[0]] = tuple(map(int, line[1:]))
ValueError: invalid literal for int() with base 10: '1351.92578125'
As you can see, the mouse position X and Y parameters are being saved as floats, not ints.
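A quick illustration of the failure, and of one way to tolerate it: a minimal sketch, assuming it is acceptable to round a saved coordinate to the nearest pixel. The helper name parse_setting_line and the sample label and Y value are made up for the example; only the X value comes from the traceback above.

```python
def parse_setting_line(line):
    """Parse 'label x y ...' into (label, tuple of ints).

    Hypothetical helper, not part of BGNRRG.py. Each coordinate goes
    through float() first, so '1351.92578125' is accepted, and is then
    rounded to the nearest whole pixel.
    """
    parts = line.rstrip("\n").split(" ")
    coords = tuple(int(round(float(p))) for p in parts[1:])
    return parts[0], coords


# int() refuses text with a fractional part, which is the error above.
try:
    int("1351.92578125")
    int_accepts_float_text = True
except ValueError:
    int_accepts_float_text = False

# Sample line: the label and the Y value are invented for illustration.
label, coords = parse_setting_line("reroll 1351.92578125 924.0\n")
```

Rounding keeps the rest of the script working with whole pixels, at the cost of discarding the sub-pixel part that the mouse library reported.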
So, I changed the code to read:
def read_setting():
    setting_dict = {}
    with open("config.txt", "rU") as setting_file:
        for line in setting_file:
            line = line.rstrip("\n").split(" ")
            setting_dict[line[0]] = tuple(map(float, line[1:]))
    return setting_dict
And we get past that error. However, python itself then proceeds to crash, and BGNRRG.py appears to hang. See http://imgur.com/a/niWGU.
When I stop BGNRRG.py with a control-C, I get the following error:
^CTraceback (most recent call last):
File "BGNRRG.py", line 251, in <module>
main()
File "BGNRRG.py", line 247, in main
lang=args.language, verbose=args.verbose)
File "BGNRRG.py", line 135, in repeats
im = screen_grab(buttons.total_roll)
File "BGNRRG.py", line 112, in screen_grab
im = pyscreenshot.grab(box)
File "/Library/Python/2.7/site-packages/pyscreenshot/__init__.py", line 46, in grab
return _grab(to_file=False, childprocess=childprocess, backend=backend, bbox=bbox)
File "/Library/Python/2.7/site-packages/pyscreenshot/__init__.py", line 29, in _grab
return run_in_childprocess(_grab_simple, imcodec.codec, to_file, backend, bbox, filename)
File "/Library/Python/2.7/site-packages/pyscreenshot/procutil.py", line 29, in run_in_childprocess
e, r = queue.get()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 117, in get
res = self._recv()
KeyboardInterrupt
I'm not sure what I can do to debug this issue further. ;(
Thanks for the feedback. Good point about the float. The second problem seems to be an error in the pyscreenshot library; however, the traceback isn't very revealing.
After further googling, I found a possible cause. As the last line of the traceback shows, something is wrong with multiprocessing:
I have pyscreenshot 0.4.2 installed, which supposedly includes the patch for ponty/pyscreenshot#38, so we should be avoiding these problems. Yes?
They say that childprocess=False is set only when you are running from IDLE. However, the other links seem to suggest that this is a general Mac problem, so could you please try it with childprocess=False?
How would I make that modification? So far as I can tell, you're not using ImageGrab.grab, so I don't see how this option would be passed down?
change im = pyscreenshot.grab(box)
to im = pyscreenshot.grab(box, childprocess=False)
Well, that did get us past the IDLE problem. However, I now have a new one:
$ python BGNRRG.py
Traceback (most recent call last):
File "BGNRRG.py", line 251, in <module>
main()
File "BGNRRG.py", line 247, in main
lang=args.language, verbose=args.verbose)
File "BGNRRG.py", line 136, in repeats
value = int(check_image(im, lang))
ValueError: invalid literal for int() with base 10: '4 7 2\n93 0 913309 77 02020 0060212200134\n7700 044 79 735 7794 929 99 4 84 0 3 0 40 3 9 4 2\n3 888022 1 11 22 24 2 11\n79 70 7 1072 13 3 4 8\n777777777 7 77677777777777 2 8 1\n1 20 20370 025 3730 90 2\n92 1'
Well, I guess that is the correct response from the code; the value must be a single number, right? It seems like tesseract is reading more numbers than expected in the small screenshot that is taken. Just a question: are you setting the top-left and bottom-right corners of the total-sum button correctly? I will try to add an option to export these screenshots so you can see them.
I did properly hover over top-left and bottom-right of the total sum area (there's no button). I just re-tested that, and got the same type of error.
Interesting observation -- when I have a character visible that comes to a total of 77, I get the following error:
$ python BGNRRG.py
Traceback (most recent call last):
File "BGNRRG.py", line 251, in <module>
main()
File "BGNRRG.py", line 247, in main
lang=args.language, verbose=args.verbose)
File "BGNRRG.py", line 136, in repeats
value = int(check_image(im, lang))
ValueError: invalid literal for int() with base 10: '1 2 2\n77 0 9133292 77 02000 00823012200133\n7700 244 79 735 7794 909 99 4 84 0 3 0 40 3 9 4 2\n11 888022 1 11 22 24 2 11\n8997 70 7 1072 23 3 4 8\n7777 7 7 7777 7 66 7 7 7777777 2 8 1\n70 202 10370 295 377'
However, when I have a character loaded (from pressing the Recall button) that sums up to 93, I get the following:
$ python BGNRRG.py
Traceback (most recent call last):
File "BGNRRG.py", line 251, in <module>
main()
File "BGNRRG.py", line 247, in main
lang=args.language, verbose=args.verbose)
File "BGNRRG.py", line 136, in repeats
value = int(check_image(im, lang))
ValueError: invalid literal for int() with base 10: '1 2 2\n93 0 9133292 77 02000 00823012200133\n7700 244 79 735 7794 909 99 4 84 0 3 0 40 3 9 4 2\n11 888022 1 11 22 24 2 11\n8997 70 7 1072 23 3 4 8\n7777 7 7 7777 7 66 7 7 7777777 2 8 1\n70 202 10370 295 377'
See the difference? That first number of the second line (after the "\n") appears to be what is being recognized, there's just a lot of extra stuff being returned that I don't understand.
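To make that observation concrete, here is a minimal sketch that pulls the first token of the second line out of the string above. It is only a diagnostic aid, under the assumption that the total always lands in that position. The function name is hypothetical, and it does not address why tesseract sees the extra digits in the first place.

```python
def second_line_first_number(ocr_text):
    """Return the first token on the second line of ocr_text as an int.

    Hypothetical helper for inspecting noisy OCR output. Returns None
    when there is no second line or its first token is not all digits.
    """
    lines = ocr_text.split("\n")
    if len(lines) < 2:
        return None
    tokens = lines[1].split()
    if not tokens or not tokens[0].isdigit():
        return None
    return int(tokens[0])


# Beginning of the string from the traceback for the character totalling 93.
sample = "1 2 2\n93 0 9133292 77 02000 00823012200133\n7700 244 79 735"
total = second_line_first_number(sample)
```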
Hmm. The change in cd89b2b4a4aecce3b17da525c1343d6ebf150dee doesn't seem to include the code for childprocess=False? Shouldn't that go on line 114?
Okay, adding that to line 114 gets me past the same old IDLE crash that I had been having before, but I still have the problem with weird data being returned for check_image(). Witness:
$ python BGNRRG.py
Traceback (most recent call last):
File "BGNRRG.py", line 273, in <module>
main()
File "BGNRRG.py", line 268, in main
verbose=args.verbose
File "BGNRRG.py", line 138, in repeats
value = int(check_image(im, lang))
ValueError: invalid literal for int() with base 10: '1 2 2\n79 0 9133292 77 02000 00823012200133\n7700 044 79 735 7794 909 99 4 94 0 3 0 40 3 9 4 2\n11 888022 1 11 22 24 2 11\n8997 70 7 1072 23 3 4 8\n7777 77777 7 697 77777777 2 8 1\n1 202 10370 295 3730 90 2'
Note that this was with a character on the BGEE screen which added up to a total of 79.
Also note that the tesseract folder I mentioned above would only be found under /usr/local/Cellar if the user had installed it using the Homebrew package manager (i.e., using the command brew install tesseract).
If they have installed tesseract in some other fashion, maybe using a different package manager, then it would probably be in a somewhat different directory structure. The /usr/local/Cellar/ directory structure is unique to the Homebrew package manager.
For more information on brew, see https://brew.sh/.
Cool. Thanks!
That's weird. Witness:
$ python BGNRRG.py -i
File "BGNRRG.py", line 74
setting_dict["total_roll"] =
^
SyntaxError: invalid syntax
Note: Joining line 75 onto the end of line 74 fixes this problem.
However, now we have a new problem that I haven't seen before:
$ python BGNRRG.py
Traceback (most recent call last):
File "BGNRRG.py", line 278, in <module>
main()
File "BGNRRG.py", line 257, in main
setting_dict = read_setting()
File "BGNRRG.py", line 39, in read_setting
setting_dict[line[0]] = tuple(map(int, line[1:]))
ValueError: invalid literal for int() with base 10: 'right'
Sorry, pushed a second fix and modified the readme. (Weird, is GitHub automatically closing issues? Or did I click "close and comment" instead of "comment"?)
Can you please initialize BGNRRG and then run python BGNRRG.py --training --no_value -n 10? This will create the folder training_data and add 10 images with screenshots of the specified area. (Fixed the line continuation.)
I'll give that a shot.
Sorry, still bombing on line 39:
$ python BGNRRG.py --training --no_value -n 10
Traceback (most recent call last):
File "BGNRRG.py", line 279, in <module>
main()
File "BGNRRG.py", line 258, in main
setting_dict = read_setting()
File "BGNRRG.py", line 39, in read_setting
setting_dict[line[0]] = tuple(map(int, line[1:]))
ValueError: invalid literal for int() with base 10: 'right'
That is because I am writing the whole dictionary, not just the buttons that I set. I should really stop making changes when I can't test them :/
Pushed another commit, should be working now.
Okay, managed to fix that code. Try this diff:
$ git diff
diff --git a/BGNRRG.py b/BGNRRG.py
index 706f091..5fcac4d 100644
--- a/BGNRRG.py
+++ b/BGNRRG.py
@@ -60,8 +60,8 @@ def create_setting():
     add_to_dict(mouse, setting_dict, "reroll")
     add_to_dict(mouse, setting_dict, "recall")
     add_to_dict(mouse, setting_dict, "store")
-    top_left = "top left corner"
-    bottom_right = "bottom right corner"
+    top_left = "top_left_corner"
+    bottom_right = "bottom_right_corner"
     add_to_dict(
         mouse, setting_dict, top_left,
         message="Hover over top left corner of total roll number:"
Your command $ python BGNRRG.py --training --no_value -n 10 did finally run!
But it captured some pretty strange PNGs. Let me see if I can post them all here:
FYI, here's my config.txt that was generated:
$ cat config.txt
recall 971 921
top_left_corner 1025 876
bottom_right_corner 1055 897
reroll 1083 924
total_roll 1025 876 1055 897
store 852 922
I'm assuming these are being stored in "label X Y" format, right? If so, then the different Y values for the three buttons versus the total_roll area do make sense -- that area is higher than the buttons. Comparing the X values of the total_roll against the X value of the reroll button, I'm not sure if that is right or not.
But certainly, those PNGs are whacked. The top left of the area is being tracked and captured correctly, but not the bottom right!
Bah, seems like another Mac-specific bug in the pyscreenshot library.
See here:
https://github.com/ponty/pyscreenshot/blob/master/pyscreenshot/plugins/mac_quartz.py
Would you be so kind as to write a simple test case using pyscreenshot.grab() and report the issue there? (I.e., I assume that pyscreenshot.grab((1,1,2,2)) would basically take a screenshot of your whole screen, assuming that the top-left corner is (0,0) and the bottom-right corner is (x_max, y_max), so just run python, grab the screen, and save it into a file.)
Doing a pyscreenshot.grab((1,1,2,2)) only resulted in a PNG file that is 4x4 pixels in size:
EDIT: Since the PNG is white and doesn't show up on a white background, I'll include the filename here: https://cloud.githubusercontent.com/assets/1151895/24378559/fe6b8cb4-1308-11e7-88cf-1e6cef36e8ec.png
Okay, I hacked your main() to try some stuff. I found that this works:
def main():
    args = parse_args()
    if args.initialize:
        initialize()
    else:
        config_exists()
        setting_dict = read_setting()
        buttons = Buttons(setting_dict)
        im = screen_grab((1025,876,28,21))
        file_name = os.path.join("training_examples",
                                 "newtest-pyscreenshot.png")
        im.save(file_name, "PNG")
Note that the second pair of numbers being passed to screen_grab() are the differences between the respective X and Y positions of the top_left and bottom_right. Here's the PNG that was captured:
The filename is https://cloud.githubusercontent.com/assets/1151895/24378808/f031bdb6-1309-11e7-80fd-3418ec54210b.png
EDIT: note that the PNG is actually twice the size requested -- I think that's because I'm using a Retina screen which has an actual resolution of something like 150dpi, instead of the normal 72dpi.
But macOS compensates for that and handles the translation both ways. It grabs what we actually want, we just have to deal with the fact that the file is larger than we would normally expect.
So on the Mac the second pair is additive (a width and height measured from the top-left corner), not an absolute bottom-right position. I will report this to pyscreenshot.
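The conversion implied here can be sketched as follows, assuming the Mac backend treats the box as (x, y, width, height) while the config stores two absolute corners. The function name is hypothetical; the sample numbers are the top_left_corner and bottom_right_corner values from the config.txt shown earlier.

```python
def corners_to_origin_size(top_left, bottom_right):
    """Convert two absolute corners into an (x, y, width, height) box."""
    x1, y1 = top_left
    x2, y2 = bottom_right
    if x2 < x1 or y2 < y1:
        raise ValueError("bottom_right must not be above or left of top_left")
    return (x1, y1, x2 - x1, y2 - y1)


# Corners taken from the earlier config.txt listing.
box = corners_to_origin_size((1025, 876), (1055, 897))
# box is (1025, 876, 30, 21)
```

Keeping the absolute corners in the config and converting only at grab time would let the same file serve backends that expect either convention.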
Okay, I made a minor change to temporarily handle the relative size issue, and now I've got it capturing a number of reasonable images. Here's the diff:
$ git diff
diff --git a/BGNRRG.py b/BGNRRG.py
index c110918..64a4338 100644
--- a/BGNRRG.py
+++ b/BGNRRG.py
@@ -1,5 +1,6 @@
 import pyscreenshot
 from pymouse import PyMouse
+from operator import sub
 import pytesseract
 import os
 import argparse as arg
@@ -60,8 +61,8 @@ def create_setting():
     add_to_dict(mouse, setting_dict, "reroll")
     add_to_dict(mouse, setting_dict, "recall")
     add_to_dict(mouse, setting_dict, "store")
-    top_left = "top left corner"
-    bottom_right = "bottom right corner"
+    top_left = "top_left_corner"
+    bottom_right = "bottom_right_corner"
     add_to_dict(
         mouse, setting_dict, top_left,
         message="Hover over top left corner of total roll number:"
@@ -70,6 +71,8 @@ def create_setting():
         mouse, setting_dict, bottom_right,
         message="Hover over bottom right corner of total roll number:"
     )
+
+    setting_dict[bottom_right] = map(sub, setting_dict[bottom_right], setting_dict[top_left])
     setting_dict["total_roll"] = \
         setting_dict[top_left] + setting_dict[bottom_right]
Here's the config.txt:
$ cat config.txt
recall 1091 927
reroll 1209 923
total_roll 1149 877 35 22
store 971 927
And here are the files captured by the command python BGNRRG.py --training --no_value -n 10:
I'm guessing now that I need to run the retraining process?
EDIT: Confirmed. I did a training run with 100 grabs, and it is mis-recognizing 79 as 737. I'm not sure about the other numbers.
EDIT2: More accurately, it is mis-recognizing 9 as 37. It's also mis-recognizing 5 as 3.
So, how do I feed this information back to tesseract in order to get corrected OCR information?
When I do a run in non-training mode, it doesn't seem to actually be clicking on the buttons, although it is recognizing some sort of value being returned by tesseract. So, when I do a run with a current total of 79 showing, it gives me 100 loops of claiming to see 737, and then quits.
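One cheap guard against a misread such as 737 is a range check before the value is used. This is only a sketch: the function name is hypothetical, and the bounds assume six ability scores of 3 to 18 each (so totals from 18 to 108), which is an assumption about the game rather than something taken from BGNRRG.py.

```python
def plausible_total(ocr_text, low=18, high=108):
    """Return the OCR result as an int if it looks like a valid total, else None.

    low and high are assumed bounds (six scores of 3..18); adjust as needed.
    """
    text = ocr_text.strip()
    if not text.isdigit():
        return None
    value = int(text)
    if low <= value <= high:
        return value
    return None


good = plausible_total("79\n")   # within the assumed range
bad = plausible_total("737")     # the misread reported above
```

A check like this does not fix the recognition itself; it only keeps an impossible value from being treated as a real roll.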
Please read the help for "training". I should probably change the name; it's not really training, it just shows what values tesseract is reading. I will try to write something about it on the weekend. In the meantime, would you be so kind as to help us debug the underlying libraries? Neither the pyscreenshot devs nor I have access to a Mac. See this open issue: https://github.com/ponty/pyscreenshot/issues/40
I could easily write a fix, but I don't have access to a Mac to test it :/
Thanks!
Okay, that's doubly weird. I had un-commented the lines to save the maximum value PNGs, and that appears to have been what was causing the clicks on the reroll button to fail. Comment those lines back out, and now it can actually click on the reroll button.
It's still mis-recognizing the 9s and 5s, but we're really, really close!
Because it is OCR, there will still be some failure rate. We would have to make another training dataset and create bgee3.traineddata with a bigger weight on 9 and 5, and repeat that until the failure rate is low. I need to think about automating this process.
Okay, so reading the documentation at https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality, it seems that there are some things we can do to the input images to help tesseract achieve better quality OCR.
I also found some useful examples at https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy.
In this specific case, I believe that the most useful things for us to do would be to convert the image from color to grayscale, and then to maximize the contrast and remove the background noise in the image. Doing some searching around, it seems that OpenCV could be a good choice for these functions.
Specifically, the example for converting to grayscale is at http://docs.opencv.org/master/df/d9d/tutorial_py_colorspaces.html, and the image thresholding example is at http://docs.opencv.org/master/d7/d4d/tutorial_py_thresholding.html.
The thresholding example with cv2.THRESH_BINARY_INV would appear to do two steps at once, and give us the cleanest black text on a white background for input to tesseract.
However, what I have not yet worked out is how to convert the image type returned by screen_grab(buttons.total_roll) into a type INPUT_ARRAY, which is what the OpenCV routines seem to want. Obviously, once you've done the conversions shown above, you would then have to translate that back before feeding it to tesseract.
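On the conversion question: pyscreenshot returns a PIL image, and numpy.asarray(im) turns such an image into the array form that OpenCV functions accept, with Image.fromarray() going back. For plain grayscale plus a fixed threshold, Pillow alone may be enough. The following is a minimal sketch under that assumption; the function name and the threshold of 128 are illustrative, not tuned for the game's font.

```python
from PIL import Image


def binarize_for_ocr(im, threshold=128, invert=True):
    """Convert a PIL image to grayscale, then to pure black and white.

    With invert=True, pixels brighter than threshold become black (0) and
    the rest become white (255), similar in effect to cv2.THRESH_BINARY_INV.
    This suits light text on a dark background.
    """
    gray = im.convert("L")
    if invert:
        return gray.point(lambda p: 0 if p > threshold else 255)
    return gray.point(lambda p: 255 if p > threshold else 0)


# Tiny synthetic image: one bright pixel (text) and one dark pixel (background).
demo = Image.new("RGB", (2, 1))
demo.putpixel((0, 0), (250, 250, 250))
demo.putpixel((1, 0), (10, 10, 10))
result = binarize_for_ocr(demo)
```

The result is still a PIL image, so it could be passed to check_image() in place of the raw grab without any further conversion.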
Alas... I was hoping this would end in a solution. Ah, well.
I'm trying out your program on my Mac, and I've discovered that tesseract can be installed using the brew package manager. My "tessdata" folder is in /usr/local/Cellar/tesseract/3.05.00/share/tessdata/.