J-Moravec / BGNRRG

Baldur's Gate: Enhanced Edition Non Random Roll Generator

Usage on a Mac... #2

Open bknowles opened 7 years ago

bknowles commented 7 years ago

I'm trying out your program on my Mac, and I've discovered that tesseract can be installed using the brew package manager.

My "tessdata" folder is in /usr/local/Cellar/tesseract/3.05.00/share/tessdata/.

bknowles commented 7 years ago

Okay, I managed to install all the pre-requisites, and get the program to do an "initialize" run by hovering the mouse over the various areas and pressing return, so that it could record the mouse pointer position. That information was saved.

However, when trying to do a normal run, I get the following error:

$ python BGNRRG.py
Traceback (most recent call last):
  File "BGNRRG.py", line 250, in <module>
    main()
  File "BGNRRG.py", line 239, in main
    setting_dict = read_setting()
  File "BGNRRG.py", line 39, in read_setting
    setting_dict[line[0]] = tuple(map(int, line[1:]))
ValueError: invalid literal for int() with base 10: '1351.92578125'

As you can see, the mouse position X and Y parameters are being saved as floats, not ints.

So, I changed the code to read:

def read_setting():
    setting_dict = {}
    with open("config.txt", "rU") as setting_file:
        for line in setting_file:
            line = line.rstrip("\n").split(" ")
            setting_dict[line[0]] = tuple(map(float, line[1:]))
    return(setting_dict)
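A possible variant of the float fix, sketched under the assumption that the rest of the script still expects integer pixel coordinates: parse each value as a float, then round it to an int. The helper name parse_setting_line is hypothetical and is not part of BGNRRG.py.

```python
def parse_setting_line(line):
    """Parse one 'label v1 v2 ...' config line into (label, tuple_of_ints).

    Hypothetical helper, not taken from BGNRRG.py.
    """
    fields = line.split()
    label = fields[0]
    # float() accepts both "1351" and "1351.92578125"; rounding before the
    # int() conversion picks the nearest pixel instead of truncating.
    values = tuple(int(round(float(v))) for v in fields[1:])
    return label, values
```

read_setting() could then store the result of this helper for each line of config.txt instead of calling map(int, ...) directly.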

And we get past that error. However, python itself then proceeds to crash, and BGNRRG.py appears to hang. See http://imgur.com/a/niWGU.

When I stop BGNRRG.py with a control-C, I get the following error:

^CTraceback (most recent call last):
  File "BGNRRG.py", line 251, in <module>
    main()
  File "BGNRRG.py", line 247, in main
    lang=args.language, verbose=args.verbose)
  File "BGNRRG.py", line 135, in repeats
    im = screen_grab(buttons.total_roll)
  File "BGNRRG.py", line 112, in screen_grab
    im = pyscreenshot.grab(box)
  File "/Library/Python/2.7/site-packages/pyscreenshot/__init__.py", line 46, in grab
    return _grab(to_file=False, childprocess=childprocess, backend=backend, bbox=bbox)
  File "/Library/Python/2.7/site-packages/pyscreenshot/__init__.py", line 29, in _grab
    return run_in_childprocess(_grab_simple, imcodec.codec, to_file, backend, bbox, filename)
  File "/Library/Python/2.7/site-packages/pyscreenshot/procutil.py", line 29, in run_in_childprocess
    e, r = queue.get()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
KeyboardInterrupt

I'm not sure what I can do to debug this issue further. ;(

J-Moravec commented 7 years ago

Thanks for the feedback. Good point about the floats. The second problem looks like an error in the pyscreenshot library, although the traceback doesn't reveal much.

After some further googling, I found a possible cause. As the last line of the traceback shows, something is wrong with multiprocessing:

https://github.com/ponty/pyscreenshot/issues/38

https://github.com/ponty/pyscreenshot/issues/35

bknowles commented 7 years ago

I have pyscreenshot 0.4.2 installed, which supposedly includes the patch for ponty/pyscreenshot#38, so we should be avoiding these problems. Yes?

J-Moravec commented 7 years ago

They say that childprocess=False is set only when you are running from IDLE. However, the other link seems to suggest that this is a general Mac problem, so could you please try it with childprocess=False?

bknowles commented 7 years ago

How would I make that modification? So far as I can tell, you're not using ImageGrab.grab, so I don't see how this option would be passed down?

J-Moravec commented 7 years ago

Change im = pyscreenshot.grab(box) to im = pyscreenshot.grab(box, childprocess=False).

bknowles commented 7 years ago

Well, that did get us past the IDLE problem. However, I now have a new one:

$ python BGNRRG.py
Traceback (most recent call last):
  File "BGNRRG.py", line 251, in <module>
    main()
  File "BGNRRG.py", line 247, in main
    lang=args.language, verbose=args.verbose)
  File "BGNRRG.py", line 136, in repeats
    value = int(check_image(im, lang))
ValueError: invalid literal for int() with base 10: '4 7 2\n93 0 913309 77 02020 0060212200134\n7700 044 79 735 7794 929 99 4 84 0 3 0 40 3 9 4 2\n3 888022 1 11 22 24 2 11\n79 70 7 1072 13 3 4 8\n777777777 7 77677777777777 2 8 1\n1 20 20370 025 3730 90 2\n92 1'

J-Moravec commented 7 years ago

Well, I guess that is the correct response from the code; the value must be a single number, right? It seems that tesseract is reading more numbers than expected in the small screenshot that is taken. Just a question: are you setting the top-left and bottom-right corners of the total-sum button correctly? I will try to add an option to export these screenshots so you can see them.

bknowles commented 7 years ago

I did properly hover over top-left and bottom-right of the total sum area (there's no button). I just re-tested that, and got the same type of error.

bknowles commented 7 years ago

Interesting observation -- when I have a character visible that comes to a total of 77, I get the following error:

$ python BGNRRG.py
Traceback (most recent call last):
  File "BGNRRG.py", line 251, in <module>
    main()
  File "BGNRRG.py", line 247, in main
    lang=args.language, verbose=args.verbose)
  File "BGNRRG.py", line 136, in repeats
    value = int(check_image(im, lang))
ValueError: invalid literal for int() with base 10: '1 2 2\n77 0 9133292 77 02000 00823012200133\n7700 244 79 735 7794 909 99 4 84 0 3 0 40 3 9 4 2\n11 888022 1 11 22 24 2 11\n8997 70 7 1072 23 3 4 8\n7777 7 7 7777 7 66 7 7 7777777 2 8 1\n70 202 10370 295 377'

However, when I have a character loaded (from pressing the Recall button) that sums up to 93, I get the following:

$ python BGNRRG.py
Traceback (most recent call last):
  File "BGNRRG.py", line 251, in <module>
    main()
  File "BGNRRG.py", line 247, in main
    lang=args.language, verbose=args.verbose)
  File "BGNRRG.py", line 136, in repeats
    value = int(check_image(im, lang))
ValueError: invalid literal for int() with base 10: '1 2 2\n93 0 9133292 77 02000 00823012200133\n7700 244 79 735 7794 909 99 4 84 0 3 0 40 3 9 4 2\n11 888022 1 11 22 24 2 11\n8997 70 7 1072 23 3 4 8\n7777 7 7 7777 7 66 7 7 7777777 2 8 1\n70 202 10370 295 377'

See the difference? That first number of the second line (after the "\n") appears to be what is being recognized, there's just a lot of extra stuff being returned that I don't understand.
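To illustrate the observation, here is a minimal sketch that pulls out the first token of the second line of the OCR output. The function name is hypothetical, and the sketch assumes the layout of the noisy output stays the same from run to run. It is a diagnostic aid, not a fix for whatever is producing the extra digits.

```python
def first_number_on_line(ocr_text, line_index=1):
    """Return the first token of the given line as an int, or None.

    Hypothetical helper; line_index=1 means the second line of the text.
    """
    lines = ocr_text.splitlines()
    if line_index >= len(lines):
        return None  # the OCR output has fewer lines than expected
    tokens = lines[line_index].split()
    if not tokens or not tokens[0].isdigit():
        return None  # nothing number-like at the start of that line
    return int(tokens[0])
```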

bknowles commented 7 years ago

Hmm. The change in cd89b2b4a4aecce3b17da525c1343d6ebf150dee doesn't seem to include the code for childprocess=False?

Shouldn't that go on line 114?

bknowles commented 7 years ago

Okay, adding that to line 114 gets me past the same old IDLE crash that I had been having before, but I still have the problem with weird data being returned by check_image(). Witness:

$ python BGNRRG.py
Traceback (most recent call last):
  File "BGNRRG.py", line 273, in <module>
    main()
  File "BGNRRG.py", line 268, in main
    verbose=args.verbose
  File "BGNRRG.py", line 138, in repeats
    value = int(check_image(im, lang))
ValueError: invalid literal for int() with base 10: '1 2 2\n79 0 9133292 77 02000 00823012200133\n7700 044 79 735 7794 909 99 4 94 0 3 0 40 3 9 4 2\n11 888022 1 11 22 24 2 11\n8997 70 7 1072 23 3 4 8\n7777 77777 7 697 77777777 2 8 1\n1 202 10370 295 3730 90 2'

Note that this was with a character on the BGEE screen which added up to a total of 79.

bknowles commented 7 years ago

Also note that the tesseract folder I mentioned above would only be found under /usr/local/Cellar if the user had installed it using the Homebrew package manager (i.e., using the command brew install tesseract).

If they have installed tesseract in some other fashion, maybe using a different package manager, then it would probably be in a somewhat different directory structure. The /usr/local/Cellar/ directory structure is unique to Homebrew.

For more information on brew, see https://brew.sh/.

bknowles commented 7 years ago

Cool. Thanks!

bknowles commented 7 years ago

That's weird. Witness:

$ python BGNRRG.py -i
  File "BGNRRG.py", line 74
    setting_dict["total_roll"] =
                               ^
SyntaxError: invalid syntax

Note: Joining line 75 onto the end of line 74 fixes this problem.

bknowles commented 7 years ago

However, now we have a new problem that I haven't seen before:

$ python BGNRRG.py
Traceback (most recent call last):
  File "BGNRRG.py", line 278, in <module>
    main()
  File "BGNRRG.py", line 257, in main
    setting_dict = read_setting()
  File "BGNRRG.py", line 39, in read_setting
    setting_dict[line[0]] = tuple(map(int, line[1:]))
ValueError: invalid literal for int() with base 10: 'right'

J-Moravec commented 7 years ago

Sorry, I pushed a second fix and modified the README. (Weird, is GitHub automatically closing issues? Or did I click "close and comment" instead of "comment"?)

Can you please initialize BGNRRG and then run python BGNRRG.py --training --no_value -n 10? This will create the folder training_data and add 10 screenshots of the specified area. (I also fixed the line continuation.)

bknowles commented 7 years ago

I'll give that a shot.

bknowles commented 7 years ago

Sorry, still bombing on line 39:

$ python BGNRRG.py --training --no_value -n 10
Traceback (most recent call last):
  File "BGNRRG.py", line 279, in <module>
    main()
  File "BGNRRG.py", line 258, in main
    setting_dict = read_setting()
  File "BGNRRG.py", line 39, in read_setting
    setting_dict[line[0]] = tuple(map(int, line[1:]))
ValueError: invalid literal for int() with base 10: 'right'

J-Moravec commented 7 years ago

That is because I am writing the whole dictionary, not just the buttons that I set. I should really stop making changes when I can't test them :/

Pushed another commit; it should be working now.

bknowles commented 7 years ago

Okay, managed to fix that code. Try this diff:

$ git diff
diff --git a/BGNRRG.py b/BGNRRG.py
index 706f091..5fcac4d 100644
--- a/BGNRRG.py
+++ b/BGNRRG.py
@@ -60,8 +60,8 @@ def create_setting():
     add_to_dict(mouse, setting_dict, "reroll")
     add_to_dict(mouse, setting_dict, "recall")
     add_to_dict(mouse, setting_dict, "store")
-    top_left = "top left corner"
-    bottom_right = "bottom right corner"
+    top_left = "top_left_corner"
+    bottom_right = "bottom_right_corner"
     add_to_dict(
         mouse, setting_dict, top_left,
         message="Hover over top left corner of total roll number:"

bknowles commented 7 years ago

Your command $ python BGNRRG.py --training --no_value -n 10 did finally run!

But it captured some pretty strange PNGs. Let me see if I can post them all here:

[Ten attached screenshots, numbered 0 to 9]

bknowles commented 7 years ago

FYI, here's my config.txt that was generated:

$ cat config.txt 
recall 971 921
top_left_corner 1025 876
bottom_right_corner 1055 897
reroll 1083 924
total_roll 1025 876 1055 897
store 852 922

I'm assuming these are being stored in "label X Y" format, right? If so, then the different Y values for the three buttons versus the total_roll area do make sense -- that area is higher than the buttons. Comparing the X values of the total_roll against the X value of the reroll button, I'm not sure if that is right or not.

But certainly, those PNGs are whacked. The top left of the area is being tracked and captured correctly, but not the bottom right!

J-Moravec commented 7 years ago

Bah, this seems to be another Mac-specific bug in the pyscreenshot library. See here: https://github.com/ponty/pyscreenshot/blob/master/pyscreenshot/plugins/mac_quartz.py

Would you be so kind as to write a simple test case using pyscreenshot.grab() and report the issue there? (i.e., I assume that pyscreenshot.grab((1,1,2,2)) would basically take a screenshot of your whole screen (assuming that the top-left corner is (0,0) and the bottom-right corner is (x_max, y_max)), so it would just be a matter of running python, grabbing the screen, and saving it to a file.)

bknowles commented 7 years ago

Doing a pyscreenshot.grab((1,1,2,2)) only resulted in a PNG file that is 4x4 pixels in size:

[Image: test-pyscreenshot]

EDIT: Since the PNG is white and doesn't show up on a white background, I'll include the filename here: https://cloud.githubusercontent.com/assets/1151895/24378559/fe6b8cb4-1308-11e7-88cf-1e6cef36e8ec.png

bknowles commented 7 years ago

Okay, I hacked your main() to try some stuff. I found that this works:

def main():
    args = parse_args()
    if args.initialize:
        initialize()
    else:
        config_exists()
        setting_dict = read_setting()
        buttons = Buttons(setting_dict)

    im = screen_grab((1025,876,28,21))
    file_name = os.path.join("training_examples",
                             "newtest-pyscreenshot.png")
    im.save(file_name, "PNG")

Note that the second pair of numbers being passed to screen_grab() are the difference between the respective X and Y positions of the top_left and bottom_right. Here's the PNG that was captured:

[Image: newtest-pyscreenshot]

The filename is https://cloud.githubusercontent.com/assets/1151895/24378808/f031bdb6-1309-11e7-80fd-3418ec54210b.png

EDIT: note that the PNG is actually twice the size requested -- I think that's because I'm using a Retina screen which has an actual resolution of something like 150dpi, instead of the normal 72dpi.

But macOS compensates for that and handles the translation both ways. It grabs what we actually want, we just have to deal with the fact that the file is larger than we would normally expect.
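One way to check the Retina explanation is to compare the size that was requested with the size of the saved image. The sketch below assumes a 28x21 request that came back as 56x42; the captured size is an illustrative guess based on "twice the size requested", not a measurement from this thread, and the function name is hypothetical.

```python
def capture_scale(requested_size, captured_size):
    """Return the (horizontal, vertical) scale factor of a capture.

    Both arguments are (width, height) tuples in pixels.
    """
    req_w, req_h = requested_size
    cap_w, cap_h = captured_size
    # float() keeps the division exact under Python 2 as well.
    return (cap_w / float(req_w), cap_h / float(req_h))


# Illustrative values only: a 28x21 request that produced a 56x42 file.
scale = capture_scale((28, 21), (56, 42))
```

A factor of 2.0 in both directions would be consistent with a display that maps each screen point to two device pixels.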

J-Moravec commented 7 years ago

So on the Mac the second pair of values is relative to the first (a width and a height), not an absolute corner position. I will report this to pyscreenshot.
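A minimal sketch of the conversion this implies, assuming (as observed in this thread, not confirmed against the pyscreenshot source) that the Mac backend treats the box as (x, y, width, height). The function name is hypothetical.

```python
def corners_to_origin_size(box):
    """Convert (left, top, right, bottom) into (left, top, width, height)."""
    left, top, right, bottom = box
    if right < left or bottom < top:
        raise ValueError("bottom-right corner must not be above or left of top-left")
    return (left, top, right - left, bottom - top)
```

With the total_roll entry from the config.txt shown earlier, (1025, 876, 1055, 897) becomes (1025, 876, 30, 21).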

bknowles commented 7 years ago

Okay, I made a minor change to temporarily handle the relative size issue, and now I've got it capturing a number of reasonable images. Here's the diff:

$ git diff
diff --git a/BGNRRG.py b/BGNRRG.py
index c110918..64a4338 100644
--- a/BGNRRG.py
+++ b/BGNRRG.py
@@ -1,5 +1,6 @@
 import pyscreenshot
 from pymouse import PyMouse
+from operator import sub
 import pytesseract
 import os
 import argparse as arg
@@ -60,8 +61,8 @@ def create_setting():
     add_to_dict(mouse, setting_dict, "reroll")
     add_to_dict(mouse, setting_dict, "recall")
     add_to_dict(mouse, setting_dict, "store")
-    top_left = "top left corner"
-    bottom_right = "bottom right corner"
+    top_left = "top_left_corner"
+    bottom_right = "bottom_right_corner"
     add_to_dict(
         mouse, setting_dict, top_left,
         message="Hover over top left corner of total roll number:"
@@ -70,6 +71,8 @@ def create_setting():
         mouse, setting_dict, bottom_right,
         message="Hover over bottom right corner of total roll number:"
         )
+
+    setting_dict[bottom_right] = map(sub, setting_dict[bottom_right], setting_dict[top_left])

     setting_dict["total_roll"] = \
         setting_dict[top_left] + setting_dict[bottom_right]

Here's the config.txt:

$ cat config.txt 
recall 1091 927
reroll 1209 923
total_roll 1149 877 35 22
store 971 927

And here are the files captured by the command python BGNRRG.py --training --no_value -n 10:

[Ten attached screenshots, numbered 0 to 9]

bknowles commented 7 years ago

I'm guessing now that I need to run the retraining process?

EDIT: Confirmed. I did a training run with 100 grabs, and it is mis-recognizing 79 as 737. I'm not sure about the other numbers.

EDIT2: More accurately, it is mis-recognizing 9 as 37. It's also mis-recognizing 5 as 3.

So, how do I feed this information back to tesseract in order to get corrected OCR information?

bknowles commented 7 years ago

When I do a run in non-training mode, it doesn't seem to actually be clicking on the buttons, although it is recognizing some sort of value being returned by tesseract. So, when I do a run with a current total of 79 showing, it gives me 100 loops of claiming to see 737, and then quits.
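A cheap guard against misreads such as 737 would be to reject any value outside a plausible range before acting on it. The sketch below is hypothetical and not part of BGNRRG.py. The bounds 18 and 108 are an assumption (six ability scores of 3 to 18 each) and may need adjusting for the actual game rules.

```python
MIN_TOTAL = 18   # assumption: six abilities, each at least 3
MAX_TOTAL = 108  # assumption: six abilities, each at most 18


def plausible_total(ocr_text, low=MIN_TOTAL, high=MAX_TOTAL):
    """Return the OCR result as an int if it looks like a valid total, else None."""
    text = ocr_text.strip()
    if not text.isdigit():
        return None  # extra tokens, letters, or empty output
    value = int(text)
    if low <= value <= high:
        return value
    return None  # e.g. 737, which cannot be a real total
```

A caller could treat None as "skip this reading and grab the screen again" rather than raising ValueError.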

J-Moravec commented 7 years ago

Please read the help for "training". I should probably change the name; it's not really "training", it just shows what values tesseract is reading. I will try to write something about it on the weekend. In the meantime, would you please be so kind as to help us debug the underlying libraries? Neither the devs nor I have access to a Mac. See this open issue: https://github.com/ponty/pyscreenshot/issues/40

I could easily write a fix, but I don't have access to a Mac to test it :/

Thanks!

bknowles commented 7 years ago

Okay, that's doubly weird. I had un-commented the lines to save the maximum value PNGs, and that appears to have been what was causing the clicks on the reroll button to fail. Comment those lines back out, and now it can actually click on the reroll button.

It's still mis-recognizing the 9s and 5s, but we're really, really close!

J-Moravec commented 7 years ago

Because it is OCR, there will always be some failure rate. We would have to make another training dataset and create bgee3.traineddata with a bigger weight on 9 and 5, and then repeat that until the failure rate is low. I need to think about automating this process.

bknowles commented 7 years ago

Okay, so reading the documentation at https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality, it seems that there are some things we can do to the input images to help tesseract achieve better quality OCR.

I also found some useful examples at https://stackoverflow.com/questions/9480013/image-processing-to-improve-tesseract-ocr-accuracy.

In this specific case, I believe that the most useful things for us to do would be to convert the image from color to grayscale, and then to maximize the contrast and remove the background noise in the image. Doing some searching around, it seems that OpenCV could be a good choice for these functions.

Specifically, the example for converting to grayscale is at http://docs.opencv.org/master/df/d9d/tutorial_py_colorspaces.html, and the image thresholding example is at http://docs.opencv.org/master/d7/d4d/tutorial_py_thresholding.html.

The thresholding example with cv2.THRESH_BINARY_INV would appear to do two steps at once, and give us the cleanest black text on a white background for input to tesseract.

However, what I have not yet worked out is how to convert the image type returned by screen_grab(buttons.total_roll) into a type INPUT_ARRAY, which is what the OpenCV routines seem to want. Obviously, once you've done the conversions shown above, you would then have to translate that back before feeding it to tesseract.
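On the conversion question: pyscreenshot returns a PIL image, and a PIL image can normally be turned into an array for OpenCV with numpy.asarray(im) and back again with PIL.Image.fromarray(arr). PIL uses RGB channel order, so cv2.COLOR_RGB2GRAY would be the matching conversion code. None of this has been tested against BGNRRG. The pure-Python sketch below shows only the operation itself, grayscale followed by an inverted binary threshold, on a list of RGB tuples such as the one im.getdata() yields. The function names and the threshold of 128 are assumptions.

```python
def to_gray(pixel):
    """Convert one (R, G, B[, A]) pixel to a 0-255 gray level."""
    r, g, b = pixel[:3]
    # ITU-R BT.601 luma weights, commonly used for RGB to grayscale.
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def threshold_inverted(pixels, threshold=128):
    """Inverted binary threshold: bright pixels become 0, dark ones 255.

    This mirrors the effect described for cv2.THRESH_BINARY_INV, so light
    text on a dark background comes out as black text on white.
    """
    return [0 if to_gray(p) > threshold else 255 for p in pixels]
```

The resulting values could be written into a new single-channel image (for example with Image.new("L", im.size) followed by putdata) before handing it to tesseract.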

ButteredGroove commented 6 years ago

Alas... I was hoping this would end in a solution. Ah, well.