chriskiehl / Gooey

Turn (almost) any Python command line program into a full GUI application with one line
MIT License
20.66k stars 1.02k forks source link

TextCtrl can become too large #351

Open nitaro opened 6 years ago

nitaro commented 6 years ago

Hi,

Using Gooey (v 1.0.2 on Win10, Python 3.6) for scripts that have large amounts of print/logging statements can crash an application if the text box that shows stdout becomes too large.

It would be nice to be able to pass a function to the Gooey() decorator that is executed inside the appendText method from Gooey-master\gooey\gui\components\console.py before the the text box is updated.

Example:

def my_filter(_self, msg):
    if msg.startswith("secret:"):
        msg = ""
    if len(_self.textbox.GetValue().split()) > 100:
        _self.textbox.Clear()

@Gooey(textctrl_plugin=my_filter)
def main():
    # my code here ...
chriskiehl commented 6 years ago

Howdy @nitaro

Could you please provide a minimal example that demonstrates the problem. It makes debugging a lot easier when I don't have to spend time writing a harness myself.

Thanks!

nitaro commented 6 years ago

Sure thing. Sorry for the omission. Here's a demo that eventually crashes for me (around 105k lines):

#!/usr/bin/python 3

from gooey import Gooey, GooeyParser, gui

@Gooey()
def main():

    parser = GooeyParser()
    parser.add_argument(
        "some_string",
        action="store",
        default="just press Start",
    )

    args = parser.parse_args()
    print(args, flush=True)

    # print too many messages (causes crash for me around 105k).
    for i in range(999999):
        print(i, flush=True)

if __name__ ==  "__main__":
    main()

Thanks!

chriskiehl commented 5 years ago

Alright, spent a few hours looking into this. Dumping my thoughts here for future me. There's a hack-y work around which can be used, so I'm going to put this off till after 1.0.3 goes out.

Problem

The root of this problem is that the WX queue just gets overwhelmed with updates when there is a tight loop like in your example. It eventually leads to unresponsiveness. The volume of data doesn't appear to be an issue at all (until you get into real extremes). You can put 1,000,000 rows into a TextCtrl with zero loss of UI Responsiveness. Wx happily slurps it up!

image

Further demonstrating the queue flooding problem is that if you throttle your output juuust a wee bit, the issue disappears entirely.

@Gooey()
def main():
    ...

    # print too many messages (causes crash for me around 105k).
    for i in range(999999):
        import time 
        time.sleep(0.001)  # <-- note the delay added 
        print(i, flush=True)

It'll chug along (albeit, slowly) without issue.

image

WX can keep up with the updates as it reads them from stdout.

Non-Solutions

Naively buffering the output.

processor.py can be modified to read stdout in buffered chunks thus breaking the output into a series of coarse updates. However, this approach fails on two counts:

  1. Staggering the output like this is very visually unsatisfying
  2. all WxWidget updates eventually happen on the main thread, which means that the UI still becomes unresponsive while the TextCtrl tries to append this now large, buffered chunk of text.
  3. It takes a lot of finagling to find a 'sweet spot' where the output is buffered enough to maintain some UI responsiveness, but not so buffered that it becomes a series of broken updates. There is no Sane User Default as it's output dependent.

Planned Solutions:

Add some smarts to the hand off point between processor.py and Wx. It currently passes along its updates to WX as fast as it receives them, which is more or less the core of the problem. It'll instead have to be a little more of a smart buffer/ token bucket / scheduler thing to keep the WX thread mostly open for everything else it needs to do.

That's the plan as of now anyway..

nitaro commented 5 years ago

Hi Chris,

I tested it with the delay you added and if ran far past where it originally crashed my laptop.

In my real world use case, I'm using logging statements while looping through large email accounts for processing rather than printing through a simple range() loop - so I image there was already somewhat of a delay between logging statements when I first noticed the issue with my real data.

So one thing I did notice with this example, i.e. with time.sleep() added, is that it will still hang/crash for me around the 35-40k mark if I begin to switch my main window to another application while the Gooey example is running: that's to say if I toggle between an open application (like my browser) and the Gooey application, the hang up issue remains. Granted, I probably get crashes sooner than most given that my laptop is older.

If I only print every 1k, then I'm able to switch between open applications far past the 40k mark without the Gooey application crashing:

for i in range(999999):
    if i % 1000 == 0:  # <-- added this to reduce print statements.
        print(i, flush=True)
    time.sleep(0.001)  # <-- note the delay added 

From a practical standpoint, I wonder if there's any real utility to having the TextCtrl box exceed a certain length in the first place.

IOW, anyone who needs extensive logging to be stored should log/write to a file, etc. but I don't see an end user scrolling through 500k+ of messages to see what happened prior.

I would think that maintaining a TextCtrl length of only the last n messages might be a worthwhile feature.

chriskiehl commented 5 years ago

I would think that maintaining a TextCtrl length of only the last n messages might be a worthwhile feature.

Yeah, I totally agree and meant to say as much in my little brain dump above. However, my only hesitation on adding it as a fix for your issue, is that I don't believe that the buffer size is the thing that's actually causing the problem. I ran a few more tests and you can pretty much slam data into WX until you run out of memory and WxPython itself die with an exception. The UI Hanging/becoming unresponsive while processing is symptomatic of the UI thread getting locked up. If your email output is large enough, or frequent enough, WX can spend a non-trivial amount of time doing a single AppendText operation, which blocks the main thread, which causes back pressure in the queue, which leads to an unresponsive UI, which leads to...

If we modify the initial example to dump out more realistic output (roughly approximated with about 400 words of Lorem ipsum), the UI goes non-responsive after a very short time even on my desktop i7 machine. Debugging shows the Popen thread still chugging along, but none of the updates making their way to the UI.

email = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
Section 1.10.32 of "de Finibus Bonorum et Malorum", written by Cicero in 45 BC

"Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?"

Section 1.10.33 of "de Finibus Bonorum et Malorum", written by Cicero in 45 BC

"At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti atque corrupti quos dolores et quas molestias excepturi sint occaecati cupiditate non provident, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Et harum quidem rerum facilis est et expedita distinctio. Nam libero tempore, cum soluta nobis est eligendi optio cumque nihil impedit quo minus id quod maxime placeat facere possimus, omnis voluptas assumenda est, omnis dolor repellendus. Temporibus autem quibusdam et aut officiis debitis aut rerum necessitatibus saepe eveniet ut et voluptates repudiandae sint et molestiae non recusandae. Itaque earum rerum hic tenetur a sapiente delectus, ut aut reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat."
1914 translation by H. Rackham

Email Number: [{}]
"""

def email_stream(n):
    return (email.format(i) for i in range(n))

... 

def main():
   for email in email_stream(100000):
       print(email, flush=True)

Even if I introduce something which aggressively clears out the TextCtrl as you want, the issue unfortunately remains, as the buffer size is a red herring.

class DroppingConsole(wx.TextCtrl):

    def __init__(self, *args, **kwargs):
        super(DroppingConsole, self).__init__(*args, **kwargs)

    def AppendText(self, text):
        self.writes += 1
        if self.GetNumberOfLines > 500:  # <-- can be adjusted to any value with no effect
            self.Clear()
        else:
            super(DroppingConsole, self).AppendText(text)

It'll take more time to fix, unfortunately. Which means post 1.0.3 😢