gamemodel never starts (GameModel.run() isn't entered) sometimes

GoogleCodeExporter commented 9 years ago

I was testing on FICS by following a player who was playing a lot of 
lightning matches. By the time about 4 game tabs had been opened, each with 
its own analyzer instance (stockfish), the CPU overhead was high enough to 
apparently alter timings and manifest this bug. When the player then started 
a new match, the gamewidget was created and attached and the board was drawn, 
but the pieces weren't drawn and the game didn't start.

I think this is caused by a race condition in ThreadPool.py. In the attached 
log, ionest.workfunc finishes, so GameModel.loadAndStart must have completed 
and GameModel.start() must have been called. Also, the first thread dump taken 
after the failed start shows this thread, which doesn't appear in any earlier 
dump:

Thread: ICGameModel.run:537 (140291273123584)
  File "/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
    self.__bootstrap_inner()
  File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
    self.run()
  File "/home/gatto/code/hg/pychess/lib/pychess/System/ThreadPool.py", line 103, in run
    self.wcond.wait()    # wait for work
  File "/usr/lib64/python2.7/threading.py", line 339, in wait
    waiter.acquire()

So it appears that GameModel.run was handed to a worker thread, which was 
initialized (via 'a.func = ...') and named in ThreadPool.start, yet the first 
line of GameModel.run is never executed:
        log.debug("GameModel.run: Starting. self=%s" % self)

The only explanation I can think of is a race condition where, after the 
ThreadPool.Worker finishes running its self.func and puts itself back into the 
thread pool via:

                        self.worker = None
                        self.queue.put(self)

... the code in ThreadPool.start below:

        a.worker = lambda: worker(*args, **kw)
        a.name = worker.name

        a.wcond.acquire()
        a.wcond.notify()
        a.wcond.release()

... runs before the code in ThreadPool.Worker.run below does:

                    self.wcond.acquire()
                    self.wcond.wait()    # wait for work
                    self.wcond.release()

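... In that case the notification is lost, because notify() only wakes threads 
that are already waiting on the condition, and the Worker sits endlessly 
waiting for a wcond.notify that will never occur.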
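
For illustration, here is a minimal sketch of the usual way to make this kind of hand-off immune to a lost wakeup: keep the pending job in state guarded by the condition and re-check that state in a loop before waiting. This is not the pychess ThreadPool code and not the attached patch; the `Worker` class, its `submit` method and the `job`/`result` attributes are hypothetical names chosen for the example.

```python
import threading


class Worker(threading.Thread):
    """Hypothetical worker that waits for a job without losing a wakeup."""

    def __init__(self):
        threading.Thread.__init__(self, name="idle-worker")
        self.daemon = True
        self.wcond = threading.Condition()
        self.job = None      # predicate state, only touched while holding wcond
        self.result = None

    def submit(self, job):
        # Record the job under the lock first, then notify. A worker that has
        # not started waiting yet will still see the job when it checks.
        with self.wcond:
            self.job = job
            self.wcond.notify()

    def run(self):
        with self.wcond:
            # Wait only while there is really nothing to do. If submit()
            # already ran, the loop body is skipped and nothing is lost.
            while self.job is None:
                self.wcond.wait()
            job = self.job
        self.result = job()


worker = Worker()
worker.submit(lambda: "started")  # notify happens before the worker ever waits
worker.start()
worker.join(5)
print(worker.result)
```

Here the notification is deliberately sent before the worker thread runs, which is the ordering suspected above. With a bare wait() the worker would hang; with the predicate loop it picks up the job anyway.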

Original issue reported on code.google.com by mattgatto on 11 Jun 2014 at 1:13

Attachments:

2014-06-07_14-31-55-truncated.log.bz2

GoogleCodeExporter commented 9 years ago

Original comment by mattgatto on 11 Jun 2014 at 1:16

Added labels: Priority-Critical

GoogleCodeExporter commented 9 years ago

I attached a patch that resolves this by getting rid of the ThreadPool 
altogether. Does anybody see a reason to keep it? We don't have anything that 
requires a thread pool, like a server or heavy computations.

Original comment by mattgatto on 12 Jun 2014 at 9:46

Added labels: Milestone-Anderssen0.12

Attachments:

patch-1000.diff

GoogleCodeExporter commented 9 years ago

I know of no reason to keep ThreadPool at all.

Just one little note:
in GtkWorker.py line #83
        Thread.__init__(self, fident(func))
should be
        Thread.__init__(self, name=fident(func))
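
The reason the keyword matters is that the first positional parameter of threading.Thread.__init__ is `group`, not `name`. A small sketch of the corrected call follows; `NamedWorker` and `load_game` are hypothetical, and `func.__name__` merely stands in for pychess's `fident(func)`, whose output is not shown here.

```python
import threading


class NamedWorker(threading.Thread):
    """Hypothetical Thread subclass that names itself after its function."""

    def __init__(self, func):
        # Pass the name by keyword: the first positional slot of
        # Thread.__init__ is `group`, so a positional string would not
        # end up as the thread name.
        threading.Thread.__init__(self, name=func.__name__)
        self.func = func

    def run(self):
        self.func()


def load_game():
    pass


t = NamedWorker(load_game)
t.start()
t.join(5)
print(t.name)
```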

Original comment by gbtami on 13 Jun 2014 at 11:41

GoogleCodeExporter commented 9 years ago

This issue was closed by revision 3c86505c38fc.

Original comment by mattgatto on 20 Jun 2014 at 4:15

Changed state: Fixed

fowode / pychess

gamemodel never starts (GameModel.run() isn't entered) sometimes #883