antske / coref_draft

Code is not thread safe #5

Open vanatteveldt opened 7 years ago

vanatteveldt commented 7 years ago

The code uses globals in a number of places, making it unsafe to use in a threaded environment.

The best solution is probably to refactor the globals into instance variables, either on a new class or on the naf object. They might even be moved into KafNafParserPy, as they look pretty generic (?).
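To make the idea concrete, here is a minimal sketch of that kind of refactoring. It is not the actual coref_draft code: `Document` and `CorefContext` are hypothetical names, and the body of `get_string_from_ids` is an assumption about what such a lookup might do. The point is only that each call builds its own lookup table instead of writing to a module-level global, so concurrent calls cannot overwrite each other's state.

```python
from threading import Thread


class Document:
    """Hypothetical stand-in for a parsed NAF file: token id -> surface string."""

    def __init__(self, tokens):
        self.tokens = tokens


class CorefContext:
    """Hypothetical holder for per-document state that would otherwise be global."""

    def __init__(self, doc):
        # Instance attribute instead of a module-level dict shared by all threads.
        self.id2string = dict(doc.tokens)

    def get_string_from_ids(self, ids):
        # Assumed behaviour: join the surface strings of the given token ids.
        return ' '.join(self.id2string[i] for i in ids)


def process(doc, span, results, key):
    # Every call creates its own context, so threads never share lookup tables.
    ctx = CorefContext(doc)
    results[key] = ctx.get_string_from_ids(span)


docs = {
    'a': Document({'t1': 'Alice', 't2': 'Smith'}),
    'b': Document({'t1': 'Bob', 't2': 'Jones'}),
}
results = {}
threads = [
    Thread(target=process, args=(doc, ['t1', 't2'], results, name))
    for name, doc in docs.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```

Whether the state lives on a separate class like this or directly on the naf object is a design choice; the sketch only shows the former because it needs no changes to the parser.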

Simple test program:

import sys
import random
from threading import Thread
from multisieve_coreference import process_coreference
from KafNafParserPy import KafNafParser

def run():
    while True:
        f = random.choice(fns)
        nafin = KafNafParser(open(f))
        nafin = process_coreference(nafin)

fns = sys.argv[1:]
for i in range(10):
    Thread(target=run).start()

Results in:

$ env/bin/python test.py /tmp/test*.naf

Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "test.py", line 12, in run
    nafin = process_coreference(nafin)
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 701, in process_coreference
    coref_classes, mentions = resolve_coreference(nafin)
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 658, in resolve_coreference
    match_full_name_overlap(mentions, coref_classes)
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 37, in match_full_name_overlap
    mention_string = get_string_from_ids(mention.get_span())
  File "/home/wva/coref_draft/multisieve_coreference/resolve_coreference.py", line 20, in get_string_from_ids
    surface_string += token_string + ' '
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

MPvHarmelen commented 5 years ago

As of https://github.com/MPvHarmelen/coref_draft/commit/546cc1493fffa593500a9928862d17c046eef67e, the only globals left are the ones from constituents.py, which are used only (but extensively) in constituent_info.py and naf_info.py.