I tried to run DrQA with a small modification: in drqa/pipeline/drqa.py, I specify the Wikipedia document I want to use myself:
doc_texts = wikipedia.page(queries).content
I did this because I could not run the document retriever without getting a memory error. However, when execution reaches the document reader code, it seems to create several ForkPoolWorker processes, and in my case they exceed the timeout, as you can see here:
>>> process('what is the population of Toulon?')
07/20/2018 05:42:19 PM: [ Processing 1 queries... ]
07/20/2018 05:42:19 PM: [ Retrieving relevent docs... ]
There is one question, no ranking available yet
after spliting and flattening
Process ForkPoolWorker-2:
Process ForkPoolWorker-4:
Process ForkPoolWorker-3:
Process ForkPoolWorker-1:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 99, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 99, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 99, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 99, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
raise TIMEOUT('Timeout exceeded.')
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
raise TIMEOUT('Timeout exceeded.')
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
raise TIMEOUT('Timeout exceeded.')
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 462, in read_nonblocking
raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.
pexpect.exceptions.TIMEOUT: Timeout exceeded.
pexpect.exceptions.TIMEOUT: Timeout exceeded.
pexpect.exceptions.TIMEOUT: Timeout exceeded.
During handling of the above exception, another exception occurred:
During handling of the above exception, another exception occurred:
During handling of the above exception, another exception occurred:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "/home/mike/Programming/WikiQueri/DrQA/drqa/pipeline/drqa.py", line 39, in init
PROCESS_TOK = tokenizer_class(**tokenizer_opts)
File "/home/mike/Programming/WikiQueri/DrQA/drqa/pipeline/drqa.py", line 39, in init
PROCESS_TOK = tokenizer_class(**tokenizer_opts)
File "/home/mike/Programming/WikiQueri/DrQA/drqa/pipeline/drqa.py", line 39, in init
PROCESS_TOK = tokenizer_class(**tokenizer_opts)
File "/home/mike/Programming/WikiQueri/DrQA/drqa/pipeline/drqa.py", line 39, in init
PROCESS_TOK = tokenizer_class(**tokenizer_opts)
File "/home/mike/Programming/WikiQueri/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
self._launch()
File "/home/mike/Programming/WikiQueri/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
self._launch()
File "/home/mike/Programming/WikiQueri/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
self._launch()
File "/home/mike/Programming/WikiQueri/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
self._launch()
File "/home/mike/Programming/WikiQueri/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
self.corenlp.expect_exact('NLP>', searchwindowsize=100)
File "/home/mike/Programming/WikiQueri/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
self.corenlp.expect_exact('NLP>', searchwindowsize=100)
File "/home/mike/Programming/WikiQueri/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
self.corenlp.expect_exact('NLP>', searchwindowsize=100)
File "/home/mike/Programming/WikiQueri/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
self.corenlp.expect_exact('NLP>', searchwindowsize=100)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/spawnbase.py", line 390, in expect_exact
return exp.expect_loop(timeout)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/spawnbase.py", line 390, in expect_exact
return exp.expect_loop(timeout)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/spawnbase.py", line 390, in expect_exact
return exp.expect_loop(timeout)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/spawnbase.py", line 390, in expect_exact
return exp.expect_loop(timeout)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 107, in expect_loop
return self.timeout(e)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 107, in expect_loop
return self.timeout(e)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 107, in expect_loop
return self.timeout(e)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 107, in expect_loop
return self.timeout(e)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 70, in timeout
raise TIMEOUT(msg)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 70, in timeout
raise TIMEOUT(msg)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 70, in timeout
raise TIMEOUT(msg)
File "/home/mike/.local/lib/python3.6/site-packages/pexpect/expect.py", line 70, in timeout
raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f2924884ba8>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'Programming/WikiQueri/DrQA\x07\x1b[01;32mmike@mike-thinks\x1b[00m:\x1b[01;34m~/Programming/WikiQueri/DrQA\x1b[00m$ '
before (last 100 chars): b'Programming/WikiQueri/DrQA\x07\x1b[01;32mmike@mike-thinks\x1b[00m:\x1b[01;34m~/Programming/WikiQueri/DrQA\x1b[00m$ '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 17924
child_fd: 9
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
0: "b'NLP>'"
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f2924884c50>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'Programming/WikiQueri/DrQA\x07\x1b[01;32mmike@mike-thinks\x1b[00m:\x1b[01;34m~/Programming/WikiQueri/DrQA\x1b[00m$ '
before (last 100 chars): b'Programming/WikiQueri/DrQA\x07\x1b[01;32mmike@mike-thinks\x1b[00m:\x1b[01;34m~/Programming/WikiQueri/DrQA\x1b[00m$ '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 17923
child_fd: 10
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
0: "b'NLP>'"
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f2924884cf8>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'Programming/WikiQueri/DrQA\x07\x1b[01;32mmike@mike-thinks\x1b[00m:\x1b[01;34m~/Programming/WikiQueri/DrQA\x1b[00m$ '
before (last 100 chars): b'Programming/WikiQueri/DrQA\x07\x1b[01;32mmike@mike-thinks\x1b[00m:\x1b[01;34m~/Programming/WikiQueri/DrQA\x1b[00m$ '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 17925
child_fd: 11
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
0: "b'NLP>'"
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f2924884da0>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b'Programming/WikiQueri/DrQA\x07\x1b[01;32mmike@mike-thinks\x1b[00m:\x1b[01;34m~/Programming/WikiQueri/DrQA\x1b[00m$ '
before (last 100 chars): b'Programming/WikiQueri/DrQA\x07\x1b[01;32mmike@mike-thinks\x1b[00m:\x1b[01;34m~/Programming/WikiQueri/DrQA\x1b[00m$ '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 17926
child_fd: 12
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
0: "b'NLP>'"
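One thing the dump above suggests checking: each spawned process is /bin/bash, and the buffer ends at a shell prompt rather than at `NLP>`, so the Java command may have returned immediately instead of starting CoreNLP. Below is a minimal sketch of an environment check, assuming the tokenizer needs `java` on PATH and the CoreNLP jars on CLASSPATH. The helper name `check_corenlp_env` is hypothetical, and the substring test for "corenlp" is only a heuristic.

```python
import os
import shutil


def check_corenlp_env(environ=None):
    """Return a list of likely problems with the Java/CoreNLP setup.

    Hypothetical helper, not part of DrQA.
    """
    environ = os.environ if environ is None else environ
    problems = []
    # The tokenizer shells out to `java`, so it must be resolvable on PATH.
    if shutil.which("java", path=environ.get("PATH", "")) is None:
        problems.append("java not found on PATH")
    # Assumption: the CoreNLP jars are supplied through CLASSPATH and the
    # directory name contains "corenlp" (heuristic only).
    classpath = environ.get("CLASSPATH", "")
    if "corenlp" not in classpath.lower():
        problems.append("CLASSPATH does not mention a CoreNLP directory")
    return problems
```

Calling `check_corenlp_env()` with no arguments inspects the current environment; an empty list means neither of these two problems was detected, which would not by itself rule out other causes.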
The code involved is the following:
# Push all examples through the document reader.
# We decode argmax start/end indices asychronously on CPU.
result_handles = []
num_loaders = min(self.max_loaders, math.floor(len(examples) / 1e3))
for batch in self._get_loader(examples, num_loaders):
    if candidates or self.fixed_candidates:
        batch_cands = []
        for ex_id in batch[-1]:
            batch_cands.append({
                'input': s_tokens[ex_id[2]],
                'cands': candidates[ex_id[0]] if candidates else None
            })
        handle = self.reader.predict(
            batch, batch_cands, async_pool=self.processes
        )
    else:
        handle = self.reader.predict(batch, async_pool=self.processes)
    result_handles.append((handle, batch[-1], batch[0].size(0)))
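To see how many data-loader workers this loop would request, the `num_loaders` expression can be evaluated on its own. This is a minimal sketch: the function name and the default `max_loaders=5` are assumptions for illustration, not values taken from my configuration.

```python
import math


def num_loaders(num_examples, max_loaders=5):
    # Same expression as in the snippet above:
    # min(self.max_loaders, math.floor(len(examples) / 1e3))
    return min(max_loaders, math.floor(num_examples / 1e3))


# Print the worker count for a few example counts.
for n in (1, 999, 1000, 4200, 10000):
    print(n, num_loaders(n))
```

With fewer than 1000 examples, which is likely for a single Wikipedia page, the expression gives 0 loader workers. Together with the traceback, where each worker fails inside the pool initializer (`init` in drqa/pipeline/drqa.py, line 39), this suggests the ForkPoolWorker processes belong to the multiprocessing pool and fail while launching the tokenizer, before this loop does any work.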
Is this again a memory-related error?