ironflood closed this issue 6 years ago
Does htop show activity, or is something stuck?
When reaching the step "Pre-tokenizing questions", the python3 CPU usage drops from ~270% to 0.1%. So yes, it seems stuck.
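For anyone else debugging a hang like this: one way to see where the stuck workers are blocked is to register Python's faulthandler on a signal before the pool starts. This is a hypothetical addition to generate.py, not something the script does today:

```python
import faulthandler
import signal

# Dump every thread's stack trace to stderr when this process receives
# SIGUSR1, e.g. via `kill -USR1 <pid>` from another terminal. This shows
# exactly which call each stuck worker process is blocked in.
faulthandler.register(signal.SIGUSR1)
```

Because multiprocessing workers inherit the handler when they are forked, sending SIGUSR1 to a hung worker pid prints its current stack without killing it.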
Is it possible to share the distant supervised data? I have tried several times but failed to generate the dataset: it raises exceptions, and the process keeps running forever. It would be a big help if you could share the data directly.
What version of CoreNLP are you using? The latest versions appear to load large NER lists, which is causing some errors. Using CoreNLP 3.8.0 (the one specified in install_corenlp.sh) works for me.
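A quick way to confirm which CoreNLP jars are actually on disk and visible to the tokenizer (the data/corenlp path is an assumption based on the default download location; adjust it if you installed CoreNLP elsewhere):

```shell
# List the CoreNLP jars; install_corenlp.sh should leave a 3.8.0 jar here.
ls data/corenlp/ | grep 'stanford-corenlp'

# The tokenizer launches java against CLASSPATH, so make sure it
# includes that directory.
echo "$CLASSPATH"
```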
Yes, I tried with install_corenlp.sh and then with the most recent version; neither worked. I installed CoreNLP and checked that it works on its own. It did, but whenever I run the distant supervision generate.py file, it gives the same exceptions as mentioned here. Isn't it possible to host this distant supervised data somewhere? It would be very convenient for us.
What version of pexpect are you using? 4.2.1?
I am not sure about the version. Here is the log:
Traceback (most recent call last):
File "/if5/wua4nw/anaconda3.6/lib/python3.6/site-packages/pexpect/expect.py", line 96, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/if5/wua4nw/anaconda3.6/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 466, in read_nonblocking
raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/if5/wua4nw/anaconda3.6/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/if5/wua4nw/anaconda3.6/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/if5/wua4nw/anaconda3.6/lib/python3.6/multiprocessing/pool.py", line 103, in worker
initializer(*initargs)
File "generate.py", line 48, in init
PROCESS_TOK = tokenizer_class(**tokenizer_opts)
File "/net/if5/wua4nw/open_domain_qa/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 33, in __init__
self._launch()
File "/net/if5/wua4nw/open_domain_qa/DrQA/drqa/tokenizers/corenlp_tokenizer.py", line 61, in _launch
self.corenlp.expect_exact('NLP>', searchwindowsize=100)
File "/if5/wua4nw/anaconda3.6/lib/python3.6/site-packages/pexpect/spawnbase.py", line 404, in expect_exact
return exp.expect_loop(timeout)
File "/if5/wua4nw/anaconda3.6/lib/python3.6/site-packages/pexpect/expect.py", line 104, in expect_loop
return self.timeout(e)
File "/if5/wua4nw/anaconda3.6/lib/python3.6/site-packages/pexpect/expect.py", line 68, in timeout
raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: Timeout exceeded.
<pexpect.pty_spawn.spawn object at 0x7f863384c908>
command: /bin/bash
args: ['/bin/bash']
buffer (last 100 chars): b' class edu.stanford.nlp.pipeline.StanfordCoreNLP\r\nwua4nw@nlp:~/open_domain_qa/DrQA/scripts/distant$ '
before (last 100 chars): b' class edu.stanford.nlp.pipeline.StanfordCoreNLP\r\nwua4nw@nlp:~/open_domain_qa/DrQA/scripts/distant$ '
after: <class 'pexpect.exceptions.TIMEOUT'>
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 10807
child_fd: 18
closed: False
timeout: 60
delimiter: <class 'pexpect.exceptions.EOF'>
logfile: None
logfile_read: None
logfile_send: None
maxread: 100000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0
delayafterclose: 0.1
delayafterterminate: 0.1
searcher: searcher_string:
0: "b'NLP>'"
Process ForkPoolWorker-10:
Traceback (most recent call last):
File "/if5/wua4nw/anaconda3.6/lib/python3.6/site-packages/pexpect/expect.py", line 96, in expect_loop
incoming = spawn.read_nonblocking(spawn.maxread, timeout)
File "/if5/wua4nw/anaconda3.6/lib/python3.6/site-packages/pexpect/pty_spawn.py", line 466, in read_nonblocking
raise TIMEOUT('Timeout exceeded.')
pexpect.exceptions.TIMEOUT: Timeout exceeded.
And such messages keep coming one after another.
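One detail worth noting in the dump above: `buffer (last 100 chars)` ends with the shell prompt right after a line mentioning `class edu.stanford.nlp.pipeline.StanfordCoreNLP`, which suggests the JVM exited at startup (for example, a "Could not find or load main class" error from a bad classpath) rather than CoreNLP merely being slow. Running the launch command by hand surfaces the real startup error instead of a pexpect timeout; the classpath below is an assumption, so point it at wherever your CoreNLP jars live:

```shell
# Launch the CoreNLP interactive shell directly; any startup failure
# (bad classpath, missing jar, out-of-memory) is printed to the terminal
# instead of being swallowed by pexpect's 60-second expect timeout.
java -mx3g -cp "data/corenlp/*" edu.stanford.nlp.pipeline.StanfordCoreNLP \
  -annotators tokenize,ssplit
```

If this prints an `NLP>` prompt, the jars are fine and the problem lies elsewhere (e.g., the pexpect version discussed below this in the thread).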
Please check it by running:
import pexpect
pexpect.__version__
I am generating the files on my own for you, but I would like to try to resolve this error.
Thanks, I checked. It is 4.3.1.
Please downgrade to 4.2.1 by running pip install pexpect==4.2.1 and let me know if the error persists.
I tried to install version 4.2.1 but getting this message:
Cannot uninstall 'pexpect'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
I looked for a solution on the web but couldn't find anything reasonable.
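That message comes from pip refusing to remove a distutils-installed package (common with Anaconda's bundled pexpect). A workaround that usually works is to install over it without uninstalling first:

```shell
# --ignore-installed skips the uninstall step that pip cannot perform
# on distutils-installed packages; the new 4.2.1 files shadow the old ones.
pip install --ignore-installed pexpect==4.2.1
```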
In the meantime, I am hosting a generated dataset at http://people.csail.mit.edu/fisch/assets/data/drqa/distant.tar.gz.
Hello,
I can't manage to generate the datasets for DS, no matter which tokenizer I use. When attempting with '--tokenizer spacy', the script never gets beyond line 197 of generate.py:
q_tokens = workers.map(tokenize_text, questions)
When using another tokenizer, like '--tokenizer simple' I get the following errors:
Any idea of what might be happening would be greatly appreciated :)