meibaotai opened this issue 6 years ago
This is strange. Have you modified the code? Also, as per the README, DrQA is only tested and supported (by me) on macOS and Linux.
Thanks. Windows may not work with the QA pipeline; we succeeded on Linux.
I'm seeing the same thing on a brand new Ubuntu 18.04 install:
11/28/2018 04:34:11 AM: [ Initializing pipeline... ]
11/28/2018 04:34:11 AM: [ Initializing document ranker... ]
11/28/2018 04:34:11 AM: [ Loading /data/projects/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Traceback (most recent call last):
File "scripts/pipeline/interactive.py", line 70, in <module>
tokenizer=args.tokenizer
File "/data/projects/DrQA/drqa/pipeline/drqa.py", line 109, in __init__
self.ranker = ranker_class(**ranker_opts)
File "/data/projects/DrQA/drqa/retriever/tfidf_doc_ranker.py", line 37, in __init__
matrix, metadata = utils.load_sparse_csr(tfidf_path)
File "/data/projects/DrQA/drqa/retriever/utils.py", line 34, in load_sparse_csr
matrix = sp.csr_matrix((loader['data'], loader['indices'],
File "/home/james/.local/lib/python3.6/site-packages/numpy/lib/npyio.py", line 251, in __getitem__
pickle_kwargs=self.pickle_kwargs)
File "/home/james/.local/lib/python3.6/site-packages/numpy/lib/format.py", line 681, in read_array
array = numpy.ndarray(count, dtype=dtype)
MemoryError
Running with 2 vCPUs and 8 GB RAM, nothing else on the server. Is a GPU required? More base RAM?
You need at least around 15 GB of RAM.
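If you want to sanity-check whether the matrix will fit before hitting the MemoryError, a rough lower bound on required RAM is the uncompressed size of the `.npz` archive's members, which you can read without allocating any arrays. This is a generic sketch, not DrQA code; the file path in the usage comment is illustrative:

```python
import zipfile

def npz_member_sizes(path):
    """Map each array stored in an .npz archive to its uncompressed size
    in bytes. Since each member is a serialized .npy array, the sum is a
    rough lower bound on the RAM np.load needs to materialize them all.
    Nothing is decompressed or allocated here -- only the ZIP directory
    is read."""
    with zipfile.ZipFile(path) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}

# Usage (path illustrative):
# sizes = npz_member_sizes("docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz")
# print(sum(sizes.values()) / 1e9, "GB minimum")
```

Comparing that total against your free RAM tells you up front whether loading can possibly succeed.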
I have the same problem on Arch Linux.
➜ DrQA git:(master) ✗ python scripts/pipeline/interactive.py
12/02/2018 06:38:36 PM: [ Running on CPU only. ]
12/02/2018 06:38:36 PM: [ Initializing pipeline... ]
12/02/2018 06:38:36 PM: [ Initializing document ranker... ]
12/02/2018 06:38:36 PM: [ Loading /home/leoyim/Desktop/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Traceback (most recent call last):
File "scripts/pipeline/interactive.py", line 70, in <module>
tokenizer=args.tokenizer
File "/home/leoyim/Desktop/DrQA/drqa/pipeline/drqa.py", line 109, in __init__
self.ranker = ranker_class(**ranker_opts)
File "/home/leoyim/Desktop/DrQA/drqa/retriever/tfidf_doc_ranker.py", line 37, in __init__
matrix, metadata = utils.load_sparse_csr(tfidf_path)
File "/home/leoyim/Desktop/DrQA/drqa/retriever/utils.py", line 34, in load_sparse_csr
matrix = sp.csr_matrix((loader['data'], loader['indices'],
File "/usr/lib/python3.7/site-packages/numpy/lib/npyio.py", line 251, in __getitem__
pickle_kwargs=self.pickle_kwargs)
File "/usr/lib/python3.7/site-packages/numpy/lib/format.py", line 681, in read_array
array = numpy.ndarray(count, dtype=dtype)
MemoryError
Have you solved it? @jcsturges
Yeah, the code is not very memory efficient; it requires far too much RAM. One of the authors (or the author?) said you need 15+ GB of RAM, though I haven't confirmed that works. I ended up building my own lightweight version.
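For anyone who just wants to trim the footprint without a rewrite: one common trick (this is not jcsturges's code) is to downcast the tf-idf weights to float32 while rebuilding the CSR matrix, roughly halving what the weights occupy. The `data`/`indices` keys appear in the traceback above; the `indptr` and `shape` keys are assumed to match DrQA's save format:

```python
import numpy as np
import scipy.sparse as sp

def load_sparse_csr_float32(path):
    """Load a CSR matrix saved as an .npz of (data, indices, indptr, shape),
    downcasting the weights to float32 to roughly halve their storage.
    Caveat: peak memory is still high, because the original float64 'data'
    array exists briefly before the cast's result replaces it."""
    loader = np.load(path)
    data = loader['data'].astype(np.float32)
    return sp.csr_matrix((data, loader['indices'], loader['indptr']),
                         shape=tuple(loader['shape']))
```

Ranking quality is usually unaffected by the cast, since tf-idf scores don't need float64 precision.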
@jcsturges can you please share that efficient code you have written? It would be very helpful to all of us.
@jcsturges can you please share your lightweight version? That would be really helpful.
My system is Windows 10 with 16 GB of RAM, but it's not enough. I set the virtual memory to 20 GB, but when I process a question the QA still can't work. I've copied the run output below; after it, a MemoryError is printed. Should I add more memory?
f:\drqa\DrQA>python scripts/pipeline/interactive.py
07/30/2018 05:12:57 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:12:57 PM: [ Initializing pipeline... ]
07/30/2018 05:12:57 PM: [ Initializing document ranker... ]
07/30/2018 05:12:57 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:15:00 PM: [ Initializing document reader... ]
07/30/2018 05:15:00 PM: [ Loading model f:/drqa/drqa\data\reader/multitask.mdl ]
07/30/2018 05:15:18 PM: [ Initializing tokenizers and document retrievers... ]
Interactive DrQA
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
(the four lines above repeat several times, interleaved, as each spawned worker process re-initializes)
File "<string>", line 1, in <module>
File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
prepare(preparation_data)
File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
_fixup_main_from_path(data['init_main_from_path'])
File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
run_name="__mp_main")
File "D:\Anaconda3\lib\runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "D:\Anaconda3\lib\runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "D:\Anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "f:\drqa\DrQA\scripts\pipeline\interactive.py", line 70, in <module>
tokenizer=args.tokenizer
File "f:\drqa\drqa\drqa\pipeline\drqa.py", line 109, in __init__
self.ranker = ranker_class(**ranker_opts)
File "f:\drqa\drqa\drqa\retriever\tfidf_doc_ranker.py", line 37, in __init__
matrix, metadata = utils.load_sparse_csr(tfidf_path)
File "f:\drqa\drqa\drqa\retriever\utils.py", line 34, in load_sparse_csr
matrix = sp.csr_matrix((loader['data'], loader['indices'],
File "D:\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 235, in __getitem__
pickle_kwargs=self.pickle_kwargs)
File "D:\Anaconda3\lib\site-packages\numpy\lib\format.py", line 674, in read_array
array = numpy.ndarray(count, dtype=dtype)
MemoryError
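The interleaved duplicate log lines in this report are a symptom of Windows' "spawn" start method: each worker process re-imports the main module, so any heavy top-level initialization (like loading the multi-GB tf-idf matrix) runs once per worker and multiplies RAM usage. The standard mitigation is to keep such setup under the `__main__` guard. A minimal sketch, with illustrative function names standing in for DrQA's actual initialization:

```python
import multiprocessing as mp

def square(x):
    # trivial stand-in for per-question work done by a worker
    return x * x

def load_heavy_model():
    # stand-in for loading a multi-GB model file (hypothetical)
    return object()

if __name__ == '__main__':
    # On Windows (spawn), child processes re-import this file; any code
    # outside this guard would execute once per worker, duplicating the
    # "Initializing..." logs and the memory cost seen above.
    model = load_heavy_model()
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # prints [1, 4, 9]
```

On Linux the default "fork" start method shares the parent's loaded state instead of re-importing, which is one reason the duplicated-initialization problem shows up specifically on Windows.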