facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

When I run DrQA to process a question, it reloads the source and prints MemoryError. How do I fix it? #168

Open meibaotai opened 6 years ago

meibaotai commented 6 years ago

My system is Windows 10 with 16 GB of RAM, which was not enough, so I set the virtual memory to 20 GB. But when I process a question, the QA pipeline does not work. I have copied the output of the run below. Should I add more memory? At the end of the output it prints MemoryError.

f:\drqa\DrQA>python scripts/pipeline/interactive.py
07/30/2018 05:12:57 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:12:57 PM: [ Initializing pipeline... ]
07/30/2018 05:12:57 PM: [ Initializing document ranker... ]
07/30/2018 05:12:57 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:15:00 PM: [ Initializing document reader... ]
07/30/2018 05:15:00 PM: [ Loading model f:/drqa/drqa\data\reader/multitask.mdl ]
07/30/2018 05:15:18 PM: [ Initializing tokenizers and document retrievers... ]

Interactive DrQA

process(question, candidates=None, top_n=1, n_docs=5)
usage()

process('what is github?')
07/30/2018 05:15:56 PM: [ Processing 1 queries... ]
07/30/2018 05:15:56 PM: [ Retrieving top 5 docs... ]
07/30/2018 05:16:10 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:10 PM: [ Initializing pipeline... ]
07/30/2018 05:16:10 PM: [ Initializing document ranker... ]
07/30/2018 05:16:11 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:11 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:11 PM: [ Initializing pipeline... ]
07/30/2018 05:16:11 PM: [ Initializing document ranker... ]
07/30/2018 05:16:11 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:11 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:11 PM: [ Initializing pipeline... ]
07/30/2018 05:16:11 PM: [ Initializing document ranker... ]
07/30/2018 05:16:11 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
07/30/2018 05:16:12 PM: [ CUDA enabled (GPU -1) ]
07/30/2018 05:16:12 PM: [ Initializing pipeline... ]
07/30/2018 05:16:12 PM: [ Initializing document ranker... ]
07/30/2018 05:16:12 PM: [ Loading f:/drqa/drqa\data\wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]

  File "<string>", line 1, in <module>
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "D:\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "D:\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "D:\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "D:\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "f:\drqa\DrQA\scripts\pipeline\interactive.py", line 70, in <module>
    tokenizer=args.tokenizer
  File "f:\drqa\drqa\drqa\pipeline\drqa.py", line 109, in __init__
    self.ranker = ranker_class(**ranker_opts)
  File "f:\drqa\drqa\drqa\retriever\tfidf_doc_ranker.py", line 37, in __init__
    matrix, metadata = utils.load_sparse_csr(tfidf_path)
  File "f:\drqa\drqa\drqa\retriever\utils.py", line 34, in load_sparse_csr
    matrix = sp.csr_matrix((loader['data'], loader['indices'],
  File "D:\Anaconda3\lib\site-packages\numpy\lib\npyio.py", line 235, in __getitem__
    pickle_kwargs=self.pickle_kwargs)
  File "D:\Anaconda3\lib\site-packages\numpy\lib\format.py", line 674, in read_array
    array = numpy.ndarray(count, dtype=dtype)
MemoryError

ajfisch commented 6 years ago

This is strange. Have you modified the code? Also, as per the README, DrQA is only tested and supported (by me) on macOS and Linux.

meibaotai commented 6 years ago

Thanks. Windows may not be compatible with DrQA. We succeeded on Linux.
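A possible explanation for the Windows behaviour, inferred from the traceback and not confirmed by anyone in this thread: the failing frames pass through multiprocessing's spawn.py and runpy before re-entering interactive.py. With the spawn start method, each worker process re-runs the main script under the module name `__mp_main__`, so any loading done at the top level of the script is repeated in every worker. The sketch below imitates that step with `runpy.run_path` on two tiny stand-in scripts (hypothetical, not the real interactive.py) to show how an `if __name__ == "__main__":` guard stops the repeat.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical stand-ins for a script like interactive.py. The list "loads"
# records each time the expensive setup (e.g. reading the TF-IDF matrix) runs.
UNGUARDED = textwrap.dedent("""
    loads = []
    loads.append("tfidf matrix")   # runs every time the script is executed
""")

GUARDED = textwrap.dedent("""
    loads = []

    def main():
        loads.append("tfidf matrix")

    if __name__ == "__main__":     # false inside a spawned worker
        main()
""")


def loads_when_run_as(source, run_name):
    """Execute `source` from a file under `run_name`; return the load count."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as handle:
            handle.write(source)
        module_globals = runpy.run_path(path, run_name=run_name)
    return len(module_globals["loads"])


# A spawned worker re-runs the main script as "__mp_main__".
worker_unguarded = loads_when_run_as(UNGUARDED, "__mp_main__")
worker_guarded = loads_when_run_as(GUARDED, "__mp_main__")
# The parent process runs the script as "__main__".
parent_guarded = loads_when_run_as(GUARDED, "__main__")

print(worker_unguarded, worker_guarded, parent_guarded)  # prints: 1 0 1
```

If this is what happened, it would also be consistent with the same code working on Linux, where multiprocessing has traditionally defaulted to fork and workers do not re-run the main script.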

jcsturges commented 5 years ago

I'm seeing the same thing on a brand new Ubuntu 18.04 install:

11/28/2018 04:34:11 AM: [ Initializing pipeline... ]
11/28/2018 04:34:11 AM: [ Initializing document ranker... ]
11/28/2018 04:34:11 AM: [ Loading /data/projects/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 70, in <module>
    tokenizer=args.tokenizer
  File "/data/projects/DrQA/drqa/pipeline/drqa.py", line 109, in __init__
    self.ranker = ranker_class(**ranker_opts)
  File "/data/projects/DrQA/drqa/retriever/tfidf_doc_ranker.py", line 37, in __init__
    matrix, metadata = utils.load_sparse_csr(tfidf_path)
  File "/data/projects/DrQA/drqa/retriever/utils.py", line 34, in load_sparse_csr
    matrix = sp.csr_matrix((loader['data'], loader['indices'],
  File "/home/james/.local/lib/python3.6/site-packages/numpy/lib/npyio.py", line 251, in __getitem__
    pickle_kwargs=self.pickle_kwargs)
  File "/home/james/.local/lib/python3.6/site-packages/numpy/lib/format.py", line 681, in read_array
    array = numpy.ndarray(count, dtype=dtype)
MemoryError

Running with 2 vCPUs and 8 GB of RAM, nothing else on the server. Is a GPU required? More base RAM?

ajfisch commented 5 years ago

You need roughly 15 GB of RAM or more.
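One rough way to check this on a given machine before the loader runs: an .npz file is an ordinary zip archive of .npy files, so the zip directory already records the uncompressed size of each array without reading any data. The following is a minimal sketch using only the standard library. The helper names `npz_member_sizes` and `format_gib` and the tiny demo archive are illustrative assumptions, not part of DrQA.

```python
import os
import tempfile
import zipfile


def npz_member_sizes(path):
    """Return {member name: uncompressed size in bytes} for an .npz archive.

    Each member is a .npy file, so its size is the array's bytes plus a
    small header. No array data is read into memory here.
    """
    with zipfile.ZipFile(path) as archive:
        return {info.filename: info.file_size for info in archive.infolist()}


def format_gib(num_bytes):
    """Render a byte count in GiB with two decimals."""
    return f"{num_bytes / 2**30:.2f} GiB"


# Demo on a tiny stand-in archive (made-up contents, not the real index).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    with zipfile.ZipFile(demo_path, "w") as archive:
        archive.writestr("data.npy", b"\x00" * 4096)
        archive.writestr("indices.npy", b"\x00" * 2048)
    sizes = npz_member_sizes(demo_path)

total = sum(sizes.values())
print(sizes)
print(total, format_gib(total))
```

Pointing `npz_member_sizes` at the docs-tfidf .npz path shown in the logs would give a lower bound on the memory the arrays need. Actual peak usage may be higher, since temporary copies can exist while the sparse matrix is being built.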

leoyim commented 5 years ago

I have the same problem as you, on Arch Linux.

➜  DrQA git:(master) ✗ python scripts/pipeline/interactive.py     
12/02/2018 06:38:36 PM: [ Running on CPU only. ]
12/02/2018 06:38:36 PM: [ Initializing pipeline... ]
12/02/2018 06:38:36 PM: [ Initializing document ranker... ]
12/02/2018 06:38:36 PM: [ Loading /home/leoyim/Desktop/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 70, in <module>
    tokenizer=args.tokenizer
  File "/home/leoyim/Desktop/DrQA/drqa/pipeline/drqa.py", line 109, in __init__
    self.ranker = ranker_class(**ranker_opts)
  File "/home/leoyim/Desktop/DrQA/drqa/retriever/tfidf_doc_ranker.py", line 37, in __init__
    matrix, metadata = utils.load_sparse_csr(tfidf_path)
  File "/home/leoyim/Desktop/DrQA/drqa/retriever/utils.py", line 34, in load_sparse_csr
    matrix = sp.csr_matrix((loader['data'], loader['indices'],
  File "/usr/lib/python3.7/site-packages/numpy/lib/npyio.py", line 251, in __getitem__
    pickle_kwargs=self.pickle_kwargs)
  File "/usr/lib/python3.7/site-packages/numpy/lib/format.py", line 681, in read_array
    array = numpy.ndarray(count, dtype=dtype)
MemoryError

Have you solved it, @jcsturges?

jcsturges commented 5 years ago

Yeah, the code is not very efficient. It keeps far too much in memory. One of the authors (or the author?) said you need 15+ GB of RAM, though I haven't confirmed that works. I ended up building my own lightweight version.

nim456 commented 5 years ago

@jcsturges, could you please share the efficient code you wrote? It would be very helpful to all of us.

krutikabapat commented 4 years ago

@jcsturges, could you please share your lightweight version? That would be really helpful.