facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

Numpy memory error #30

Closed Deepakchawla closed 6 years ago

Deepakchawla commented 6 years ago

When I run the `python scripts/retriever/interactive.py` command, it shows me the error below:

```
root@ubuntu-2gb-nyc3-01:~/DrQA# python scripts/retriever/interactive.py
08/21/2017 08:13:28 AM: [ Initializing ranker... ]
08/21/2017 08:13:28 AM: [ Loading /root/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Traceback (most recent call last):
  File "scripts/retriever/interactive.py", line 27, in <module>
    ranker = retriever.get_class('tfidf')(tfidf_path=args.model)
  File "/root/DrQA/drqa/retriever/tfidf_doc_ranker.py", line 37, in __init__
    matrix, metadata = utils.load_sparse_csr(tfidf_path)
  File "/root/DrQA/drqa/retriever/utils.py", line 34, in load_sparse_csr
    matrix = sp.csr_matrix((loader['data'], loader['indices'],
  File "/root/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 233, in __getitem__
    pickle_kwargs=self.pickle_kwargs)
  File "/root/anaconda3/lib/python3.6/site-packages/numpy/lib/format.py", line 664, in read_array
    array = numpy.ndarray(count, dtype=dtype)
MemoryError
```
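For reference, this is roughly what the `load_sparse_csr` in the traceback does (a sketch reconstructed from the stack trace, not the exact DrQA source): the entire TF-IDF matrix is decompressed and materialized in RAM in one shot, so the whole file's contents must fit in memory.

```python
import numpy as np
import scipy.sparse as sp

def load_sparse_csr(filename):
    # np.load cannot memory-map arrays stored inside an .npz archive: each
    # loader[key] access decompresses the full array into RAM.
    loader = np.load(filename, allow_pickle=True)
    matrix = sp.csr_matrix(
        (loader['data'], loader['indices'], loader['indptr']),
        shape=loader['shape'])
    return matrix, loader['metadata'] if 'metadata' in loader else None
```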

I am using it without a GPU; below is my system information:

```
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2199.998
BogoMIPS:              4399.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-3
```

Can someone help me resolve this problem?

Thank You

ajfisch commented 6 years ago

How much free RAM does your system have? Is it possible your download was interrupted and got corrupted?

Deepakchawla commented 6 years ago

Below are the results of the `free` command:

```
              total        used        free      shared  buff/cache   available
Mem:           7484          92        7176           9         215        7158
Swap:             0           0           0
```

Deepakchawla commented 6 years ago

I set `/proc/sys/vm/overcommit_memory` to 1 using `echo 1 > /proc/sys/vm/overcommit_memory` and ran interactive.py again, and now it shows me the message below:

```
deepakchawla35@deepak-server:~/DrQA$ python scripts/pipeline/interactive.py
08/21/2017 05:49:49 PM: [ Running on CPU only. ]
08/21/2017 05:49:49 PM: [ Initializing pipeline... ]
08/21/2017 05:49:49 PM: [ Initializing document ranker... ]
08/21/2017 05:49:49 PM: [ Loading /home/deepakchawla35/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
Killed
```

What should I do now?

ajfisch commented 6 years ago

From your `free` output, it looks like you do not have enough RAM on your machine. You need at least around 15 GB, and it looks like you have 8 (if the units you posted are MB).
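A quick pre-flight check you can run before loading the full Wikipedia TF-IDF matrix (a sketch; it assumes `psutil` is installed via `pip install psutil`, and the 15 GB threshold comes from the comment above):

```python
import psutil

# Report how much RAM is actually available right now.
available_gb = psutil.virtual_memory().available / 1024**3
print(f'Available RAM: {available_gb:.1f} GB')
if available_gb < 15:
    print('Warning: loading the docs-tfidf .npz needs roughly 15 GB of free memory.')
```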

Deepakchawla commented 6 years ago

Okay, I will upgrade the RAM from 8 GB to 15 GB. But when I changed the overcommit value from 0 to 1, it no longer showed any memory-related error and seemed to run smoothly, yet it printed that "Killed" message. What is the reason behind the "Killed" message?

ajfisch commented 6 years ago

Setting the value from 0 to 1 enables overcommit unconditionally. In overcommit mode, the Linux kernel lets a memory allocation like malloc succeed no matter how much physical memory is actually available. But when your program later uses that memory, you run out of space, and the kernel's OOM killer kills the process (hence your "Killed" message).

On the other hand, if overcommit is not enabled, the kernel will not let programs allocate more virtual memory than is physically available: malloc fails up front, and the program (in this case numpy) exits with an error (the MemoryError you saw).
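A minimal sketch of the difference (assuming a Linux machine with well under 40 GB of RAM and no swap; the 40 GB figure is arbitrary):

```python
import numpy as np

# With vm.overcommit_memory=0 (the heuristic default), the kernel refuses an
# allocation this far beyond physical memory, and numpy raises MemoryError,
# which is the failure in the original report.
# With vm.overcommit_memory=1, the kernel grants the virtual reservation
# unconditionally, so this line "succeeds" ...
a = np.empty(40 * 1024**3, dtype=np.uint8)  # reserve ~40 GB, untouched so far

# ... but physical pages are only assigned when written. Touching them all
# exhausts RAM, and the kernel's OOM killer terminates the process, leaving
# just "Killed" on the terminal.
a[:] = 1
```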

Deepakchawla commented 6 years ago

Okay, got your point. I have now increased the RAM size. `free -m` before running the Python script:

```
              total        used        free      shared  buff/cache   available
Mem:          22099         148       21876          10          74       21708
Swap:             0           0           0
```

```
deepakchawla35@deepak-server:~/DrQA$ python scripts/pipeline/interactive.py
08/22/2017 03:17:25 AM: [ Running on CPU only. ]
08/22/2017 03:17:25 AM: [ Initializing pipeline... ]
08/22/2017 03:17:25 AM: [ Initializing document ranker... ]
08/22/2017 03:17:25 AM: [ Loading /home/deepakchawla35/DrQA/data/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz ]
08/22/2017 03:19:24 AM: [ Initializing document reader... ]
08/22/2017 03:19:24 AM: [ Loading model /home/deepakchawla35/DrQA/data/reader/multitask.mdl ]
08/22/2017 03:19:31 AM: [ Initializing tokenizers and document retrievers... ]
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 70, in <module>
    tokenizer=args.tokenizer
  File "/home/deepakchawla35/DrQA/drqa/pipeline/drqa.py", line 140, in __init__
    initargs=(tok_class, tok_opts, db_class, db_opts, fixed_candidates)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/context.py", line 119, in Pool
    context=self.get_context())
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/deepakchawla35/anaconda3/lib/python3.6/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory
```

And while the Python script is running, `free -m` shows something different:

```
              total        used        free      shared  buff/cache   available
Mem:          22099         148       13961          10        7989       21628
Swap:             0           0           0
```

ajfisch commented 6 years ago

Do you still have overcommit enabled? You might need it to run with the tokenizers, since each tokenizer process allocates (but does not use all of) the memory for its JVM.

You can also see if running with `--tokenizer spacy` works. Edit: try `--tokenizer regexp` first, as you'd need to `pip install spacy && python -m spacy download en` for the former.
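For context on the `OSError: [Errno 12]` above, here is a minimal sketch of why `os.fork()` can fail even when `free` shows plenty of available memory (the 14 GB figure is illustrative; don't run this on a small machine):

```python
import os
import numpy as np

# The parent process (standing in for the loaded TF-IDF matrix) holds ~14 GB.
big = np.empty(14 * 1024**3, dtype=np.uint8)
big[:] = 1  # actually touch the pages so they are resident

# fork() gives the child a copy-on-write view of the parent's address space,
# but with overcommit disabled the kernel must still be willing to back all
# of it; if it refuses, fork raises OSError: [Errno 12] Cannot allocate memory.
pid = os.fork()
if pid == 0:
    os._exit(0)  # child exits immediately; nothing is actually copied
os.waitpid(pid, 0)
```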

Deepakchawla commented 6 years ago

No, overcommit is currently disabled:

```
deepakchawla35@deepak-server:~/DrQA$ cat /proc/sys/vm/overcommit_memory
0
```

> You can also see if running with `--tokenizer spacy` works.

I don't get your point...

ajfisch commented 6 years ago
  1. Try running with overcommit enabled (`echo 1 > /proc/sys/vm/overcommit_memory`).
  2. If that still errors, try running `python scripts/pipeline/interactive.py --tokenizer regexp`; it uses a less resource-intensive tokenizer (the step where your machine is failing).

Deepakchawla commented 6 years ago

okay, let me try...

Deepakchawla commented 6 years ago

Now it is working perfectly. Thank you so much! But it gives me the wrong prediction for some questions:

```
>>> process('when facebook company ipo launched')
08/22/2017 03:49:42 AM: [ Processing 1 queries... ]
08/22/2017 03:49:42 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:49:43 AM: [ Reading 323 paragraphs... ]
08/22/2017 03:49:51 AM: [ Processed 1 queries in 8.7226 (s) ]
Top Predictions:
+------+--------+-------------------------------------+--------------+-----------+
| Rank | Answer | Doc                                 | Answer Score | Doc Score |
+------+--------+-------------------------------------+--------------+-----------+
| 1    | 2009   | Initial public offering of Facebook | 49060        | 248.07    |
+------+--------+-------------------------------------+--------------+-----------+

Contexts:
[ Doc = Initial public offering of Facebook ]
To ensure that early investors would retain control of the company, Facebook in 2009 instituted a dual-class stock structure. After the IPO, Zuckerberg was to retain a 22% ownership share in Facebook and was to own 57% of the voting shares. The document also stated that the company was seeking to raise 5 billion, which would make it one of the largest IPOs in tech history and the biggest in Internet history.
```

```
>>> process('when facebook company IPO launched')
08/22/2017 03:51:07 AM: [ Processing 1 queries... ]
08/22/2017 03:51:07 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:51:07 AM: [ Reading 323 paragraphs... ]
08/22/2017 03:51:14 AM: [ Processed 1 queries in 6.7024 (s) ]
Top Predictions:
+------+--------+-------------------------------------+--------------+-----------+
| Rank | Answer | Doc                                 | Answer Score | Doc Score |
+------+--------+-------------------------------------+--------------+-----------+
| 1    | 2012   | Initial public offering of Facebook | 4.8931e+05   | 248.07    |
+------+--------+-------------------------------------+--------------+-----------+

Contexts:
[ Doc = Initial public offering of Facebook ]
The social networking company Facebook held its initial public offering (IPO) on Friday, May 18, 2012. The IPO was the biggest in technology and one of the biggest in Internet history, with a peak market capitalization of over $104 billion. Media pundits called it a "cultural touchstone."
```

```
>>> process('who is father of deep learning')
08/22/2017 03:52:47 AM: [ Processing 1 queries... ]
08/22/2017 03:52:47 AM: [ Retrieving top 5 docs... ]
08/22/2017 03:52:48 AM: [ Reading 479 paragraphs... ]
08/22/2017 03:52:55 AM: [ Processed 1 queries in 7.3674 (s) ]
Top Predictions:
+------+---------------------+---------------+--------------+-----------+
| Rank | Answer              | Doc           | Answer Score | Doc Score |
+------+---------------------+---------------+--------------+-----------+
| 1    | Juergen Schmidhuber | Deep learning | 3.7192e+08   | 453.99    |
+------+---------------------+---------------+--------------+-----------+

Contexts:
[ Doc = Deep learning ]
Deep learning algorithms transform their inputs through more layers than shallow learning algorithms. At each layer, the signal is transformed by a processing unit, like an artificial neuron, whose parameters are 'learned' through training. A chain of transformations from input to output is a "credit assignment path" (CAP). CAPs describe potentially causal connections between input and output and may vary in length – for a feedforward neural network, the depth of the CAPs (thus of the network) is the number of hidden layers plus one (as the output layer is also parameterized), but for recurrent neural networks, in which a signal may propagate through a layer more than once, the CAP is potentially unlimited in length. There is no universally agreed upon threshold of depth dividing shallow learning from deep learning, but most researchers in the field agree that deep learning has multiple nonlinear layers (CAP > 2) and Juergen Schmidhuber considers CAP > 10 to be very deep learning.
```

ajfisch commented 6 years ago

I am glad that it is working.

DrQA is just an AI research project -- of course there is no guarantee that it will answer all questions correctly (or, in the case of this model, be invariant to spelling, capitalization, or phrasing). In fact, from our reported evaluations on several QA datasets, you can expect that DrQA will get most questions wrong (but also a fair amount correct). Hopefully this model can be a baseline for machine reading at scale that someone like you can beat 😉.

Then again, the answers to some of these questions are subjective. Perhaps Juergen wouldn't mind the answer to your third question...

Deepakchawla commented 6 years ago

Okay, and are you improving or working on the QA datasets to give more accurate answers? One more thing: currently it takes a long time to produce answers, and I want responses in at most 3 seconds. What should I do to achieve this?

ajfisch commented 6 years ago

Reading comprehension and open-domain QA are active areas of research, for FAIR and others.

To improve the runtime performance of DrQA, you will need a machine with better specs. It also scales better with large batches (lower average time per question).
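To illustrate the batching point, a hedged sketch of batched querying (the `DrQA` pipeline class and `process_batch` come from the DrQA codebase used by `interactive.py`; the constructor defaults and the `'span'` result field are assumptions on my part):

```python
from drqa import pipeline

# Assumes the default models/data have been downloaded, as in interactive.py.
drqa = pipeline.DrQA(tokenizer='regexp')

# Batching amortizes retrieval and reader overhead across questions, so the
# average time per question drops compared to calling process() one at a time.
queries = [
    'when did facebook hold its ipo',
    'who considers CAP > 10 to be very deep learning',
]
predictions = drqa.process_batch(queries, top_n=1, n_docs=5)
for query, preds in zip(queries, predictions):
    print(query, '->', preds[0]['span'])  # result field name assumed
```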

Deepakchawla commented 6 years ago

Okay, so I will try it with a GPU and try to reduce the execution time. Thanks a lot once again; you helped a lot and contributed to the accomplishment of my passion project. :smile:

ajfisch commented 6 years ago

You are very welcome!

Deepakchawla commented 6 years ago

:blush:

augmen commented 6 years ago

Hi, I am having the same issue with 8 GB RAM and 4 CPU cores. Can you help us?

```
(pt) root@ml:~/DrQA# python3 scripts/pipeline/interactive.py --tokenizer regexp
Traceback (most recent call last):
  File "scripts/pipeline/interactive.py", line 16, in <module>
    from drqa import pipeline
ImportError: No module named 'drqa'
```