beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

`java.lang.OutOfMemoryError: Java heap space` error when running docker #93

Closed by ichbinhandsome 2 years ago

ichbinhandsome commented 2 years ago

Hi, I'm using `evaluate_anserini_bm25.py` to reproduce the BM25 results on the Quora dataset. When I ran the script, the Anserini server running in Docker failed with this error:

INFO  [Thread-0] search.SimpleSearcher (SimpleSearcher.java:464) - 81.33 percent completed
Exception in thread "pool-2-thread-7" Exception in thread "pool-2-thread-6" java.lang.OutOfMemoryError: Java heap space
INFO:     172.17.0.1:57790 - "POST /lexical/batch_search/ HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/h11_impl.py", line 369, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 59, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/fastapi/applications.py", line 201, in __call__
    await super().__call__(scope, receive, send)  # pragma: no cover
  File "/usr/local/lib/python3.6/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.6/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/usr/local/lib/python3.6/site-packages/fastapi/routing.py", line 202, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/usr/local/lib/python3.6/site-packages/fastapi/routing.py", line 150, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/usr/local/lib/python3.6/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "./main.py", line 61, in batch_search
    hits = searcher.batch_search(queries=queries, qids=qids, k=k, threads=threads, fields=fields)
  File "/usr/local/lib/python3.6/site-packages/pyserini/search/_searcher.py", line 199, in batch_search
    results = self.object.batchSearchFields(query_strings, qid_strings, int(k), int(threads), jfields)
  File "jnius/jnius_export_class.pxi", line 1145, in jnius.JavaMultipleMethod.__call__
  File "jnius/jnius_export_class.pxi", line 857, in jnius.JavaMethod.__call__
  File "jnius/jnius_export_class.pxi", line 937, in jnius.JavaMethod.call_method
  File "jnius/jnius_jvm_dlopen.pxi", line 91, in jnius.create_jnienv
jnius.JavaException: JVM exception occurred: Java heap space java.lang.OutOfMemoryError
        at org.apache.lucene.util.packed.PackedReaderIterator.<init>(PackedReaderIterator.java:45)
        at org.apache.lucene.util.packed.PackedInts.getReaderIteratorNoHeader(PackedInts.java:847)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.doReset(CompressingStoredFieldsReader.java:447)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader$BlockState.reset(CompressingStoredFieldsReader.java:389)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.document(CompressingStoredFieldsReader.java:568)
        at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.visitDocument(CompressingStoredFieldsReader.java:578)
        at org.apache.lucene.index.CodecReader.document(CodecReader.java:84)
        at org.apache.lucene.index.BaseCompositeReader.document(BaseCompositeReader.java:118)
        at org.apache.lucene.index.IndexReader.document(IndexReader.java:349)
        at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:289)
        at io.anserini.rerank.ScoredDocuments.fromTopDocs(ScoredDocuments.java:65)
        at io.anserini.search.SimpleSearcher.search(SimpleSearcher.java:553)
        at io.anserini.search.SimpleSearcher.searchFields(SimpleSearcher.java:605)
        at io.anserini.search.SimpleSearcher.lambda$batchSearchFields$0(SimpleSearcher.java:443)
        at io.anserini.search.SimpleSearcher$$Lambda$94/0x000000084013b440.run(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

I have already allocated 16 GB of memory and plenty of disk space to the Docker container.

Thanks for your help.

ichbinhandsome commented 2 years ago

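I solved this issue by raising the JVM heap limit through the `JAVA_TOOL_OPTIONS` environment variable when starting the container (`-Xms1024m` sets the initial heap to 1 GB, `-Xmx8g` sets the maximum heap to 8 GB): `docker run -p 8000:8000 -e JAVA_TOOL_OPTIONS="-Xms1024m -Xmx8g" -it beir/pyserini-fastapi`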
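For anyone who wants to tune the heap size per dataset, here is a minimal sketch that assembles the same `docker run` command from parameters. The helper name `build_docker_command` and its arguments are hypothetical and not part of BEIR or Pyserini. The defaults simply mirror the values from the command above, and the right heap size for other datasets is something you would need to find by trial.

```python
import shlex


def build_docker_command(min_heap="1024m", max_heap="8g", port=8000,
                         image="beir/pyserini-fastapi"):
    """Return the argv list for `docker run` with explicit JVM heap bounds.

    Hypothetical helper for illustration; defaults mirror the command above.
    """
    # The JVM reads JAVA_TOOL_OPTIONS at startup; -Xms and -Xmx set the
    # initial and maximum heap sizes respectively.
    java_opts = f"-Xms{min_heap} -Xmx{max_heap}"
    return [
        "docker", "run",
        "-p", f"{port}:{port}",                 # publish the API port
        "-e", f"JAVA_TOOL_OPTIONS={java_opts}",  # pass heap flags to the JVM
        "-it", image,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command instead of launching the container,
    # so the result can be inspected before it is run.
    print(shlex.join(build_docker_command()))
```

Passing the returned list to `subprocess.run` would start the container. Keeping `max_heap` below the memory granted to Docker leaves room for the Python process that runs alongside the JVM in the same container.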